Adapt the PDFtk PDF tool's call syntax with Go

Reading and Writing

The ioutil package also includes the convenience functions ReadFile() and WriteFile(). They read a chunk of text from a file, or write one to it, in a single call, but the data must be a byte slice ([]byte), not a string.
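A minimal sketch of such a round trip, assuming a hypothetical helper roundTrip() and a hypothetical temporary file named cmd.txt. It uses os.WriteFile() and os.ReadFile(), because since Go 1.16 the io/ioutil package is deprecated and its ReadFile() and WriteFile() simply call these two functions. The conversions show that both ends deal in []byte, not string:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// roundTrip (hypothetical helper) writes text to a temporary file and
// reads it back. WriteFile only accepts a []byte, so the string is
// converted first; ReadFile likewise returns a []byte, not a string.
func roundTrip(text string) (string, error) {
	dir, err := os.MkdirTemp("", "roundtrip")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir) // clean up the temporary directory

	path := filepath.Join(dir, "cmd.txt")
	if err := os.WriteFile(path, []byte(text), 0644); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := roundTrip("pdftk a.pdf b.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```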

To do this, line 18 in Listing 4 first uses Join() to concatenate the command and all its parameters, separated by spaces, into one long string. The []byte() type conversion then turns this string into a byte slice.
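Listing 4 itself is not reproduced here, so the following is only a sketch of the same two steps, with a hypothetical function name cmdBytes(). strings.Join() takes the separator as its second argument, and []byte(...) copies the string's bytes into a new slice that WriteFile() can accept:

```go
package main

import (
	"fmt"
	"strings"
)

// cmdBytes (hypothetical) joins a command and its arguments with single
// spaces and converts the result into a byte slice for WriteFile().
func cmdBytes(args []string) []byte {
	line := strings.Join(args, " ")
	return []byte(line)
}

func main() {
	b := cmdBytes([]string{"pdftk", "a.pdf", "b.pdf"})
	fmt.Printf("%q\n", b) // prints "pdftk a.pdf b.pdf"
}
```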

Conversely, ReadFile() in line 34 reads the modified file back into memory. In line 39, the program uses string() to convert the resulting bytes in the line variable into a string. It also chops off the line break that vi appends to the end of the file without being asked.
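How exactly line 39 removes the line break is not shown here; a minimal sketch, assuming strings.TrimSuffix() and a hypothetical helper lineFromBytes(), could look like this. TrimSuffix() removes at most one trailing "\n" and leaves a string without one unchanged:

```go
package main

import (
	"fmt"
	"strings"
)

// lineFromBytes (hypothetical) converts file contents into a string and
// removes a single trailing newline, such as the one vi appends on save.
func lineFromBytes(data []byte) string {
	line := string(data)
	return strings.TrimSuffix(line, "\n")
}

func main() {
	fmt.Printf("%q\n", lineFromBytes([]byte("pdftk a.pdf cat A1-end\n")))
}
```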

The Split() function in line 40 then splits the string at the spaces, separating the program from its arguments, and returns the parts as a new slice. The code assigns this slice to the input slice by dereferencing the pointer that was passed in, so the calling function subsequently sees the modified data instead of the original.
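The pointer trick in isolation, as a sketch with a hypothetical function replaceArgs(): because the function receives *[]string, the assignment *args = ... replaces the caller's slice rather than a local copy of the slice header:

```go
package main

import (
	"fmt"
	"strings"
)

// replaceArgs (hypothetical) splits line at spaces and stores the result
// in the slice that args points to, so the caller sees the new contents.
func replaceArgs(args *[]string, line string) {
	*args = strings.Split(line, " ")
}

func main() {
	args := []string{"pdftk", "a.pdf"}
	replaceArgs(&args, "pdftk a.pdf b.pdf cat A1-end B1-end")
	fmt.Println(len(args), args[len(args)-1]) // prints 6 B1-end
}
```

One design note: strings.Split() turns two consecutive spaces into an empty element, whereas strings.Fields() splits around runs of whitespace and would tolerate sloppy manual edits in the file.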

The pdftkArgs() function in Listing 5 builds the PDFtk command-line syntax introduced at the beginning of this article, laying the groundwork for the more elaborate parameter sets needed in more complicated cases. It assigns an uppercase letter to each input file and then lists the pages to be joined as A1-end, B1-end, and so on. To do this, it iterates over all input files in line 10, with the index idx counting up from 0. Line 11 adds idx to the ASCII value of A, which line 8 determines with int('A'), and converts the result into a one-letter string, thus obtaining A, B, C, and so on.
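The letter arithmetic on its own, as a sketch with a hypothetical helper handle() (pdftk calls these letters handles). Wrapping the integer in rune() makes the intent explicit; go vet warns about a plain string(x) conversion of an int, because it yields a character rather than the decimal digits one might expect:

```go
package main

import "fmt"

// handle (hypothetical) returns the pdftk handle for the input file at
// position idx: 0 -> "A", 1 -> "B", and so on. The rune() conversion
// marks the integer as a character code, not a number to format.
func handle(idx int) string {
	return string(rune('A' + idx))
}

func main() {
	for idx := 0; idx < 3; idx++ {
		fmt.Printf("%s1-end ", handle(idx))
	}
	fmt.Println()
}
```

With single letters, this scheme runs out after Z, that is, after 26 input files.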

Listing 5

args.go

01 package main
02
03 import "fmt"
04
05 func pdftkArgs(files []string) []string {
06   args := []string{"pdftk"}
07   catArgs := []string{}
08   letterChr := int('A')
09
10   for idx, file := range files {
11     letter := string(rune(letterChr + idx))
12     args = append(args,
13       fmt.Sprintf("%s=%s", letter, file))
14     catArgs = append(catArgs,
15       fmt.Sprintf("%s1-end", letter))
16   }
17
18   args = append(args, "cat")
19   args = append(args, catArgs...)
20   args = append(args,
21     "output", outfile(files))
22   return args
23 }

Greatest Common Denominator

That leaves us with the task of deriving the name of the output file from the names of all input files using their greatest common denominator, which here means the longest prefix they all share. For this purpose, outfile() in Listing 6 first uses the Ext() function from the path/filepath package in line 14 to determine the extension of the first input file, which should be .pdf, so that it can strip it off later.
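The two standard library calls that outfile() combines can be tried out in isolation. The following sketch wraps them in a hypothetical helper splitExt(); filepath.Ext() returns the suffix starting at the last dot in the final path element (including the dot), or an empty string if there is no dot, and strings.TrimSuffix() removes exactly that suffix:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitExt (hypothetical) separates a file name into the part before
// the extension and the extension itself.
func splitExt(name string) (base, ext string) {
	ext = filepath.Ext(name) // e.g. ".pdf" including the dot, "" if none
	base = strings.TrimSuffix(name, ext)
	return base, ext
}

func main() {
	base, ext := splitExt("scan-01.pdf")
	fmt.Println(base, ext)
}
```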

Listing 6

outfile.go

01 package main
02
03 import (
04   "fmt"
05   "path/filepath"
06   "strings"
07 )
08
09 func outfile(infiles []string) string {
10   if len(infiles) == 0 {
11     panic("Cannot have zero infiles")
12   }
13
14   ext := filepath.Ext(infiles[0])
15   base := longestSubstr(infiles)
16   base = strings.TrimSuffix(base, ext)
17   base = strings.TrimSuffix(base, "-")
18
19   return fmt.Sprintf(
20     "%s-out%s", base, ext)
21 }
22
23 func longestSubstr(all []string) string {
24   testIdx := 0
25   keepGoing := true
26
27   for keepGoing {
28     var c byte
29
30     for _, instring := range all {
31       if testIdx >= len(instring) {
32         keepGoing = false
33         break
34       }
35
36       if c == 0 { // uninitialized?
37         c = instring[testIdx]
38         continue
39       }
40
41       if instring[testIdx] != c {
42         keepGoing = false
43         break
44       }
45
46     }
47     testIdx++
48   }
49
50   if testIdx <= 1 {
51     return ""
52   }
53   return all[0][0 : testIdx-1]
54 }

In line 15, outfile() then calls longestSubstr(), defined in line 23, which looks for the longest string, starting from the beginning, that all file names have in common. Line 16 strips the extension from this string if it is still there, and line 17 removes a trailing hyphen. The call in line 19 appends -out and the previously stripped .pdf suffix to the base name determined in this way. This completes generating the name of the target file from the source files.

To determine the longest common substring from the beginning (that is, the longest common prefix), the longestSubstr() function starting in line 23 implements a small state machine. The inner loop in line 30 iterates over the names of all input files, and the variable c stores the character at the currently investigated position in the first file name on the list. Zero values simplify the control flow here: Go automatically assigns every variable that is declared without an explicit initial value a fixed default, the zero value of its type.

The byte variable c is not initialized when line 28 declares it. According to the Go specification, it therefore has the value 0. Line 36 uses this to check whether the inner for loop starting in line 30 has already set c to the currently investigated character of the first file name. If not, line 37 primes the variable, and continue then starts the next round of the inner loop.
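The zero-value trick from lines 28 to 39 can be isolated into a single check. This sketch uses a hypothetical function sameByteAt() that reports whether all names share the same byte at one position, using c == 0 to mean "not yet primed from the first name":

```go
package main

import "fmt"

// sameByteAt (hypothetical) reports whether every string in all has the
// same byte at position idx. The zero value of c (0) signals that c has
// not yet been primed with the byte from the first string.
func sameByteAt(all []string, idx int) bool {
	var c byte // zero value: 0
	for _, s := range all {
		if idx >= len(s) {
			return false // this name has run out of characters
		}
		if c == 0 { // not primed yet: remember the first string's byte
			c = s[idx]
			continue
		}
		if s[idx] != c {
			return false
		}
	}
	return true
}

func main() {
	names := []string{"scan-1.pdf", "scan-2.pdf"}
	fmt.Println(sameByteAt(names, 0), sameByteAt(names, 5)) // prints true false
}
```

Using 0 as a marker only works because the byte itself can never legitimately be 0; that holds here, since Unix file names cannot contain NUL bytes.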

During the following passes of the inner loop, line 41 checks whether the currently investigated character in the next file name on the list still matches the character from the first file name (and thus the value stored in c). At the first mismatch, line 42 sets keepGoing to false, and line 43 exits the inner loop with break. Because keepGoing is now false, the outer loop also terminates after finishing its current round; lines 31 to 33 do the same if one of the names runs out of characters. Line 47 increments the testIdx counter by one in each round of the outer loop to move on to the next character, so that the function matches as many characters as possible.

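At the end, testIdx has overshot by one, because line 47 increments it once more after the round in which the file names started to differ. The first differing position is therefore testIdx-1, and line 53 returns the slice all[0][0:testIdx-1], which contains exactly the characters in front of it. For scan-1.pdf and scan-2.pdf, for example, the names differ at index 5, testIdx ends up at 6, and longestSubstr() returns scan-.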
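For comparison, the same longest common prefix can also be computed without a state machine. The following sketch is an alternative technique, not the one from Listing 6: a hypothetical commonPrefix() starts with the complete first name and shortens it until every other name begins with it. Like the original, it works on bytes, not on UTF-8 characters:

```go
package main

import (
	"fmt"
	"strings"
)

// commonPrefix (hypothetical alternative to longestSubstr) returns the
// longest prefix shared by all strings by shrinking a candidate prefix
// until every string starts with it.
func commonPrefix(all []string) string {
	if len(all) == 0 {
		return ""
	}
	prefix := all[0]
	for _, s := range all[1:] {
		// Terminates at the latest when prefix is "", since every
		// string has the empty prefix.
		for !strings.HasPrefix(s, prefix) {
			prefix = prefix[:len(prefix)-1]
		}
	}
	return prefix
}

func main() {
	fmt.Println(commonPrefix([]string{"scan-1.pdf", "scan-2.pdf"})) // prints scan-
}
```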

Other Worlds

The build command in line 1 of Listing 7 creates the pdftki executable for the current platform from the four source files. Subsequently calling pdftki *.pdf runs the binary with the source files and initiates the desired action. The program does not need any additional modules; Go's standard library provides everything it uses.

Listing 7

Compiling Pdftki

01 $ go build pdftki.go edit.go args.go outfile.go
02 $ GOOS=linux GOARCH=386 go build pdftki.go edit.go args.go outfile.go

If you want to use the binary on Linux but develop the program on a Mac, you can cross-compile it there with the command from line 2 of Listing 7 and then simply copy the resulting executable to a Linux computer, where it runs without any problems. GOARCH=386 produces a 32-bit x86 binary; for a 64-bit x86 Linux machine, GOARCH=amd64 is the usual choice. The opposite route also works, if necessary.
