Adapt the PDFtk PDF tool's call syntax with Go
Reading and Writing
The ioutil package also includes the convenience functions ReadFile() and WriteFile(). They read an entire file into memory or write data to a file in a single call; the data must be a byte slice ([]byte) and not a string.
To do this, line 18 in Listing 4 first uses Join() to concatenate the command and all its parameters, separated by space characters, into one long string. The []byte() type conversion then turns this string into a byte slice.
Conversely, ReadFile() in line 34 reads the modified file back into memory. In line 39, the program uses string() to convert the resulting bytes in the line variable into a character string. It also chops off the line break that vi appends to the end of the file without being asked.
The Split() function in line 40 splits this string at the spaces, separating the program and its arguments into a new slice. The program then assigns this slice to the caller's variable by dereferencing the passed-in pointer, so the calling function works with the modified data instead of the original.
The pdftkArgs() function from Listing 5 builds the PDFtk command-line syntax introduced at the beginning of this article, laying the groundwork for the more elaborate parameter sets needed in more complicated cases. It assigns an uppercase letter to each input file and then lists the pages to be joined as A1-end, B1-end, and so on. To do this, it iterates over all input files in line 10; the index idx starts at 0 and increases by one with each file. Line 11 adds this index to the ASCII value of the letter A, which line 8 determines with int('A'), and thus obtains A, B, C, and so on.
Listing 5
args.go
01 package main
02
03 import "fmt"
04
05 func pdftkArgs(files []string) []string {
06   args := []string{"pdftk"}
07   catArgs := []string{}
08   letterChr := int('A')
09
10   for idx, file := range files {
11     letter := string(rune(letterChr + idx))
12     args = append(args,
13       fmt.Sprintf("%s=%s", letter, file))
14     catArgs = append(catArgs,
15       fmt.Sprintf("%s1-end", letter))
16   }
17
18   args = append(args, "cat")
19   args = append(args, catArgs...)
20   args = append(args,
21     "output", outfile(files))
22   return args
23 }
Greatest Common Denominator
That leaves us with the task of deriving the name of the output file from all the input files, using their greatest common denominator: the longest prefix that all the file names share. To do this, outfile() in Listing 6 first determines the file extension, usually .pdf, in line 14 with the Ext() function from the path/filepath package, so that it can strip it off the base name later.
Listing 6
outfile.go
01 package main
02
03 import (
04   "fmt"
05   "path/filepath"
06   "strings"
07 )
08
09 func outfile(infiles []string) string {
10   if len(infiles) == 0 {
11     panic("Cannot have zero infiles")
12   }
13
14   ext := filepath.Ext(infiles[0])
15   base := longestSubstr(infiles)
16   base = strings.TrimSuffix(base, ext)
17   base = strings.TrimSuffix(base, "-")
18
19   return fmt.Sprintf(
20     "%s-out%s", base, ext)
21 }
22
23 func longestSubstr(all []string) string {
24   testIdx := 0
25   keepGoing := true
26
27   for keepGoing {
28     var c byte
29
30     for _, instring := range all {
31       if testIdx >= len(instring) {
32         keepGoing = false
33         break
34       }
35
36       if c == 0 { // uninitialized?
37         c = instring[testIdx]
38         continue
39       }
40
41       if instring[testIdx] != c {
42         keepGoing = false
43         break
44       }
45
46     }
47     testIdx++
48   }
49
50   if testIdx <= 1 {
51     return ""
52   }
53   return all[0][0 : testIdx-1]
54 }
Line 15 calls longestSubstr(), defined from line 23 onward, which searches from the beginning of the strings for the longest string common to all file names. Line 16 then strips the extension from this prefix if it is present, and line 17 removes a trailing hyphen. The call in line 19 appends -out to the base name determined in this way, followed by the extension determined at the beginning (typically .pdf). This completes generating the name of the target file from the names of the source files.
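The name assembly can be tried out in isolation. This sketch mirrors lines 16 to 20 of Listing 6 in a hypothetical outName() function that receives the common prefix and the extension as parameters instead of computing them.

```go
package main

import (
	"fmt"
	"strings"
)

// outName (hypothetical helper) mirrors the end of outfile(), but takes
// the common prefix and the extension as parameters.
func outName(prefix, ext string) string {
	base := strings.TrimSuffix(prefix, ext) // no-op unless prefix ends in ext
	base = strings.TrimSuffix(base, "-")    // removes at most one hyphen
	return fmt.Sprintf("%s-out%s", base, ext)
}

func main() {
	fmt.Println(outName("scan-", ".pdf"))      // scan-out.pdf
	fmt.Println(outName("scan-1.pdf", ".pdf")) // scan-1-out.pdf
	fmt.Println(outName("", ".pdf"))           // -out.pdf
}
```

The second call corresponds to a single input file, where the common prefix is the whole name and the extension really does need stripping. The third shows that without any common prefix the result is just -out.pdf.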
To determine the longest common prefix, the longestSubstr() function starting in line 23 implements a small finite state machine. Line 30 iterates over the names of all input files, and the variable c holds the letter at the currently investigated position in the first file name on the list. Go's zero values simplify the control flow here: Go assigns a well-defined zero value to every variable that is declared without an explicit initial value.
The byte variable c is not explicitly initialized by its declaration in line 28, so, as the Go specification states, it holds the integer value 0. Because the declaration sits inside the outer loop, c starts out at 0 again for every character position. Line 36 uses this to check whether c has already been set to the currently investigated letter of the first file name in the inner for loop starting in line 30. If not, line 37 primes the variable, and continue then starts the next round of the inner loop.
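The zero-value trick can be isolated in a small sketch. The hypothetical function sameByteAt() reproduces the inner loop of longestSubstr() for a single position idx and reports whether all names share the same byte there.

```go
package main

import "fmt"

// sameByteAt (hypothetical helper) mimics the inner loop of longestSubstr():
// it reports whether all names have the same byte at position idx.
// It relies on c starting out as the zero value 0 to detect the first pass.
func sameByteAt(names []string, idx int) bool {
	var c byte // zero value: 0
	for _, name := range names {
		if idx >= len(name) {
			return false // this name is too short
		}
		if c == 0 { // not primed yet: take the byte from the first name
			c = name[idx]
			continue
		}
		if name[idx] != c {
			return false // first mismatch at this position
		}
	}
	return true
}

func main() {
	names := []string{"scan-1.pdf", "scan-2.pdf"}
	for idx := 0; idx < 6; idx++ {
		fmt.Println(idx, sameByteAt(names, idx))
	}
}
```

Using 0 as the "not yet set" marker is safe in this context because file names on Linux and macOS cannot contain a NUL byte.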
During the following passes of the inner loop, line 41 checks whether the currently investigated letter in the next file name on the list still matches the letter from the first file name (and thus the value stored in c). At the first mismatch, line 42 sets the variable keepGoing to false, and line 43 leaves the inner loop with break. Because keepGoing is now false, the outer loop also terminates after its current round. The outer loop increments the testIdx counter by one in each round to move on to the next character, matching as many characters as possible.
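At the end, testIdx has overshot by one, because line 47 still increments it in the round in which the file names stop matching or the shortest name runs out. Consequently, line 53 returns all[0][0 : testIdx-1], a slice of the first file name that ends one position before testIdx and thus contains exactly the common characters. If not even the first character matches, testIdx is 1 at this point, and line 50 returns an empty string instead.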
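For comparison, the same result can be obtained without an explicit state machine. The following commonPrefix() function is an alternative approach, not the one used in the article: it takes the first name as a candidate and shortens it until every other name begins with it.

```go
package main

import (
	"fmt"
	"strings"
)

// commonPrefix is an alternative to longestSubstr(), not the article's code:
// it starts with the first name as the candidate and shortens it by one
// byte until every other name begins with it.
func commonPrefix(names []string) string {
	if len(names) == 0 {
		return ""
	}
	prefix := names[0]
	for _, name := range names[1:] {
		// Terminates at the latest when prefix is "", because every
		// string has the empty string as a prefix.
		for !strings.HasPrefix(name, prefix) {
			prefix = prefix[:len(prefix)-1]
		}
	}
	return prefix
}

func main() {
	fmt.Println(commonPrefix([]string{"scan-1.pdf", "scan-2.pdf", "scan-3.pdf"})) // scan-
}
```

Like longestSubstr(), this version compares bytes rather than runes, so with non-ASCII file names a common prefix could end in the middle of a multibyte UTF-8 character.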
Other Worlds
The build command in line 1 of Listing 7 creates the pdftki executable for the current platform from the four source files. The subsequent call to pdftki *.pdf runs the binary with the source files and initiates the desired action. The program does not need any additional modules; everything it uses comes from Go's standard library.
Listing 7
Compiling Pdftki
01 $ go build pdftki.go edit.go args.go outfile.go
02 $ GOOS=linux GOARCH=386 go build pdftki.go edit.go args.go outfile.go
If you want to use the binary under Linux but develop the program on a Mac, you can compile it there with the command from line 2 of Listing 7 and then simply copy the resulting executable to a Linux computer, where it runs without any problems. The opposite route also works, if necessary.