Editing PDF Structure with QPDF

Command Line – QPDF

© Lead Image © Joe Belanger, 123RF.com


Article from Issue 226/2019
Author(s): Bruce Byfield

Use QPDF to easily make structural changes to your PDFs, including reorganizing pages, creating watermarks, setting encryption options, and changing permissions.

QPDF [1] is a structural editor for PDF files, which places it in a very specific niche. In normal use, it does not edit the content of PDF files; to the extent that editing content is possible, opening a PDF in LibreOffice is generally the easiest way to work. Nor does QPDF convert PDFs to other formats; the repositories of major distributions like Debian are full of tools for that, such as pdf2htmlEX and pdf2svg. However, if you need to change how a PDF is put together, QPDF is a toolkit that is both comprehensive and more convenient than assorted scripts that each perform a single function. In fact, by combining options, you can make an entire series of structural edits with a single command. QPDF is especially handy if you no longer have the file from which a PDF was generated and therefore cannot make a new one with different settings.

QPDF is available in many distributions. If it is not in your distribution's repository, you can download the source code from the project site and build it with the usual trio of commands: ./configure, make, and make install. The syntax, too, is simple:

qpdf OPTIONS ORIGINAL-FILE OUTPUT-FILE

The output file is not needed for some options, such as those for information. Commands complete without any confirmation except the return to the prompt.

The original file is kept untouched when the command is run, so any errors will not leave you with a corrupted file. Detailed help is available from the command qpdf --help rather than the man page. QPDF's options are numerous, but the most generally used options can be divided into four main categories: those for general operations, information, page selection, and encryption. In addition, for the adventurous, QPDF can create a file in QDF mode, which will create an output file that can be opened in a text editor.

Options for General Operations

These options determine how QPDF runs, and most can be used alongside other options. They include several unusual features. For example, QPDF's own completion tool can be enabled for either the Bash or Zsh shells with the command

eval $(qpdf --completion-bash)

or

eval $(qpdf --completion-zsh)

If QPDF is not in your path, you will need to give its complete path in order to use completion.

Similarly, if a PDF is protected by a password, you will need to supply it with the option --password=PASSWORD before QPDF can manipulate the file. Without the password, even the information options will not function. If, as often happens, the original PDF has two passwords, a user password for viewing and an owner password for editing, you still give only one --password option, and either password will open the file.
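As a minimal sketch of supplying a password, the shell function below combines --password with one of the information options. The function name show_pdf_encryption and the file name in the usage comment are hypothetical, chosen only for illustration.

```shell
# Show the encryption settings of a password-protected PDF.
# show_pdf_encryption is a hypothetical helper name, not part of QPDF.
show_pdf_encryption() {
    # $1 = password for the file, $2 = PDF file to inspect
    qpdf --password="$1" --show-encryption "$2"
}

# Usage (hypothetical file name):
#   show_pdf_encryption 'secret' report.pdf
```

Keep in mind that a password typed on the command line can end up in your shell history.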

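If you want a PDF that displays quickly on the web, select --linearize. Linearization does not change the content or reduce the resolution of images; instead, it reorganizes the file so that a viewer can display the first page before the rest of the file has downloaded. Small documents, such as those that are all text, will benefit minimally from linearization.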
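A short sketch of preparing a file for the web might look like the following. On the command line the option is spelled --linearize, and --check-linearization (see Table 1) can confirm the result. The function name linearize_pdf and the file names are assumptions made for this example.

```shell
# Write a linearized copy of a PDF, then verify the copy.
# linearize_pdf is a hypothetical helper name, not part of QPDF.
linearize_pdf() {
    # $1 = input PDF, $2 = linearized output PDF
    qpdf --linearize "$1" "$2" &&
        qpdf --check-linearization "$2"
}

# Usage (hypothetical file names):
#   linearize_pdf brochure.pdf brochure-web.pdf
```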

Information Options

QPDF's options for retrieving information about a PDF are useful for troubleshooting or for using the QPDF library in automated test suites (Figure 1). The --check option gives a quick summary of the file structure, encryption, and linearization. The function of other information options is evident from their names (Table 1).

Table 1: Information Options

Option                   Description
--show-encryption        Quickly show encryption settings
--check-linearization    Check file integrity and linearization status
--show-linearization     Check and show all linearization data
--show-xref              Show the contents of the cross-reference table
--show-npages            Print the number of pages in the file
--show-pages             Give information for each page
--with-images            With --show-pages, show the object/generation number for each page plus object IDs for images on each page
--check                  Check file structure plus encryption and linearization

Figure 1: Information options like --show-encryption display detailed information about a file's structure.
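For troubleshooting, two of the information options can be combined into a quick report. This is only a sketch; pdf_report is a hypothetical function name, and the exact wording of QPDF's output will depend on your version.

```shell
# Print the page count and a structural check for one PDF.
# pdf_report is a hypothetical helper name, not part of QPDF.
pdf_report() {
    # $1 = PDF file to inspect; no output file is needed
    echo "Pages: $(qpdf --show-npages "$1")"
    qpdf --check "$1"
}

# Usage (hypothetical file name):
#   pdf_report manual.pdf
```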

Page Selection

QPDF can select and reorder the pages that appear in the output file. The --pages option comes after the input file and takes the name of a source file followed by a page range; a double dash (--) closes the list before the name of the output file. A dot (.) in place of the source file name stands for the input file itself. Individual page numbers are separated by commas, and a range of pages is given with a dash. Individual pages and ranges can be listed together, so that 3,5,11-14 would be a valid listing of pages. Pages appear in the order that they are listed, so 11-14,3,5 places pages 11-14 first in the output file. Other values include z-1, which selects every page in reverse order starting from the last page; r2-r1, which selects the last two pages; and r1-r2, which selects the last two pages in reverse order.
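A page list such as 11-14,3,5 could be applied to a single file as sketched below. Following the QPDF manual, the dot stands for the input file and the double dash ends the --pages list. The function name reorder_pages and the file names are hypothetical.

```shell
# Copy selected pages of one PDF, in the listed order, to a new file.
# reorder_pages is a hypothetical helper name, not part of QPDF.
reorder_pages() {
    # $1 = input PDF, $2 = page list such as 11-14,3,5, $3 = output PDF
    # "." reuses the input file as the page source; "--" ends the --pages list
    qpdf "$1" --pages . "$2" -- "$3"
}

# Usage (hypothetical file names):
#   reorder_pages book.pdf 11-14,3,5 excerpt.pdf
#   reorder_pages book.pdf z-1 reversed.pdf
```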

Output files can also be created from multiple source PDFs. When using multiple files, list each source file after --pages, each followed by its own page range, and close the list with a double dash (--). If the output should not start from any particular input file, use --empty in place of the input file. After the basic command, you can also add --collate so that the output file takes pages alternately from each source: the first selected page of the first file, followed by the first selected page of the second file, then the second selected page of the first file, and so on. For example, without --collate, the following command outputs pages 1-4 of the first file followed by the last four pages of the second:

qpdf --empty --pages first.pdf 1-4 second.pdf r4-r1 -- merged.pdf
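One common use for --collate is interleaving two scans, one holding the odd pages and one the even pages. The sketch below assumes the syntax described in the QPDF manual, where the source files and their ranges follow --pages and a double dash closes the list. The function name interleave_pdfs and the file names are hypothetical.

```shell
# Interleave the pages of two PDFs: page 1 of the first, page 1 of
# the second, page 2 of the first, and so on.
# interleave_pdfs is a hypothetical helper name, not part of QPDF.
interleave_pdfs() {
    # $1 = first source PDF, $2 = second source PDF, $3 = output PDF
    # 1-z selects every page, from the first to the last
    qpdf --collate --empty --pages "$1" 1-z "$2" 1-z -- "$3"
}

# Usage (hypothetical file names):
#   interleave_pdfs odd.pdf even.pdf merged.pdf
```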

Still another way to combine pages is to apply pages from a second file as an overlay or an underlay, in effect creating a watermark. An overlay is drawn on top of the page content and an underlay beneath it, so the choice is usually determined by what you want to be displayed clearly. --overlay or --underlay is added after the input file and takes the name of the file that supplies the overlay or underlay pages; like --pages, it is closed with a double dash (--). By default, the pages of that file are applied in order to the corresponding pages of the output. Alternatively, you can specify which output pages receive the overlay or underlay with --to=PAGES and which pages are taken from the overlay or underlay file with --from=PAGES.
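A watermark on every page could be sketched as follows. Besides --to and --from, the example relies on --repeat, an option described in the QPDF manual that reuses the listed overlay or underlay pages once the --from pages run out. The function name watermark_pdf and the file names are hypothetical.

```shell
# Place page 1 of a watermark file beneath every page of a PDF.
# watermark_pdf is a hypothetical helper name, not part of QPDF.
watermark_pdf() {
    # $1 = input PDF, $2 = one-page watermark PDF, $3 = output PDF
    # --from= takes no pages in sequence; --repeat=1 reuses page 1 of the
    # watermark file for every page selected by --to
    qpdf "$1" --underlay "$2" --to=1-z --from= --repeat=1 -- "$3"
}

# Usage (hypothetical file names):
#   watermark_pdf contract.pdf draft-stamp.pdf contract-draft.pdf
```

Swapping --underlay for --overlay draws the stamp on top of the page content instead.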

Encryption Options

Contrary to some passing references on the web, QPDF's main purpose is not to crack password-protected PDFs. It may enable cracking with the use of --password-is-hex-key, which interprets the password as a hexadecimal-encoded key value. However, because no viewer supports this mode, the option is of limited use; at most, it allows the output file to be examined with forensic tools, although the manual is careful not to specify which tools.

However, if you have the password for a PDF, you can edit its encryption options. You can view the encryption key with --show-encryption-key and remove all encryption with the option --decrypt.
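Removing encryption when you know the password might look like this minimal sketch. The function name decrypt_pdf and the file names are hypothetical.

```shell
# Write an unencrypted copy of a password-protected PDF.
# decrypt_pdf is a hypothetical helper name, not part of QPDF.
decrypt_pdf() {
    # $1 = password, $2 = encrypted input PDF, $3 = decrypted output PDF
    qpdf --password="$1" --decrypt "$2" "$3"
}

# Usage (hypothetical file names):
#   decrypt_pdf 'secret' locked.pdf unlocked.pdf
```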

In addition, you can edit a PDF's built-in permissions. The necessary snippet of the command structure is:

--encrypt USER-PASSWORD OWNER-PASSWORD KEY-LENGTH PERMISSIONS --

USER-PASSWORD and OWNER-PASSWORD are the passwords to set on the output file, and the closing double dash (--) marks the end of the encryption options. Despite what its name might suggest, KEY-LENGTH has nothing to do with the public keys used in an application like GPG. It is the length in bits of the PDF encryption key, which the PDF standard limits to 40, 128, or 256, and each length has its own group of settings, as shown in Table 2.

Table 2: PDF Permission Settings

Key Length = 40
--print=[yn]                                  Allows printing
--extract=[yn]                                Allows text or image extraction
--annotate=[yn]                               Allows comments, form fill-in, and signing

Key Length = 128
--accessibility=[yn]                          Allows accessibility to visually impaired
--extract=[yn]                                Allows text or image extraction
--assemble=[yn]                               Allows rotation and reordering of pages
--annotate=[yn]                               Allows comments, form fill-in, and signing
--form=[yn]                                   Allows filling in form fields
--modify-other=[yn]                           Allows all document editing except that controlled separately by --assemble, --annotate, and --form
--print=[full|low|none]                       Controls printing resolution or whether printing is allowed
--modify=[all|annotate|form|assembly|none]    Controls modify access
--use-aes=[yn]                                Uses AES encryption instead of RC4 encryption

Key Length = 256
Same options as Key Length = 128, except --use-aes (AES encryption is always used)

The lengths of 40 and 128 give the same permissions as are available in common PDF file creators. Be aware that the built-in encryption at these lengths is notoriously weak and can be bypassed by a number of applications that are available for download. If you are seriously concerned about security that goes beyond providing an obstacle for unsophisticated users, be sure to use a key length of 256, which provides more serious encryption. My recommendation is to combine it with the comprehensive permission options that it shares with the 128 key length. If no permission options are specified, the output file is fully editable.
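Putting the pieces together, a restrictive set of permissions with a 256-bit key could be applied as sketched below. The particular permissions chosen here, as well as the function name protect_pdf and the file names, are assumptions made for illustration. Note the double dash that the QPDF manual requires to end the encryption options.

```shell
# Encrypt a PDF with a 256-bit key, allowing only low-resolution
# printing and forbidding modification and extraction.
# protect_pdf is a hypothetical helper name, not part of QPDF.
protect_pdf() {
    # $1 = user password, $2 = owner password, $3 = input PDF, $4 = output PDF
    # "--" ends the list of encryption options
    qpdf --encrypt "$1" "$2" 256 --print=low --modify=none --extract=n -- "$3" "$4"
}

# Usage (hypothetical passwords and file names):
#   protect_pdf 'view-me' 'edit-me' report.pdf report-locked.pdf
```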

QDF Mode

Generally, the easiest way to edit a PDF file is to open it in LibreOffice Writer. Writer is especially ideal if you are using a hybrid PDF, that is, one created in Writer that also includes a copy of the file in OpenDocument Format, LibreOffice's default format. At the cost of a file twice as large as an ordinary PDF, a hybrid provides a fully editable file that also updates the accompanying PDF file when saved. But if you do not have a hybrid file, then a PDF can only be edited line by line in Writer and other editors, and new lines are only practical in blank space.

QDF mode is a format that displays like any other PDF, but it can be edited in a regular text editor, as long as there is no password protection. If a file does have a password, it can be viewed, but not edited. The catch is that the format displays all objects in numerical order. This format takes some practice to read. Content is easy to find, but objects like images need to be carefully edited – for instance, if you remove an image, you need to update every other image, or else the output file will not build or display properly (Figure 2).

Figure 2: QDF mode allows you to view and edit the structure of a PDF in a text file.

To create a file in QDF mode, simply add the --qdf option. If you run into trouble with a QDF mode file after editing it, try running it through fix-qdf, a separate program that is installed along with QPDF. It tries to repair everything from object streams to cross-reference tables, although the repairs may not be entirely what you hoped. Also, be aware that QDF mode cannot be combined with linearization.
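A round trip through QDF mode might be sketched as follows. In the QPDF manual, the repair tool is a separate fix-qdf program that writes the repaired file to standard output, which is what the second function assumes. The function names to_qdf and from_qdf and the file names are hypothetical.

```shell
# Convert a PDF to QDF mode for editing in a text editor.
# to_qdf and from_qdf are hypothetical helper names, not part of QPDF.
to_qdf() {
    # $1 = input PDF, $2 = editable QDF output
    qpdf --qdf "$1" "$2"
}

# Repair a hand-edited QDF file and save the result as a PDF.
from_qdf() {
    # $1 = hand-edited QDF file, $2 = repaired PDF
    # fix-qdf writes the repaired file to standard output
    fix-qdf "$1" > "$2"
}

# Usage (hypothetical file names):
#   to_qdf flyer.pdf flyer.qdf
#   (edit flyer.qdf in a text editor)
#   from_qdf flyer.qdf flyer-edited.pdf
```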

Other Options

This article only covers the uses of QPDF that might be useful to end users. The QPDF manual [2] is current and contains almost as much information again for developers. As well as options for testing and debugging, QPDF has options for how it handles Unicode passwords and file names and for use in C++, C, JavaScript, and Python.

However, you do not need to be a developer to find QPDF useful. Although you will probably want to work with the latest version of the manual open, QPDF is a comprehensive toolkit and can replace several common scripts under one command. If you regularly edit PDFs, QPDF is in many ways an essential application.

The Author

Bruce Byfield is a computer journalist and a freelance writer and editor specializing in free and open source software. In addition to his writing projects, he also teaches live and e-learning courses. In his spare time, Bruce writes about Northwest coast art (http://brucebyfield.wordpress.com). He is also co-founder of Prentice Pieces, a blog about writing and fantasy at https://prenticepieces.com/.
