Editing PDFs with OpenOffice.org

Repackaged

© mipan, Fotolia

OpenOffice has long been able to export PDF documents. Version 3.0 is the first to offer an extension that lets you import and edit PDFs.

Adobe originally designed the Portable Document Format (PDF) as a display-only format for platform-independent document viewing, but several tools for editing PDF documents have appeared over the years. With version 3.0, OpenOffice.org (OOo) brings PDF editing to the free office suite as well.

Before you get your hopes up, let me say that, despite the best efforts of editing tools, PDF remains a format designed primarily for rendering documents, and its editing capabilities are limited. If you hope to use an OpenOffice extension to edit a document in its original form, including styles, paragraphs, images, and tables, you will be disappointed. This limitation is not a failing of OpenOffice; it is inherent in the PDF format itself. If you need to send documents in an editable format, the OpenDocument Format (ODF, OpenOffice's native file format) is a better choice.

Despite this, PDF Import is a useful feature that lets you open the content of PDF documents for editing, within some limits. The extension lets the free office suite load PDF files as easily as any other format.

Installation

Although PDF Import [1] was released at the same time as OpenOffice 3.0, it is not part of the default installation. To get it, you need to download the extension from the Extension Repository [2] (see the "Extension Repository" box). The version number still carries a "Beta" tag to show that the plugin is under development and might be buggy (Figure 1).

Figure 1: The PDF Import extension is downloaded easily with a couple of mouse clicks.

Clicking the Get it! button for your platform and then selecting Open in your browser launches OpenOffice, which offers to install the extension (Figure 2). Alternatively, you can download the file separately and install it by double-clicking. Once you accept the license agreement, the Extension Manager lists the new function.

Figure 2: The PDF Import extension is ready for installation.
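An OpenOffice extension is distributed as an .oxt file, which is an ordinary ZIP archive. If you download the file separately, you can check which version you actually have before installing it. The following Python sketch assumes the package carries the usual description.xml with a version element whose value attribute holds the version string; the helper names and the sample archive are hypothetical stand-ins, not files from the repository.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

def oxt_version(oxt_bytes: bytes) -> Optional[str]:
    """Return the version declared in an extension's description.xml, if any.

    Assumption: the archive follows the usual layout with description.xml
    at the top level and a <version value="..."/> element inside it."""
    with zipfile.ZipFile(io.BytesIO(oxt_bytes)) as zf:
        if "description.xml" not in zf.namelist():
            return None
        root = ET.fromstring(zf.read("description.xml"))
    for elem in root.iter():
        # Compare the local name only, so the namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "version":
            return elem.get("value")
    return None

def sample_oxt(version: str) -> bytes:
    """Build a tiny stand-in archive (hypothetical, not a real extension)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "description.xml",
            '<description xmlns="http://openoffice.org/extensions/description/2006">'
            f'<version value="{version}"/></description>',
        )
    return buf.getvalue()

print(oxt_version(sample_oxt("1.0")))  # prints 1.0
```

For a real download, you would pass the bytes of the downloaded .oxt file instead of the sample archive.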

Extension Repository

The Extension Repository [2] has been an important resource for OpenOffice users at least since version 3.0. Hundreds of extensions are now available for download, and new ones are added every month. Most are free of charge, although a few are commercial.

Popular add-ons include the Presenter Console, the Wiki Publisher, WollMux by the City of Munich, and Writer2LaTeX. Many features are deliberately offered only as extensions, because they are easier to maintain outside the main OpenOffice code base. Extensions also keep the core program free of features that many users never need.

How It Works

After you install the extension, OpenOffice looks just as it did before; only the Extension Manager lists the new import feature. No new menu entries or icon bars appear. You do not need them, though, because you can load PDF files the normal way, with File | Open…. After a short delay, the document opens, and surprisingly it opens in Draw, OpenOffice's drawing module. Why Draw? It might seem strange at first, but the reason quickly becomes apparent once you consider a PDF file's characteristics. Because a PDF describes its content in a page description language that simply places text, lines, and images at positions on a page, it is impossible to tell whether the PDF contains a text document, a presentation, or a spreadsheet.

If you open a simple document containing body text in a standard font, like the one shown in Figures 3 and 4, the import should go smoothly. But when you edit the document, you will notice that the layout is not maintained. Each line of text sits in its own frame, with no link to the frames holding the other lines. As a result, you can only edit line by line, which makes it difficult to format paragraphs or change the line spacing. Again, this limitation is inherent in the PDF page description language.

Figure 3: The PDF document in a PDF reader …
Figure 4: … and after importing into OpenOffice.org Draw.
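To see why each line arrives as a separate frame, it helps to look at how a PDF page describes text. The following Python sketch is a deliberately simplified reader for a tiny, hand-written, uncompressed content stream. The BT/ET, Tf, Td, and Tj operators are real PDF text operators, but the sample stream and the text_runs helper are illustrative assumptions; real files also use compression, TJ, Tm, nested parentheses, and escape sequences that the sketch ignores.

```python
import re

# Hand-written sample content stream (hypothetical): two lines of body text,
# each positioned with its own Td (move) and shown with its own Tj.
CONTENT = b"""BT
/F1 12 Tf
72 720 Td
(Each line of text is placed) Tj
0 -14 Td
(independently on the page.) Tj
ET"""

# Tokens: literal strings, names, numbers, and operators (in that order).
TOKEN = re.compile(rb"\((?:[^()\\]|\\.)*\)|/[^\s/()]+|-?\d+(?:\.\d+)?|[A-Za-z*']+")
OPERATOR = re.compile(rb"[A-Za-z*']+")

def text_runs(stream: bytes) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for every Tj, tracking only Td offsets."""
    x = y = 0.0  # start of the current text line; Td is relative to it
    operands: list[bytes] = []
    runs = []
    for tok in TOKEN.findall(stream):
        if not OPERATOR.fullmatch(tok):
            operands.append(tok)  # numbers, names, strings wait for an operator
            continue
        if tok == b"BT":
            x = y = 0.0
        elif tok == b"Td":
            x += float(operands[-2].decode())
            y += float(operands[-1].decode())
        elif tok == b"Tj":
            runs.append((x, y, operands[-1][1:-1].decode("latin-1")))
        operands.clear()  # every operator consumes its operands

    return runs

for x, y, text in text_runs(CONTENT):
    print(f"x={x:g} y={y:g} {text!r}")
# x=72 y=720 'Each line of text is placed'
# x=72 y=706 'independently on the page.'
```

Nothing in these operators says that the two strings belong to the same paragraph. An importer can only see independently positioned pieces of text, which matches the line-by-line frames you get in Draw.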

Tables pose a similar problem. Again, you can edit a table line by line, but any major change requires a workaround, because the individual rows and columns are isolated fragments rather than a coherent table.
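One possible workaround, sketched here in Python and not part of the PDF Import extension, is to regroup the loose text fragments by their vertical position before rebuilding a real table. The fragment list and the rebuild_rows helper are hypothetical, and the 1-point tolerance is an arbitrary assumption meant to absorb small baseline differences.

```python
# Hypothetical positioned fragments (x, y, text) as they might come out of
# a PDF: there is no notion of "row" or "cell" in the data itself.
FRAGMENTS = [
    (300.0, 700.0, "Price"),
    (72.0, 700.0, "Item"),
    (72.0, 686.0, "Paper"),
    (300.0, 686.2, "4.99"),
    (72.0, 672.0, "Toner"),
    (300.0, 671.9, "59.00"),
]

def rebuild_rows(fragments, y_tolerance=1.0):
    """Group fragments into rows by y (top to bottom), then order each row by x."""
    rows = []  # list of (reference_y, [fragments in that row])
    for frag in sorted(fragments, key=lambda f: -f[1]):
        for ref_y, cells in rows:
            if abs(ref_y - frag[1]) <= y_tolerance:
                cells.append(frag)
                break
        else:  # no existing row is close enough: start a new one
            rows.append((frag[1], [frag]))
    return [[text for _, _, text in sorted(cells)] for _, cells in rows]

for row in rebuild_rows(FRAGMENTS):
    print("\t".join(row))  # tab-separated, ready to paste into Calc
```

Tab-separated lines like these could then be pasted into Calc or converted to a Writer table, where rows and columns behave as a single unit again.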

Because of the restrictions inherent in PDF, much of the additional information is lost, such as the document structure, the outline, and the styles. OpenOffice reproduces the formatting correctly, but only as direct formatting, because the PDF neither contains the styles nor defines how they depend on one another. Exporting the original as a tagged PDF or as PDF/A-1 does not change this. Images and drawings, in contrast, are inserted correctly. OpenOffice does not group drawing elements automatically, but you can easily fix this with Draw's standard tools.

The PDF format lets each page have its own orientation, portrait or landscape. OpenOffice Draw does not currently support this; instead, it applies the orientation of the first page to all following pages. Macros are also lost: once an OOo document has been converted to PDF, its macros cannot be recovered on import. Although recent versions of the PDF format support scripts executed by the PDF reader, this does not apply to OOo Basic or other office scripting languages.
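Per-page orientation is possible because a PDF stores the page size separately for each page, in the /MediaBox entry, optionally combined with a /Rotate value in multiples of 90 degrees. A minimal Python sketch, assuming the page dictionaries have already been parsed into plain Python dicts, shows how to spot the pages that Draw would force into the first page's orientation. The PAGES data and the helper names are hypothetical, and the sketch ignores /CropBox and values inherited from the page tree.

```python
# Hypothetical, already-parsed page dictionaries.
# /MediaBox is [llx lly urx ury] in points; /Rotate is optional.
PAGES = [
    {"MediaBox": [0, 0, 595, 842]},                # A4 portrait
    {"MediaBox": [0, 0, 842, 595]},                # A4 landscape
    {"MediaBox": [0, 0, 595, 842], "Rotate": 90},  # portrait box, turned sideways
]

def orientation(page: dict) -> str:
    llx, lly, urx, ury = page["MediaBox"]
    width, height = abs(urx - llx), abs(ury - lly)
    if page.get("Rotate", 0) % 180 == 90:
        width, height = height, width  # a quarter turn swaps the visible sides
    return "landscape" if width > height else "portrait"

def pages_differing_from_first(pages: list[dict]) -> list[int]:
    """1-based numbers of pages whose orientation differs from page 1."""
    first = orientation(pages[0])
    return [n for n, p in enumerate(pages, start=1) if orientation(p) != first]

print(pages_differing_from_first(PAGES))  # prints [2, 3]
```

In this example, pages 2 and 3 are the ones you would expect to come out wrong after import.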

Fonts are another obstacle. By default, OpenOffice, like many other programs, embeds only a subset of each font (the characters the document actually uses) when exporting a PDF. The result is fine for viewing on other systems but not for editing. Embedding fonts is also a tricky topic for licensing reasons. When importing a PDF whose fonts are not installed on the local system, OpenOffice selects a substitute font, which in some cases is not a good match.
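Whether a font was embedded completely or only as a subset shows up in its name: the PDF specification prefixes the names of subset fonts with a tag of exactly six uppercase letters and a plus sign. The Python sketch below checks for that tag. As a simplification, it also treats a subset as a plain set of characters (real subsets are organized by glyph IDs) to show why a subset is fine for viewing but not for editing: any character the original text did not use is simply absent. The function names are hypothetical.

```python
import re

# Subset fonts carry a six-uppercase-letter tag plus "+", e.g. "ABCDEF+DejaVuSans".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")

def font_info(base_font: str) -> tuple[str, bool]:
    """Return (font name without tag, is_subset) for a BaseFont name."""
    m = SUBSET_TAG.match(base_font)
    return (m.group(1), True) if m else (base_font, False)

def missing_glyphs(subset_chars: set[str], new_text: str) -> set[str]:
    """Characters in new_text that the embedded subset cannot draw.

    Simplification: the subset is modeled as a set of characters."""
    return set(new_text) - subset_chars

print(font_info("ABCDEF+DejaVuSans"))       # prints ('DejaVuSans', True)
print(missing_glyphs(set("Hello"), "Help"))  # prints {'p'}
```

If the original text only contained "Hello", an edit that types "Help" needs a glyph the subset never included, which is one reason OpenOffice falls back to a locally installed substitute font.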

The import function cannot handle protected or encrypted PDF documents. Whether a document is protected against editing or only against viewing, PDF Import fails to open it (Figure 5). Cracking tools can remove the password protection, of course, but with third-party PDFs you need to consider whether doing so is legal.

Figure 5: OpenOffice can't import encrypted PDF documents.
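If you want to sort out protected files before handing them to OpenOffice, a crude pre-check is possible, because an encrypted PDF must reference an /Encrypt dictionary from its trailer (or cross-reference stream dictionary). The Python function below is a hypothetical helper, not part of PDF Import. It only searches the raw bytes, so it can be fooled, for example by the name appearing in an uncompressed content stream; a proper PDF library is the more reliable choice.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Crude heuristic: does the file mention an /Encrypt dictionary?"""
    if b"%PDF-" not in pdf_bytes[:1024]:
        raise ValueError("no PDF header found")
    return b"/Encrypt" in pdf_bytes

# Abbreviated stand-ins, not complete PDF files.
PLAIN = b"%PDF-1.4\ntrailer\n<< /Size 6 /Root 1 0 R >>\n%%EOF\n"
LOCKED = b"%PDF-1.4\ntrailer\n<< /Size 7 /Root 1 0 R /Encrypt 6 0 R >>\n%%EOF\n"

print(looks_encrypted(PLAIN), looks_encrypted(LOCKED))  # prints False True
```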

PDF Import Benefits

Although OpenOffice fails to import a number of objects, usually because of limitations inherent in the PDF format itself, the PDF Import extension is more than just a toy. Even expensive commercial tools struggle to edit PDF documents successfully, because the format was simply not designed for this purpose. In addition, you can fix some of the problems you run into with OpenOffice's standard tools.

In everyday use, font import is less of a problem than it might seem, because most documents use standard fonts that are either available on any system or have a matching substitute in OpenOffice. Many PDFs are simply scans: they contain images rather than text, so fonts do not matter at all, and the images import without trouble.

Finally, remember that OpenOffice's PDF Import is still a beta version, so it may contain bugs, and some functions might not yet be implemented. The developers are already planning to improve import performance in future versions.

PDF Import makes sense in many scenarios. If you have lost an original document, for example, you will be glad of any chance to reconstruct the content. If you need to quote from external sources, PDF Import gives you access to their text. And if you need to fill out and save a PDF form that was not designed to allow this, you will again be glad to have the extension. In everyday work, the extension is also well suited to minor changes, such as correcting typos or adding a watermark.

Copying via the Clipboard

Editing body text in an imported PDF is a painstaking task. If you just want to copy text from an unprotected PDF into a document of your own, the clipboard is an alternative.

First, select the text in the PDF (Ctrl+A selects all of it), press Ctrl+C to copy it, and then press Ctrl+V to paste it into your document. In the best case, much of the formatting, such as bold and italic type, is preserved. In most cases, the result is contiguous paragraphs that you can format and align.

Practical Hybrid Format

Although the name doesn't suggest it, the PDF Import extension also includes an export function that creates a genuinely practical format. This inconspicuous option, added to OpenOffice's normal PDF export dialog (Figure 6), lets you create hybrid PDFs. The file has a .pdf extension and can be read in any normal PDF reader, but it also contains the original document in its native OpenDocument Format.

Figure 6: Create a hybrid file when you export PDFs.

OpenOffice or StarOffice users who have the PDF Import extension can therefore open the original document for editing. Instead of Draw, the file opens in the module used to create it (e.g., Writer, Calc, or Impress). The hybrid document thus combines the benefits of both formats: The recipient can edit the file normally and, just to be on the safe side, also has a "proof copy" in PDF format whose fonts and graphics show exactly how the author intended the document to look.
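The original document inside a hybrid PDF is an ordinary ODF package, and every ODF package is a ZIP archive whose first entry, mimetype, names the document type. That entry is enough to tell which module a file belongs to. The Python sketch below illustrates only this ODF convention; how PDF Import locates the embedded file inside the PDF is not covered, and the module_for_odf and fake_odf helpers are hypothetical.

```python
import io
import zipfile

# MIME types defined by the OpenDocument standard, mapped to the OOo module
# that normally opens them.
MODULES = {
    "application/vnd.oasis.opendocument.text": "Writer",
    "application/vnd.oasis.opendocument.spreadsheet": "Calc",
    "application/vnd.oasis.opendocument.presentation": "Impress",
    "application/vnd.oasis.opendocument.graphics": "Draw",
}

def module_for_odf(odf_bytes: bytes) -> str:
    """Read the 'mimetype' entry of an ODF package and map it to a module."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as zf:
        mime = zf.read("mimetype").decode("ascii").strip()
    return MODULES.get(mime, "unknown")

def fake_odf(mime: str) -> bytes:
    """Build a minimal stand-in package (not a complete, openable document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("mimetype"), mime)  # stored uncompressed
    return buf.getvalue()

print(module_for_odf(fake_odf("application/vnd.oasis.opendocument.spreadsheet")))  # prints Calc
```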

Conclusions

Despite its fairly early stage of development, PDF Import for OpenOffice.org already shows its potential. It is useful for minor corrections to PDFs, and the developers are working on improving it further. It will be interesting to see what the next release brings.

Despite all this, you should always remember that PDF is a display format that does not lend itself to editing. If you need to exchange editable documents, it makes far more sense to use a format like ODF or to create hybrid PDFs that give you the best of both worlds.

The Author

Florian Effenberger has been a free software evangelist for many years. He is co-lead of OpenOffice.org's international marketing project and a member of the board of OpenOffice.org Deutschland e.V., a German NGO. His work focuses mainly on designing enterprise and school networks and software distribution solutions based on free software. Florian is a regular contributor to various German- and English-language publications, in which he investigates legal issues, among other topics.