Editing PDFs with OpenOffice.org

Repackaged

© mipan, Fotolia

Article from Issue 104/2009

OpenOffice has always been able to export PDF documents. Version 3.0 is the first to introduce an extension that lets you import and edit PDFs.

Adobe originally designed the Portable Document Format (PDF) as a display-only format for platform-independent document viewing, but several tools for editing PDF documents have appeared over the years. With the PDF Import extension, OpenOffice.org (OOo) version 3.0 brings PDF editing to the free office suite.

Before you raise your hopes too high, let me start by saying that, despite the best efforts of editing tools, PDF remains a format primarily dedicated to rendering documents and has limited editing capability. If you are thinking of using an OpenOffice extension to edit the document in its original form, including styles, paragraphs, images, and tables, you will be disappointed. This limitation is not a failing of OpenOffice; it is inherent in the PDF format itself. If you need to send documents in an editable format, the Open Document Format (ODF, i.e., OpenOffice's native file format) is a better choice.

Despite this, the PDF Import function is a useful feature that lets users open PDF document content for editing, with some limitations. With the extension, the free office suite loads PDF files as easily as any other format.

Installation

Although PDF Import [1] was released at the same time as OpenOffice 3.0, it is not included in the default installation. To load the extension, you need to download it from the Extension Repository [2] (see the "Extension Repository" box). The version number carries a "Beta" tag to show that the plugin is still under development and might be buggy (Figure 1).

Figure 1: The PDF Import extension is downloaded easily with a couple of mouse clicks.

Clicking the Get it! button for your choice of platform and then selecting Open in your browser will launch OpenOffice and offer to install the extension (Figure 2). Alternatively, you can download the file separately and install it by double-clicking. After you accept the license agreement, the Extension Manager lists the new function.

Figure 2: The PDF Import extension is ready for installation.
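If you prefer the command line to the browser, OpenOffice also ships with the unopkg tool for managing extensions. The following is a minimal sketch, assuming unopkg is on your PATH; the file name pdfimport.oxt is a placeholder for whatever file you actually downloaded, and both function names are made up for this illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def build_unopkg_command(oxt_path: str, shared: bool = False) -> List[str]:
    """Return the argument list for installing an extension with unopkg."""
    cmd = ["unopkg", "add"]
    if shared:
        # --shared installs for all users and usually needs admin rights.
        cmd.append("--shared")
    cmd.append(oxt_path)
    return cmd


def install_extension(oxt_path: str) -> Optional[int]:
    """Run unopkg if it is available; return its exit code, or None if missing."""
    if shutil.which("unopkg") is None:
        return None
    return subprocess.run(build_unopkg_command(oxt_path)).returncode


if __name__ == "__main__":
    # "pdfimport.oxt" is a placeholder name, not the real download name.
    print(" ".join(build_unopkg_command("pdfimport.oxt")))
```

Whichever route you take, the extension should appear in the Extension Manager afterward.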

Extension Repository

The Extension Repository [2] has been an important site for OpenOffice users since version 3.0, if not before. Now, literally hundreds of downloadable extensions are available, and new extensions are added every month. Most are free of charge, although a couple of commercial extensions are not.

Popular add-ons include the Presenter Console, the Wiki Publisher, WollMux by the City of Munich, and Writer2LaTeX. Many features are available only as extensions because this makes them easier to maintain than if they were part of the main OpenOffice code base. At the same time, extensions prevent overloading of the program code with features that many users do not actually need.

How It Works

After installing the extension, OpenOffice will not look different from before, although the import feature is listed by the Extension Manager. No new menu entries and no additional icon bars are visible. However, you do not need them because you can load PDF files normally, with File | Open. After a short delay, the document opens, surprisingly, in Draw, the OpenOffice drawing module. At first sight, this might seem strange, but if you think about a PDF file's characteristics, the reason quickly becomes apparent. Because the individual file elements are defined in a page description language that records only where text and graphics sit on the page, the importer cannot tell whether the PDF started out as a text document, a presentation, or a spreadsheet.

If you open a simple document containing body text in a standard font, like that shown in Figures 3 and 4, you should have no trouble importing the file. But when you edit the document, you will notice that the layout is not maintained. Each line of text is in a frame that is not linked to the other frames (lines of text). This means that you can only edit line by line, making it difficult to format paragraphs or change the line spacing. Again, this limitation is inherent in the PDF page description language.

Figure 3: The PDF document in a PDF reader …
Figure 4: … and after importing into OpenOffice.org Draw.
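To see why each line ends up in its own frame, it helps to look at how a PDF page stores text. The sketch below uses a hand-written, uncompressed content stream (an assumption for illustration; real PDFs usually compress their streams and use many more operators) in which each line is a separate text object positioned with the Td operator and drawn with Tj. Nothing in the stream says that the two lines belong to the same paragraph. This is not how the PDF Import extension works internally; the regular expression and the function name are illustrative only.

```python
import re
from typing import List, Tuple

# Hypothetical content stream for a two-line paragraph.
# BT/ET delimit a text object, Tf selects the font, Td sets the position,
# and Tj shows the string in parentheses.
CONTENT_STREAM = """
BT /F1 12 Tf 72 720 Td (Adobe designed PDF as a display format,) Tj ET
BT /F1 12 Tf 72 706 Td (but editing tools have appeared since.) Tj ET
"""

# Matches one simple text object of the form: BT ... x y Td (string) Tj ET
TEXT_OBJECT = re.compile(
    r"BT\s.*?(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s*\((.*?)\)\s*Tj\s*ET",
    re.DOTALL,
)


def extract_frames(stream: str) -> List[Tuple[float, float, str]]:
    """Return one (x, y, text) tuple per text object found in the stream."""
    return [(float(x), float(y), text) for x, y, text in TEXT_OBJECT.findall(stream)]


if __name__ == "__main__":
    for x, y, text in extract_frames(CONTENT_STREAM):
        # Each line is an independent, absolutely positioned item.
        print(f"frame at ({x}, {y}): {text}")
```

An importer that sees only positioned strings like these has to treat every line as a separate object, which matches the unlinked frames you see in Draw.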

A similar problem occurs when you import tables. Again, you can edit a table line by line, but a workaround is needed for any major changes because the individual rows and columns are isolated rather than forming a contiguous table.

Because of the restrictions inherent in a PDF, much of the additional information is lost, such as the document structure, the outline, and the styles. Although OpenOffice converts formatting correctly, it can only support direct formatting because the PDF does not provide the styles or define their dependencies. This does not change if a PDF is exported as a tagged PDF or as PDF/A-1. In contrast, images and drawings are inserted correctly. Although OpenOffice will not automatically group drawing elements, you can easily correct this with the standard tools in Draw.

The PDF format lets each page have its own orientation, portrait or landscape. OpenOffice Draw does not support this feature at present; instead, it uses the orientation of the first page for all following pages. Also, it is impossible to reimport the macros in OOo documents that have been converted to PDFs: Although recent versions of the PDF format support script execution in PDF readers, this does not apply to OOo Basic and other Office scripting languages.
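If you want to know in advance which pages will be affected by the orientation restriction, you can compare each page's size with that of the first page. The following is a minimal sketch that assumes you have already obtained the page widths and heights (in points) from some PDF tool; it ignores the per-page rotation setting that PDFs can also carry, and the function names are made up for this example.

```python
from typing import List, Sequence, Tuple


def orientation(width: float, height: float) -> str:
    """Classify a page as landscape if it is wider than it is tall."""
    return "landscape" if width > height else "portrait"


def pages_with_changed_orientation(sizes: Sequence[Tuple[float, float]]) -> List[int]:
    """Return 1-based numbers of pages whose orientation differs from page 1."""
    if not sizes:
        return []
    first = orientation(*sizes[0])
    return [
        number
        for number, (width, height) in enumerate(sizes, start=1)
        if orientation(width, height) != first
    ]


if __name__ == "__main__":
    # A4 in points is roughly 595 x 842; page 2 is a landscape page.
    pages = [(595, 842), (842, 595), (595, 842)]
    print(pages_with_changed_orientation(pages))
```

Any page number the function reports is one whose layout is likely to need manual correction in Draw after the import.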

Fonts are another obstacle. Although OpenOffice, like many other programs, will by default embed only a subset of each font in an exported PDF, the results are only suitable for viewing on third-party systems, not for editing. Embedding fonts is also a difficult topic for licensing reasons. When importing PDF documents with fonts that are not installed on the local system, OpenOffice selects a substitute font; in some cases, the substitute is not a good choice.
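Subset fonts are easy to recognize if you can list the font names in a PDF: The PDF specification prefixes the name of a subset font with a tag of six uppercase letters and a plus sign. The sketch below separates that tag from the base name; the sample font names are invented, and how you obtain the names from a real file is left open here.

```python
import re
from typing import Tuple

# A subset tag is exactly six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")


def split_subset_tag(font_name: str) -> Tuple[bool, str]:
    """Return (is_subset, base_name) for a font name taken from a PDF."""
    match = SUBSET_TAG.match(font_name)
    if match:
        return True, match.group(1)
    return False, font_name


if __name__ == "__main__":
    # Invented sample names for illustration.
    for name in ["ABCDEF+Helvetica", "Times-Roman"]:
        is_subset, base = split_subset_tag(name)
        print(f"{name}: subset={is_subset}, base font={base}")
```

If the base name of a subset font is not installed locally, you can expect a substitute font after the import.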

The import function cannot handle protected and encrypted PDF documents. Whether the document is edit protected or just view protected, the PDF Importer will fail to import (Figure 5). Of course, cracking tools will let you remove the password protection, but you have to consider the legality of this action in the case of third-party PDFs.

Figure 5: OpenOffice can't import encrypted PDF documents.
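Before you try to import a file, a quick check can tell you whether it is likely to be encrypted. Encrypted PDFs carry an /Encrypt entry in their trailer dictionary, so the rough heuristic sketched below simply searches the raw bytes for that key. It can report a false positive if the same byte sequence happens to appear elsewhere in the file, and the function names are assumptions made for this example.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an encrypted PDF references an /Encrypt dictionary."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())


if __name__ == "__main__":
    # Tiny hand-written trailer fragments, not complete PDF files.
    protected = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
    unprotected = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R >>\n%%EOF"
    print(looks_encrypted(protected), looks_encrypted(unprotected))
```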

PDF Import Benefits

Although OpenOffice will fail to import a number of objects – typically because of weaknesses inherent in the PDF format itself – the PDF Import extension is more than just a toy. Even expensive commercial tools are limited in their abilities to edit PDF documents successfully because the format is just not designed for this purpose. In addition, some of the problems you run into can be resolved with the use of standard OpenOffice tools.

The font import issue is not as big a problem as it might seem in production use because most documents use standard fonts that are available either on any system or in the form of a matching substitute in OpenOffice. Often PDFs are scans, and thus image files that you can import.

Finally, don't forget that the OpenOffice PDF Import feature is still a beta version, which means that it could still contain some bugs. Some functions might not yet be implemented. The developers are already working on plans to improve import performance in future versions.

PDF Import makes sense in many scenarios. If you have lost an original document, for example, you will be glad of any chance to reconstruct the content. If you need to reference external sources, you can use the PDF Import feature to do so. Also, if you need to fill out or save a PDF form, even though the document does not support this, you will again be glad of the OpenOffice extension. In production use, the extension is also perfect for minor changes to documents – say, correcting typos or adding a watermark.

Copying via the Clipboard

After importing a PDF into OpenOffice, editing body text is a very painstaking task. If you want to copy text from an unprotected PDF into a document of your own, the alternative is to use the clipboard.

First, select the text in your PDF reader with Ctrl+A, press Ctrl+C to copy it, then switch to your document and press Ctrl+V to insert it. In the best possible case, much of the formatting, such as bold and italic type, will be kept. In most cases, the results will be contiguous paragraphs that you can format and align.
