Editing PDFs with OpenOffice.org
OpenOffice has always been able to export PDF documents. Version 3.0 is the first to introduce an extension that lets you import and edit PDFs.
Adobe originally designed the Portable Document Format (PDF) as a display-only format for platform-independent document viewing, but several tools for editing PDF documents have appeared over the years. Version 3.0 of OpenOffice.org (OOo) brings PDF editing to the free office suite.
Before you raise your hopes too high, let me start by saying that, despite the best efforts of editing tools, PDF remains a format primarily dedicated to rendering documents and has limited editing capability. If you are thinking of using an OpenOffice extension to edit the document in its original format, including templates, paragraphs, images, and tables, you will be disappointed. This limitation is not a failing of OpenOffice; it is inherent in the PDF format itself. If you need to send documents in an editable format, the Open Document Format (ODF, i.e., OpenOffice's native file format) is a better choice.
Despite this, PDF Import is a useful feature that lets users open the content of a PDF document for editing, with some limitations. With the extension installed, the free office suite loads PDF files as easily as any other format.
Although PDF Import was released at the same time as OpenOffice 3.0, it is not included in the default installation. To load the extension, you need to download it from the Extension Repository (see the "Extension Repository" box). The version number still carries a "Beta" tag to show that the plugin is under development and might be buggy (Figure 1).
Clicking the Get it! button for your platform and then selecting Open in your browser will launch OpenOffice and offer to install the extension (Figure 2). Alternatively, you can download the file separately and install it by double-clicking. After you accept the license agreement, the Extension Manager lists the new function.
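If you prefer to script the installation, OpenOffice also ships a command-line extension tool called unopkg. The following is a minimal sketch, assuming that unopkg is on your PATH and that the downloaded package is an .oxt file; the helper names and the file name pdfimport.oxt are hypothetical and used only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_install_command(extension_path, unopkg="unopkg"):
    """Return the argument list that registers an .oxt package with unopkg."""
    path = Path(extension_path)
    if path.suffix.lower() != ".oxt":
        raise ValueError(f"expected an .oxt extension package, got {path.name!r}")
    return [unopkg, "add", str(path)]


def install_extension(extension_path):
    """Run 'unopkg add' if the tool can be found; return True on success."""
    if shutil.which("unopkg") is None:
        # Assumption: the OpenOffice program directory is not on the PATH.
        return False
    result = subprocess.run(build_install_command(extension_path))
    return result.returncode == 0


# Hypothetical file name; use the name of the file you actually downloaded.
command = build_install_command("pdfimport.oxt")
```

Building the command separately from running it lets you inspect what will be executed before you touch your OpenOffice profile.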
The Extension Repository has been an important site for all users of OpenOffice since version 3.0, if not earlier. Literally hundreds of downloadable extensions are now available, and new ones are added every month. Most are free of charge, although a couple of commercial extensions are not.
Popular add-ons include the Presenter Console, the Wiki Publisher, WollMux by the City of Munich, and Writer2LaTeX. Many features are available only as extensions because this makes them easier to maintain than if they were integrated into the main body of the OpenOffice program code. At the same time, extensions avoid overloading the program code with features that many users do not actually need.
How It Works
After you install the extension, OpenOffice looks no different from before, although the Extension Manager lists the import feature. No new menu entries or toolbars appear, and none are needed, because you load PDF files the usual way with File | Open…. After a short delay, the document opens, surprisingly, in Draw, the OpenOffice drawing module. This might seem strange at first, but the reason becomes apparent if you think about a PDF file's characteristics. A PDF describes each page as positioned text and graphics in a page description language, so nothing in the file tells the importer whether the original was a text document, a presentation, or a table.
If you open a simple document containing body text in a standard font, like that shown in Figures 3 and 4, you should have no trouble importing the file. But when you edit the document, you will notice that the layout is not maintained. Each line of text is in a frame that is not linked to the other frames (lines of text). This means that you can only edit line by line, making it difficult to format paragraphs or change the line spacing. Again, this limitation is inherent in the PDF page description language.
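To see why paragraph structure has to be guessed rather than read from the file, consider the following sketch. It is not how the PDF Import extension works internally; it merely assumes that each imported line arrives as an independent frame with a position and a height (the TextLine class, the sample coordinates, and the gap threshold are all invented for illustration) and joins lines whose vertical spacing is small.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One imported line: its text plus the position and size of its frame."""
    text: str
    top: float     # distance from the top of the page, in points
    height: float  # frame height, in points


def group_into_paragraphs(lines, max_gap_factor=0.5):
    """Join lines whose vertical gap is small relative to the line height."""
    paragraphs = []
    current = []
    previous = None
    for line in sorted(lines, key=lambda item: item.top):
        if previous is not None:
            gap = line.top - (previous.top + previous.height)
            if gap > max_gap_factor * previous.height:
                # A large gap is taken as a paragraph break (a guess).
                paragraphs.append(" ".join(current))
                current = []
        current.append(line.text)
        previous = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


lines = [
    TextLine("OpenOffice has always been able", top=72.0, height=12.0),
    TextLine("to export PDF documents.", top=86.0, height=12.0),
    TextLine("Version 3.0 adds an import extension.", top=116.0, height=12.0),
]
paragraphs = group_into_paragraphs(lines)
```

With these sample values, the first two lines are merged and the third starts a new paragraph. A heuristic like this breaks down quickly with multi-column layouts or unusual spacing, which is why imported text stays line-based.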
A similar problem occurs when you import tables. Again, you can edit a table line by line, but a workaround is needed for any major changes because the individual rows and columns are isolated rather than forming a contiguous table.
Because of the restrictions inherent in a PDF, much of the additional information is lost, such as the document structure, the outline, and the templates. Although OpenOffice converts formatting correctly, it can only support direct formatting because the PDF does not provide the templates or define their dependencies. This does not change if a PDF is exported as a tagged PDF, or PDF/A-1. In contrast, images and drawings are inserted correctly. Although OpenOffice will not automatically group drawing elements, you can easily correct this with the standard tools in Draw.
The PDF format lets each page have its own orientation, so portrait and landscape pages can be mixed in one document. OpenOffice Draw does not support this feature at present; instead, it uses the orientation of the first page for all following pages. Also, it is impossible to reimport the macros in OOo documents that have been converted to PDFs: Although recent versions of the PDF format support script execution in PDF readers, this does not apply to OOo Basic and other Office scripting languages.
Fonts are another obstacle. Although OpenOffice – as well as many other programs – will, by default, export a subset of the fonts in a document to a PDF, the results are only suitable for viewing on third-party systems, not for editing. Exporting fonts is also a difficult topic for licensing reasons. OpenOffice's approach to this is to select a substitute font when importing PDF documents with fonts that are not installed on the local system – in some cases, the substitute font is not a good choice.
The import function cannot handle protected and encrypted PDF documents. Whether the document is edit protected or just view protected, the PDF Importer will fail to import (Figure 5). Of course, cracking tools will let you remove the password protection, but you have to consider the legality of this action in the case of third-party PDFs.
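Before you attempt an import, you can check whether a file is likely to be protected. The sketch below relies on the fact that an encrypted PDF carries an /Encrypt entry in its trailer; scanning the raw bytes for that name is only a heuristic, and the function names are hypothetical. It reports protection but does nothing to remove it.

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: report whether the raw PDF data contains an /Encrypt entry."""
    if b"%PDF-" not in pdf_bytes[:1024]:
        raise ValueError("not a PDF file")
    # False positives are possible if the name occurs in unrelated content.
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path):
    """Read a file from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())


# Tiny hand-made fragments, just enough to exercise the check.
plain = b"%PDF-1.4\ntrailer\n<< /Size 4 /Root 1 0 R >>\n%%EOF\n"
protected = b"%PDF-1.4\ntrailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\n%%EOF\n"
results = (looks_encrypted(plain), looks_encrypted(protected))
```

A True result is a reasonable hint that the PDF Importer will refuse the file.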
PDF Import Benefits
Although OpenOffice will fail to import a number of objects – typically because of weaknesses inherent in the PDF format itself – the PDF Import extension is more than just a toy. Even expensive commercial tools are limited in their abilities to edit PDF documents successfully because the format is just not designed for this purpose. In addition, some of the problems you run into can be resolved with the use of standard OpenOffice tools.
The font import issue is not as big a problem as it might seem in production use because most documents use standard fonts that are available either on any system or in the form of a matching substitute in OpenOffice. Moreover, many PDFs are scans and thus consist of image files, which you can import.
Finally, don't forget that the OpenOffice PDF Import feature is still a beta version, which means that it could still contain some bugs. Some functions might not yet be implemented. The developers are already working on plans to improve import performance in future versions.
PDF Import makes sense in many scenarios. If you have lost an original document, for example, you will be glad of any chance to reconstruct the content. If you need to reference external sources, you can use the PDF Import feature to do so. Also, if you need to fill out or save a PDF form, even though the document does not support this, you will again be glad of the OpenOffice extension. In production use, the extension is also perfect for minor changes to documents – say, correcting typos or adding a watermark.
Copying via the Clipboard
After importing a PDF into OpenOffice, editing body text is a very painstaking task. If you want to copy text from an unprotected PDF into a document of your own, the alternative is to use the clipboard.
First, select the text with Ctrl+A, press Ctrl+C to copy it from the PDF, then press Ctrl+V to insert it into your document. In the best possible case, much of the formatting, such as bold and italic type, will be kept. In most cases, the results will be contiguous paragraphs that you can format and align.