Editing PDFs with OpenOffice.org
Repackaged
OpenOffice has always been able to export PDF documents. Version 3.0 is the first to introduce an extension that lets you import and edit PDFs.
Adobe originally designed the Portable Document Format (PDF) as a display-only format for platform-independent document viewing, but several tools for editing PDF documents have appeared over the years. OpenOffice.org version 3.0 (OOo) brings the power of PDF editing to the free OpenOffice suite.
Before you raise your hopes too high, let me start by saying that, despite the best efforts of editing tools, PDF remains a format primarily dedicated to rendering documents and has limited editing capability. If you are thinking of using an OpenOffice extension to edit the document in its original format, including templates, paragraphs, images, and tables, you will be disappointed. This limitation is not a failing of OpenOffice; it is inherent in the PDF format itself. If you need to send documents in an editable format, the Open Document Format (ODF, i.e., OpenOffice's native file format) is a better choice.
Despite this, the PDF Import function is a useful feature that lets users open PDF document content for editing, with some limitations. The extension gives the free office suite a function for loading PDF files as easily as any other format.
Installation
Although PDF Import [1] was released at the same time as OpenOffice 3.0, it is not included in the default installation. To load the extension, you need to download it from the Extension Repository [2] (see the "Extension Repository" box). The version number has a "Beta" tag to show that the plugin is still under development and might be buggy (Figure 1).
Clicking the Get it! button for your choice of platform and then selecting Open in your browser will launch OpenOffice and offer to install the extension (Figure 2). Alternatively, you can download the file separately and install it by double-clicking. After you accept the license agreement, the Extension Manager lists the new function.
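If you prefer to script the installation of a downloaded extension file, OpenOffice also ships the unopkg command-line tool. The following is a minimal sketch, not the procedure described above: it assumes that unopkg is on your PATH and uses pdfimport.oxt as a hypothetical name for the downloaded file.

```python
import subprocess


def build_unopkg_command(oxt_path, shared=False):
    """Build the unopkg call that registers an extension file.

    oxt_path is the downloaded extension; the file name used in the
    example below is hypothetical. shared=True installs for all users
    and normally needs administrator rights.
    """
    command = ["unopkg", "add"]
    if shared:
        command.append("--shared")
    command.append(oxt_path)
    return command


def install_extension(oxt_path, shared=False):
    """Run unopkg; assumes the tool is on PATH (an assumption)."""
    subprocess.run(build_unopkg_command(oxt_path, shared), check=True)


if __name__ == "__main__":
    # Only print the command here; call install_extension() to run it.
    print(" ".join(build_unopkg_command("pdfimport.oxt")))
```

Close all OpenOffice windows before running the command, and check the Extension Manager afterward to confirm that the extension is listed.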
Extension Repository
The Extension Repository [2] has become an important site for all users of OpenOffice, at the latest since the release of version 3.0. Now, literally hundreds of downloadable extensions are available, and new extensions are added every month. Most are free of charge, although a couple of commercial extensions are not.
Popular add-ons include the Presenter Console, the Wiki Publisher, WollMux by the City of Munich, and Writer2LaTeX. Many features are available only as extensions because this makes them easier to maintain than if they were included in the main body of integrated OpenOffice program code. At the same time, extensions prevent overloading of the program code with features that many users do not actually need.
How It Works
After installing the extension, OpenOffice will not look different from before, although the import feature is listed by the Extension Manager. No new menu entries and no additional icon bars are visible. However, you do not need them because you can load PDF files normally, with File | Open…. After a short delay, the document opens – surprisingly in Draw, the OpenOffice drawing module. But why? At first sight, it might seem strange, but if you think about a PDF file's characteristics, the reason quickly becomes apparent. Because the individual file elements are defined in a proprietary page description language, it is impossible to tell whether the PDF contains text, a presentation, or a table.
If you open a simple document containing body text in a standard font, like that shown in Figures 3 and 4, you should have no trouble importing the file. But when you edit the document, you will notice that the layout is not maintained. Each line of text is in a frame that is not linked to the other frames (lines of text). This means that you can only edit line by line, making it difficult to format paragraphs or change the line spacing. Again, this limitation is inherent in the PDF page description language.
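To see why unlinked line frames make paragraph editing awkward, consider what a tool would have to do to recover paragraphs: guess them from the position of each line. The following sketch illustrates one such heuristic. It is not how PDF Import works internally; the LineFrame type, its fields, and the gap_factor threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineFrame:
    """One imported line of text (hypothetical model of a Draw frame)."""
    text: str
    top: float     # distance from the top of the page, in points
    height: float  # height of the line, in points


def group_into_paragraphs(frames, gap_factor=0.5):
    """Join lines into paragraphs wherever the vertical gap is small.

    A gap larger than gap_factor times the previous line's height is
    treated as a paragraph break. The threshold is a guess, which is
    exactly the problem: the PDF itself does not record paragraphs.
    """
    paragraphs = []
    current = []
    previous = None
    for frame in sorted(frames, key=lambda f: f.top):
        if previous is not None:
            gap = frame.top - (previous.top + previous.height)
            if gap > gap_factor * previous.height:
                paragraphs.append(" ".join(current))
                current = []
        current.append(frame.text)
        previous = frame
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    frames = [
        LineFrame("Adobe originally designed", top=0, height=10),
        LineFrame("the Portable Document Format.", top=12, height=10),
        LineFrame("A new paragraph starts here.", top=40, height=10),
    ]
    for paragraph in group_into_paragraphs(frames):
        print(paragraph)
```

Any such grouping depends on a threshold that will be wrong for some layouts, which is why the imported document leaves each line in its own frame.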
A similar problem occurs when you import tables. Again, you can edit a table line by line, but a workaround is needed for any major changes because the individual rows and columns are isolated rather than forming a contiguous table.
Because of the restrictions inherent in a PDF, much of the additional information is lost, such as the document structure, the outline, and the templates. Although OpenOffice converts formatting correctly, it can only support direct formatting because the PDF does not provide the templates or define their dependencies. This does not change if a PDF is exported as a tagged PDF, or PDF/A-1. In contrast, images and drawings are inserted correctly. Although OpenOffice will not automatically group drawing elements, you can easily correct this with the standard tools in Draw.
The PDF format supports different page orientations, such as portrait and landscape, within a single document. OpenOffice Draw does not support this feature at present; instead, it uses the orientation of the first page for all following pages. Also, it is impossible to reimport the macros in OOo documents that have been converted to PDFs: Although recent versions of the PDF format support script execution in PDF readers, this does not apply to OOo Basic and other Office scripting languages.
Fonts are another obstacle. Although OpenOffice – as well as many other programs – will, by default, export a subset of the fonts in a document to a PDF, the results are only suitable for viewing on third-party systems, not for editing. Exporting fonts is also a difficult topic for licensing reasons. OpenOffice's approach to this is to select a substitute font when importing PDF documents with fonts that are not installed on the local system – in some cases, the substitute font is not a good choice.
The import function cannot handle protected and encrypted PDF documents. Whether the document is edit protected or just view protected, the PDF Importer will fail to import (Figure 5). Of course, cracking tools will let you remove the password protection, but you have to consider the legality of this action in the case of third-party PDFs.
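Before you try to import a batch of files, a quick check can flag PDFs that are likely to fail for this reason. The sketch below relies on the fact that an encrypted PDF references an /Encrypt dictionary in its trailer. Searching the raw bytes for that name is only a heuristic and can produce false positives if the string appears elsewhere in the file; the function names are made up for this example.

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: report True if the data mentions an /Encrypt entry.

    Encrypted PDFs carry an /Encrypt key in the trailer dictionary.
    This scans the whole file rather than parsing the trailer, so it
    is a rough pre-check, not a reliable test.
    """
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path):
    """Read a file from disk and apply the heuristic above."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())


if __name__ == "__main__":
    protected = b"trailer\n<< /Size 10 /Root 1 0 R /Encrypt 8 0 R >>"
    plain = b"trailer\n<< /Size 10 /Root 1 0 R >>"
    print(looks_encrypted(protected))
    print(looks_encrypted(plain))
```

The check does not distinguish between edit protection and view protection; according to the behavior described above, the import fails in both cases.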
PDF Import Benefits
Although OpenOffice will fail to import a number of objects – typically because of weaknesses inherent in the PDF format itself – the PDF Import extension is more than just a toy. Even expensive commercial tools are limited in their abilities to edit PDF documents successfully because the format is just not designed for this purpose. In addition, some of the problems you run into can be resolved with the use of standard OpenOffice tools.
The font import issue is not as big a problem as it might seem in production use because most documents use standard fonts that are available either on any system or in the form of a matching substitute in OpenOffice. Often PDFs are scans, and thus image files that you can import.
Finally, don't forget that the OpenOffice PDF Import feature is still a beta version, which means that it could still contain some bugs. Some functions might not yet be implemented. The developers are already working on plans to improve import performance in future versions.
PDF Import makes sense in many scenarios. If you have lost an original document, for example, you will be glad of any chance to reconstruct the content. If you need to reference external sources, you can use the PDF Import feature to do so. Also, if you need to fill out or save a PDF form, even though the document does not support this, you will again be glad of the OpenOffice extension. In production use, the extension is also perfect for minor changes to documents – say, correcting typos or adding a watermark.
Copying via the Clipboard
After importing a PDF into OpenOffice, editing body text is a very painstaking task. If you want to copy text from an unprotected PDF into a document of your own, the alternative is to use the clipboard.
First, select all of the text in the PDF with Ctrl+A, press Ctrl+C to copy it, then switch to your document and press Ctrl+V to insert it. In the best possible case, much of the formatting, such as bold and italic type, will be kept. In most cases, the results will be contiguous paragraphs that you can format and align.