Document management for the small office
Installation
Paperwork has no built-in installation routine and requires a large number of additional packages. Depending on the Linux distribution, getting Paperwork ready to run can take quite a bit of effort. The developers provide brief installation instructions for several Linux distributions on GitHub, but these are partly obsolete and therefore not always reliable. On a freshly installed Ubuntu 16.04, I got the software running in several steps (see the "Installing Paperwork" box).
Installing Paperwork
First, you need to install various Python packages on the system:
sudo apt install python3-pip python3-setuptools python3-dev python3-pil libenchant-dev
Then, download and install Paperwork using pip, the package manager for Python modules:
sudo pip3 install paperwork
Next, use the Paperwork shell to check whether the program can satisfy all its dependencies. The following commands will help you with this:
paperwork-shell chkdeps paperwork_backend
paperwork-shell chkdeps paperwork
Resolve any dependencies that are still missing and, depending on your desktop environment, create a launcher for Paperwork in the menu system.
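As a rough complement to the chkdeps commands, a short Python check can show whether a few modules are importable at all. This is only a sketch: the import names PIL and enchant are assumptions about what the packages installed above provide, the list is not Paperwork's full dependency set, and the function name missing_modules is made up for illustration.

```python
import importlib.util

# Assumed import names: "PIL" (from python3-pil) and "enchant" (PyEnchant,
# which needs libenchant) are guesses; "paperwork_backend" and "paperwork"
# are the names passed to chkdeps in the box above.
CANDIDATE_MODULES = ["PIL", "enchant", "paperwork_backend", "paperwork"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    missing = []
    for name in names:
        # find_spec() returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

Calling missing_modules(CANDIDATE_MODULES) returns the names that still need attention; an empty list only means these few modules were found, not that every dependency is satisfied.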
If you are missing a launcher for the software at the end of the installation, you can start the program from a terminal by typing paperwork. This quickly loads a two-panel window. Scanned documents and documents loaded from your data repository appear on the right, whereas any document-specific attributes, such as tags, appear on the left. First, open the settings dialog with your scanner switched on. It offers only a few basic options (Figure 8), notably the resolution of the scanner.
If your pool of original documents includes many documents with very small fonts, it is advisable to increase the default value to 300 dpi. Depending on the language of your documents, you might also want to change the language selection for Tesseract OCR in the selection box.
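If the language you need does not appear in the selection box, it may help to check which language packs Tesseract itself has installed. The following is a minimal sketch that calls tesseract --list-langs and parses the result. The function names are hypothetical, and the parsing assumes the command prints one header line followed by one language code per line, with no other messages.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return sorted(langs)


def installed_langs():
    """Return installed Tesseract language codes, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some Tesseract versions print the list on stderr, so fall back to it.
    text = result.stdout if result.stdout.strip() else result.stderr
    return parse_langs(text)
```

If installed_langs() does not include the code for your documents' language (for example deu for German), the matching Tesseract language package probably still needs to be installed.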
Usability
Paperwork allows for highly flexible use of documents: When scanning, it will grab a complete stack of originals if you have an automatic document feeder. To do this, press the Scan button in the upper right-hand corner of the program window. Paperwork then shows each page in an animation in the window's right pane. In the background, Tesseract is already performing optical character recognition – if you enabled this previously.
The left pane displays thumbnail images of the first page. The papers subdirectory, which the software creates in the user's home directory, is used to store the text files; multiple-page documents are stored in a single subdirectory. The text files have a .words suffix and can be processed as raw files in any standard editor. But they are primarily used for indexing with keywords.
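Because the .words files are ordinary files on disk, you can also search them outside of Paperwork. The sketch below assumes the layout described above (one subdirectory per document, each holding one or more .words files) and treats the files as text; if they contain layout markup in addition to the recognized words, a plain substring search may match that markup as well. The function name find_documents is made up for illustration.

```python
from pathlib import Path


def find_documents(root, term):
    """Return names of document folders under root whose .words files mention term."""
    needle = term.lower()
    hits = set()
    # rglob() walks all subdirectories, so every page file of a document is checked.
    for words_file in Path(root).rglob("*.words"):
        text = words_file.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            # The parent folder name identifies the document the page belongs to.
            hits.add(words_file.parent.name)
    return sorted(hits)
```

For example, find_documents(Path.home() / "papers", "invoice") would list the document folders in which any page mentions the word, regardless of case.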
Paperwork also runs the OCR program first when importing existing files. Note that the software does not support any formats from third-party applications, such as office suites. The exception is the widespread and universal PDF format. If you integrate magazines that are available in this format into the system, the OCR software examines them page by page. Then, Paperwork shows the scaled-down individual pages in a list view in the right pane, and you can magnify these on request.
For print media, which often contain many illustrations and advertisements, it is advisable to turn off predictive text in the Setup menu before the OCR step. This not only saves a significant amount of time, it also avoids unusable results, such as those often encountered when the originals use different colors and font sizes.
Indexing and Searching
Once all of your originals have been scanned or loaded, you can assign keywords. To do so, click on the scaled-down document in the program window's left pane and then press the small icon in the right pane. After clicking on the + button, you can define different labels, or keywords. For a better overview, you can assign colors to the labels (Figure 9).
Paperwork always assigns all existing reference terms to documents that you are keywording. To remove the tags that do not apply to your document, uncheck them; you can enter new ones at the same time using the Additional keywords input field. After making all your changes, reopen the list display by pressing the back arrow in the Properties view. You should now see color indexing for each document.
To find specific data in extensive documents, click on the Advanced search button at the top left and then enter your criteria in a very simple search dialog. You can combine the groups Date, Keyword, and Keyword(s) as needed. Clicking on Apply updates the display and lists only the hits that match all search criteria (Figure 10).
An export function is available from the menu button at the top right. You can export the current documents or pages to PNG, JPEG, or PDF files and store them in a directory of your choice. You can also print the current document using the corresponding entry in the same menu.