Document management for the small office
Defying Chaos
![Lead Image](/var/linux_magazin/storage/images/issues/2017/202/document-management/123rf_64803690_paddling-boat-documents_alphaspirit_resized.png/710216-1-eng-US/123RF_64803690_paddling-boat-documents_alphaspirit_resized.png_medium.png)
Lead Image © alphaspirit, 123RF.com
Even in a small office, countless letters, email messages, and PDFs arrive daily. Document management systems help you avoid drowning in the flood of documents.
It's been more than a decade since the paperless office was proclaimed, with special document management systems (DMSs) proposed as the tool for managing arbitrary documents without miles of shelving. DMSs typically operate as client-server applications with a database back end, which users access through a client.
Most of these DMS applications are at home in medium to large enterprises and are hopelessly oversized for use in small home offices. Successfully using a DMS becomes even more difficult when the requirements include Linux support. Nevertheless, I searched for DMSs for Linux workstations that relieve the strain on small offices without time-consuming training and permanent maintenance. In my search, I've taken a look at Krystal DMS, LogicalDOC, Paperwork, and Referencer (see also the "Not Tested" box).
Not Tested
OpenKM [1] was intended to be the fifth candidate in this test. Although it has a Linux version (including a community release and commercial and cloud packages), the software proved to be extremely recalcitrant in our lab, with no usable installation routines for small offices or less savvy admins and no current documentation. Instead, you are expected to install each required package manually (including a Tomcat application server, a MySQL database, and applications such as ImageMagick and Ghostscript) and then edit complex configuration files, again by hand.
Although the manufacturer provides help documents, they are hopelessly out of date and caused attempted installations on current Linux distributions to fail. Some recent Linux versions also no longer offer the required packages. For Fedora and Red Hat Linux, the documentation refers to OpenOffice Suite 3.1.1, which was released August 31, 2009, and has seen countless new releases in the meantime.
The Debian and Ubuntu documentation is also out of date: It describes the configuration for the long-since-replaced SysVinit system but does not tell you how to handle the service units of the current systemd init system. The Apache web server configuration no longer works as described, either. For all of these reasons, I did not test OpenKM for this article.
Requirements
Ideally, the DMS should reproduce the workflow of a document starting with its creation, through its entire lifecycle, to final deletion. The DMS should handle not only printed documents, but also files that exist electronically in various formats (e.g., email).
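To illustrate what handling an electronic format such as email can involve, the following minimal Python sketch reads the headers of a stored message and extracts the attributes an archive would typically record. It uses only the standard library; the sample message, the addresses, and the function name `email_metadata` are assumptions made for illustration and do not come from any of the products discussed here.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical stored message; a real DMS would read this from a mailbox or .eml file.
RAW_MESSAGE = """\
From: supplier@example.com
To: office@example.com
Subject: Invoice 2017-042
Date: Wed, 01 Mar 2017 09:30:00 +0100

Please find the invoice attached.
"""


def email_metadata(raw: str) -> dict:
    """Extract sender, subject, and sending date from a raw email message."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"])
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "date": sent.date().isoformat(),
    }


if __name__ == "__main__":
    print(email_metadata(RAW_MESSAGE))
```

The extracted fields could then serve as the attributes under which the message is filed.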
The DMS does not just act as an archiving system for quick access to archived documents using keywords, date stamps, or other attributes. It also needs to optimize the flow of information in organizations by introducing distribution mechanisms for eligible recipients, document linking, and access monitoring.
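Access by keywords and date stamps can be pictured as a small inverted index. The following sketch is a simplified, in-memory illustration of the idea, not the mechanism of any tested DMS; the class names `Document` and `Archive` and all field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Document:
    """Hypothetical archive record; the field names are illustrative only."""
    doc_id: int
    title: str
    received: date
    keywords: set = field(default_factory=set)


class Archive:
    """Tiny in-memory archive with an inverted keyword index."""

    def __init__(self) -> None:
        self._docs = {}
        self._by_keyword = defaultdict(set)

    def add(self, doc: Document) -> None:
        """Store the document and index each keyword in lowercase."""
        self._docs[doc.doc_id] = doc
        for kw in doc.keywords:
            self._by_keyword[kw.lower()].add(doc.doc_id)

    def search(self, keyword: Optional[str] = None,
               since: Optional[date] = None,
               until: Optional[date] = None) -> list:
        """Return matching documents, oldest first."""
        if keyword is None:
            ids = set(self._docs)
        else:
            ids = set(self._by_keyword.get(keyword.lower(), set()))
        hits = [self._docs[i] for i in ids]
        if since is not None:
            hits = [d for d in hits if d.received >= since]
        if until is not None:
            hits = [d for d in hits if d.received <= until]
        return sorted(hits, key=lambda d: (d.received, d.doc_id))


if __name__ == "__main__":
    archive = Archive()
    archive.add(Document(1, "Invoice", date(2017, 3, 1), {"invoice", "ACME"}))
    archive.add(Document(2, "Contract", date(2017, 5, 10), {"contract", "acme"}))
    for doc in archive.search(keyword="acme", since=date(2017, 4, 1)):
        print(doc.doc_id, doc.title)
```

A real DMS would keep such an index in its database back end and add full-text search on top, but the lookup principle is the same.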
A modular design should also ensure trouble-free processing of documents in third-party applications, including popular office suites or Enterprise Content Management (ECM) systems.
Multiplatform capability, which allows the client to run on mobile devices such as tablets, is also becoming increasingly important. Today, this includes cloud connections that give access to documents in the DMS independently of stationary IT. Last, but not least, the DMS must meet the regulatory requirements for archiving that apply wherever you are in the world.
In the Small Office
Small offices do not typically require large DMSs, which are usually difficult to install and configure and need regular maintenance on top. However, alternatives for small offices also need to handle input sources, such as printed documents, files of different formats, and stored email. Ideally, they should also include a scan engine that enables reading and text recognition of printed originals. Keywording and other storage functions are in the DMS's domain, as well as interfaces for the major office suites (see Table 1).
Table 1
Overview DMS Functions

| Function | Krystal DMS | LogicalDOC | Paperwork | Referencer |
|---|---|---|---|---|
| Modular design | Yes | Yes | Yes | No |
| Localization | Yes* | Yes | Yes | Yes |
| Client-server architecture | Yes | Yes | No | No |
| Web-based interface | Yes | Yes | No | No |
| Scanning module | Yes* | Yes* | Yes | No |
| Multiple sheet scanning | Yes* | Yes* | Yes | No |
| OCR module | Yes (external) | Yes (external) | Yes | No |
| Import function | Yes | Yes | Yes | Yes |
| Export function | Yes* | Yes | Yes | Yes (external) |
| Viewer | Yes | Yes | Yes | No |
| Indexing and searching | Yes | Yes | Yes | Yes |
| Version history | Yes | Yes | No | No |
| Comments | No | Yes | No | Yes |
| Cloud connection | No | Yes | No | Yes |
| Mobile apps | Yes | Yes | No | No |
| Link to CMS systems | No | Yes | No | No |

*Available only in the commercial versions.
Less relevant in small DMS solutions, however, are sophisticated mechanisms for granting rights and modules for interacting with major league ERP and ECM solutions. The ability to use an app to access the DMS software from a mobile device, such as a tablet or smartphone, is also less important in this working environment. What does matter for small offices and individual workstations, however, is easy installation and configuration of the software.
The Trouble with OCR
Reliable text recognition in scanned originals remains problematic on Linux. If the DMS applications do not have their own OCR modules, users are in many cases forced to rely on third-party solutions. In the Linux Magazine lab, we tested an OCR team consisting of Tesseract and gImageReader. The solution turned out to be technologically mature and therefore usable (see the "Tesseract and gImageReader" box).
Tesseract and gImageReader
Hewlett-Packard (HP) worked on the Tesseract [2] text recognition engine between 1985 and 1995. Development then lay dormant for 10 years because HP had abandoned this market segment. In 2005, HP released the code as open source, and Google has sponsored its development since 2006; the software is available to the developer community as free software under the Apache license. Subsequently, Tesseract spread throughout the Linux universe. Thanks to its modular design, Tesseract is also multilingual, and even German blackletter types are now recognized if you have the matching modules in place. Languages with many nonstandard characters do not pose unsolvable problems for the software, either.
Because OCR engines are typically command-line-only applications, third parties have developed various graphical interfaces over the years to make the programs easier to use. The GUI environments often cover one or several special engines.
Although still relatively unknown, gImageReader [3] has established itself as a front end for Tesseract OCR. In addition to ease of use, it promises a particularly lean design and therefore comes without unnecessary bells and whistles. Both software packages are available in the software repositories of the popular Linux distributions. You can thus install them at the push of a button on your flavor of Linux and then simply call the graphical front end, which automatically launches the OCR engine in the background, so you can scan originals and launch the recognition process (Figure 1).
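If you want to process scans in a script rather than through the graphical front end, you can also call the Tesseract command-line program directly. The following Python sketch is one possible way to do so, assuming that the `tesseract` executable and the required language data (here `deu` for German) are installed; the helper names and the file name `scan.png` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image: Path, lang: str = "eng") -> list:
    """Build the argument list for one Tesseract run.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", str(image), "stdout", "-l", lang]


def recognize(image: Path, lang: str = "eng") -> str:
    """Run Tesseract on a single image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found in PATH")
    result = subprocess.run(
        build_tesseract_command(image, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input file; replace with the path to a real scan.
    print(recognize(Path("scan.png"), lang="deu"))
```

Separating the command construction from its execution keeps the wrapper easy to check without an installed OCR engine.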