Tutorials – Recoll
Even in the age of cloud computing, personal computers often hold thousands of files: text files, spreadsheets, word processing docs, configuration files, and HTML files, as well as email and other message formats. If it takes too long to find the file you need, chase it down with the Recoll local search engine.
Recoll [1] is free software for Linux and Windows systems that adds a local search engine to your desktop or local network. And if you think that desktop search engines don't make sense in this age of cloud computing, I beg to disagree!
Look inside any school, NGO, small/medium enterprise, or individual computer used for more than a few years: Almost always, you will find big archives of mostly textual content that will never be uploaded to the cloud or otherwise exposed to an online search engine. Sometimes the reason is mere lack of time, bandwidth, or money. Sometimes it is privacy. Sometimes the reason is easier compliance with regulations like the European General Data Protection Regulation (GDPR) [2]. In all cases, deploying local search capability could make thousands and thousands of files much more useful for their owners.
Recoll is an excellent answer to the need for a local search engine. The Recoll search tool offers flexible interfaces, good documentation, and easy installation. Recoll can analyze and index text inside all the most common document formats, even when those documents are "hidden" inside other files (for example, an OpenDocument file zipped and attached to an email message), and its relatively simple search language makes that index easy to query. In most cases, you can preview or open the files found with your search by just clicking on them inside the Recoll window.
The first part of this tutorial explains how Recoll works and how to install it and configure its most critical functions. The second part describes the Recoll search syntax and offers a few tips to help you make the best use of Recoll.
Architecture and User Interfaces
Strictly speaking, Recoll is just a wrapper, albeit a great one, for the open source information retrieval library called Xapian [3]. It is Xapian, not Recoll, that performs all the high-level indexing and classification of your documents. Xapian is also directly usable via scripts in Perl, Java, and other languages. But it is Recoll that makes the desktop search really usable, by doing all the rest of the work, from overall configuration to obscure, low-level tasks like stemming. Stemming is the process of reducing similar words to their common root. It is thanks to stemming that you can search for a word like "hacker" and receive results for "hackerS" or "hacking" in addition to the original search term.
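To make the idea of stemming concrete, the following shell sketch strips a few English suffixes with sed. It is only a toy illustration: the toy_stem function and its suffix list are invented for this example and are not the algorithm used by Xapian and Recoll, which is considerably more sophisticated.

```shell
#!/bin/sh
# Toy illustration of stemming; NOT the stemmer that Xapian and Recoll use.
# toy_stem is a hypothetical helper: it lowercases a word, then strips one
# common English suffix so that related words end up with the same root.
toy_stem() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/(ers|er|ing|s)$//'
}

for word in hacker hackerS hacking; do
    printf '%s -> %s\n' "$word" "$(toy_stem "$word")"
done
```

All three words reduce to the root "hack", which is why a search for any one of them can return documents containing the others.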
The other tasks that Recoll handles directly are extracting text from your files, decoding your queries and, of course, presenting their results in a format that makes it easy to browse and open them from a graphical interface.
With the right libraries and plugins, you can perform Recoll searches directly from Python and other languages, or from desktop environments like Unity or KDE. This article will focus on the native Recoll GUI, its web-based equivalent, and, of course, the evergreen command-line option.
Installation
Recent Recoll versions are available as binary packages for Windows and the most popular Linux distributions. On Ubuntu, for example, type the following commands at the prompt to add a Personal Package Archive (PPA) repository for recoll and install both the graphical and command-line interfaces:
#> sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
#> sudo apt-get update
#> sudo apt-get install recoll -y
(Don't be fooled by the 1.15 in the repository name: The command will install the current version of Recoll, whatever it is.) After those commands, typing recoll in the desktop search bar will show you the icon that opens the Recoll native GUI. To search at the command prompt or in a shell script, use the command recollq. Use the recollindex command to generate an index.
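As a rough sketch of how the two command-line tools fit together, the script below updates the index and then runs a query. It assumes that recollindex and recollq are on the PATH and that a default configuration already exists. The search term hacker is just an example, and the status variable is only there to make the outcome visible to a calling script.

```shell
#!/bin/sh
# Sketch: refresh the default index, then query it from the shell.
# Assumes Recoll's command-line tools are installed and on the PATH.
query="hacker"   # example search term; replace with your own

if command -v recollindex >/dev/null 2>&1 && command -v recollq >/dev/null 2>&1; then
    recollindex          # build or update the index for the default configuration
    recollq "$query"     # list the documents that match the query
    status="searched"
else
    echo "Recoll command-line tools not found; install the recoll package first." >&2
    status="skipped"
fi
```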
You must install the Recoll web interface separately. Go to the GitHub page for the web interface [4], download the master.zip archive for your version of Recoll, and unzip it to expand a folder called recoll-webui-master. The file inside the folder called webui-standalone.py is a mini web server, which you can reach with your browser at the address http://localhost:8080. The mini web server is a bit slow, but it works right away for all the users of the local network, with one (well documented) caveat: You cannot directly open local files from the links in its listings unless you explicitly authorize Firefox to do so (see the box entitled "Authorizing Firefox").
Authorizing Firefox
To authorize Firefox to let you open local files, append the contents of the file examples/firefox-user.js to ~/.mozilla/firefox/<profile>/user.js and restart Firefox.
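The step described in the box can be sketched as a small shell function. The name append_firefox_prefs is hypothetical, and you have to supply the real paths yourself, because the name of the Firefox profile directory differs on every system.

```shell
#!/bin/sh
# append_firefox_prefs is a hypothetical helper: it appends the preferences
# shipped with the web interface to the user.js file of one Firefox profile.
#   $1 = path to examples/firefox-user.js inside recoll-webui-master
#   $2 = Firefox profile directory (~/.mozilla/firefox/<profile>)
append_firefox_prefs() {
    prefs_file="$1"
    profile_dir="$2"
    [ -f "$prefs_file" ] || { echo "Missing $prefs_file" >&2; return 1; }
    [ -d "$profile_dir" ] || { echo "Missing $profile_dir" >&2; return 1; }
    # Append rather than overwrite, so existing preferences are preserved
    cat "$prefs_file" >> "$profile_dir/user.js"
}
```

Call the function with the two paths as arguments and restart Firefox afterwards. Appending with >> keeps any settings already stored in user.js.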
If you plan to use Recoll on a regular basis, you might wish to configure your Linux system to start it automatically when the system boots. See your Linux distro's documentation for more on configuring an application to launch at system startup.
Indexing Configuration
No search engine is better than its index. Telling Recoll how to create and maintain the index is the most critical part of the configuration (Figure 1).
Recoll has a system-wide configuration file (/usr/share/recoll/examples/recoll.conf on Ubuntu), but each user also gets a personal configuration, which takes priority. The personal configuration file is stored in $HOME/.recoll/recoll.conf. The first time you start it, the Recoll GUI will ask you how to configure the index and will save your choices in your personal file. Among other things, you may define which file types should be indexed and the default language.
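For orientation, here is a sketch of what a small personal configuration might contain. The parameter names topdirs and indexstemminglanguages come from the Recoll manual rather than from the text above, and the values are examples only, so check both against the documentation for your version. The script writes to a scratch file instead of $HOME/.recoll/recoll.conf so that it cannot overwrite an existing configuration.

```shell
#!/bin/sh
# Sketch of a personal Recoll configuration; all values are examples.
# The sample goes to a scratch file, never to $HOME/.recoll/recoll.conf.
sample_conf="$(mktemp)"

cat > "$sample_conf" <<'EOF'
# Directories to index, separated by spaces
topdirs = ~/Documents ~/Mail
# Languages for which stemming data is built
indexstemminglanguages = english
EOF

# Show the result for review before copying anything by hand
cat "$sample_conf"
```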
By default, Recoll will only have one index for your whole home directory, but it may handle many, totally independent indexes. The only requirement is that each index has a dedicated configuration directory. The simplest way to make Recoll create a separate configuration and index seems to be to create an empty directory and then start the software from the command line with the -c option pointing to it:
#> mkdir $HOME/.recoll-customconfig
#> recoll -c $HOME/.recoll-customconfig
You can search several indexes simultaneously by adding them in the Preferences | External Index Dialog of the GUI. Don't forget that, when you search on multiple indexes, Recoll will use all their data, but it will only use the configuration of one index: the default index, or the index explicitly set with the RECOLL_CONFDIR environment variable or the -c option.
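The two ways of selecting a configuration can be sketched from the shell as follows. The helper with_recoll_conf is hypothetical, and the sketch assumes that recollindex and recollq accept the same -c option and RECOLL_CONFDIR variable as the GUI, which matches their man pages as far as I know. The directory is the one created in the example above, and the query term is illustrative.

```shell
#!/bin/sh
# with_recoll_conf is a hypothetical helper: it runs one command with
# RECOLL_CONFDIR pointing at the given configuration directory, leaving
# the environment of the calling shell untouched.
with_recoll_conf() {
    conf_dir="$1"
    shift
    env RECOLL_CONFDIR="$conf_dir" "$@"
}

custom_conf="$HOME/.recoll-customconfig"

# Only touch Recoll if it is installed and the directory already exists
if command -v recollq >/dev/null 2>&1 && [ -d "$custom_conf" ]; then
    with_recoll_conf "$custom_conf" recollindex      # index with the custom configuration
    with_recoll_conf "$custom_conf" recollq hacker   # query that index via the variable
    recollq -c "$custom_conf" hacker                 # same query, selected with -c
fi
```

Setting the variable per command, rather than exporting it for the whole session, makes it harder to query the wrong index by accident.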