A new semantic search engine for the KDE desktop
Loaded for Bear
Baloo replaces Nepomuk as the semantic search engine on the KDE desktop, but it gets off to a bumpy start.
The Nepomuk semantic search has been highly controversial since its introduction in KDE – both for users and for application developers. The release of KDE Applications 4.13 brings a new solution named Baloo to the KDE desktop – but not without some growing pains.
Desktop environments are pretty dumb, as users discover when they look for a file whose name they simply cannot remember. Semantic desktops were designed to address this problem. A semantic desktop has access to information about data, such as what data manifested, when data was created, and how data relate to each other. Using this information, a user can, for example, search for a file on the hard disk that John Doe emailed in March.
The KDE developers wanted to give their desktop environment an appropriate semantic search engine. They turned to the results of the Nepomuk (Networked Environment for Personalized, Ontology-Based Management of Unified Knowledge; Figure 1) research project [1] funded by the European Union from 2006 to 2008 to the tune of several million euros . It was intended to provide everything necessary to promote and simplify the development of a semantic desktop.
RDF and Other Calamities
Nepomuk required the RDF (Resource Description Framework) to describe and store relations [2]. The implementation of Nepomuk in KDE collects information about stored data from all KDE applications, links and processes this information, and serves it up to the search function.
Since the introduction of Nepomuk in KDE 4, many users repeatedly complained about its poor performance and lack of stability. Application developers, in turn, found the API too complicated and called for extra features. Although the KDE developers made improvements over the years, many users continue to disable Nepomuk. Especially in interactions with Akonadi, the underpinnings of the PIM programs, Nepomuk generated excessive load.
Virtuoso, Vishuoso, and a Restart
The KDE developers identified the Virtuoso database, which ran in the background and which Nepomuk used to store its RDF data, as the major obstacle to better performance. Although Virtuoso itself worked pretty fast, it hogged extremely large amounts of memory. The KDE developers therefore started to implement their own RDF store named Vishuoso [3]. However, they continually stumbled over the requirements of the EU research project. They were partly vague, incomplete, and sometimes even redundant [4].
For these reasons, the KDE developers decided to take a radical step: They ditched RDF and the old Nepomuk and developed a successor named Baloo [5]. In contrast to Nepomuk, Baloo, which is named after a character in The Jungle Book (think "Bear Necessities"), is designed to be more frugal with resources, be more reliable, and deliver superior results faster. Internally, Baloo is modular, so in future, it will be easier to add new features and improve existing ones. Additionally, the design was overhauled with the intention of preventing failures.
The KDE developers have not completely rewritten Baloo; rather, they have recycled parts of the code. Thus, they refer to Baloo as the "next generation of Nepomuk search." The changes to the programming interface are thus manageable; application developers should find it relatively easy to adapt their software.
Relations and Stores
Baloo manages relations between two "uniquely identifiable identifiers." A file has the unique identifier file:<x>
, whereas Akonadi creates a unique identifier of the form akonadi:?item=<x>
for most PIM data.
Each relation ends up in a separate and appropriate data store. In the simplest case, this is a table with two columns in a SQLite database. Baloo deliberately does not require a specific storage format or data store API. The advantage of this setup is that the data store can be tailored to match the information to be stored, thus letting Baloo store the data in the best possible way.
The search itself is performed by the search stores. Each search store only takes care of certain data. For example, one exclusively searches for files (File Search), another searches in email (Email Search), and third searches in contacts (Contact Search). Each of these search stores provide a specific API through which the search can be triggered; however, they can also provide additional APIs.
Currently, Baloo only manages files and comes with a matching search store and data store. The collected data is stored in a SQLite database. The search is additionally supported by the Xapian software, which indexes the collected dataset. Akonadi already stores all its PIM data itself; the search in this dataset is handled by search stores for contacts and email [6]. Thanks to this new architecture with individual data and search stores, Baloo is unlikely to require too much memory and should deliver matching search results extremely quickly.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
AlmaLinux 10.0 Beta Released
The AlmaLinux OS Foundation has announced the availability of AlmaLinux 10.0 Beta ("Purple Lion") for all supported devices with significant changes.
-
Gnome 47.2 Now Available
Gnome 47.2 is now available for general use but don't expect much in the way of newness, as this is all about improvements and bug fixes.
-
Latest Cinnamon Desktop Releases with a Bold New Look
Just in time for the holidays, the developer of the Cinnamon desktop has shipped a new release to help spice up your eggnog with new features and a new look.
-
Armbian 24.11 Released with Expanded Hardware Support
If you've been waiting for Armbian to support OrangePi 5 Max and Radxa ROCK 5B+, the wait is over.
-
SUSE Renames Several Products for Better Name Recognition
SUSE has been a very powerful player in the European market, but it knows it must branch out to gain serious traction. Will a name change do the trick?
-
ESET Discovers New Linux Malware
WolfsBane is an all-in-one malware that has hit the Linux operating system and includes a dropper, a launcher, and a backdoor.
-
New Linux Kernel Patch Allows Forcing a CPU Mitigation
Even when CPU mitigations can consume precious CPU cycles, it might not be a bad idea to allow users to enable them, even if your machine isn't vulnerable.
-
Red Hat Enterprise Linux 9.5 Released
Notify your friends, loved ones, and colleagues that the latest version of RHEL is available with plenty of enhancements.
-
Linux Sees Massive Performance Increase from a Single Line of Code
With one line of code, Intel was able to increase the performance of the Linux kernel by 4,000 percent.
-
Fedora KDE Approved as an Official Spin
If you prefer the Plasma desktop environment and the Fedora distribution, you're in luck because there's now an official spin that is listed on the same level as the Fedora Workstation edition.