Indexing and searching text with Lucene

SmartSearch

Article from Issue 150/2013

Author(s): Carsten Schnober

Even state-of-the-art computers need to use clever methods to process ever-increasing amounts of document data. The open source Lucene framework uses inverted indexing for fast searches of document collections.
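The idea behind an inverted index is easy to show in miniature. The toy sketch below is plain Java and uses no Lucene classes. It maps each term to the sorted set of IDs of the documents that contain it, so a query only looks up and intersects these posting lists instead of scanning every document line by line. The class and method names are hypothetical. The tokenizer simply lowercases the text and splits on non-alphanumeric characters, and multi-word queries are treated as a Boolean AND. Lucene's real analyzers, on-disk index format, and relevance scoring are considerably more elaborate.

```java
import java.util.*;

// Toy inverted index (hypothetical class, not part of Lucene).
public class InvertedIndexSketch {
    // Term -> sorted set of IDs of the documents that contain it (its "posting list").
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Naive tokenizer (assumption): lowercase, then split on runs of characters
    // that are neither letters nor digits; empty fragments are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing: each document is read once, and its ID is added to the
    // posting list of every term it contains. Returns the new document's ID.
    public int addDocument(String text) {
        int id = nextId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Searching: look up each query term and intersect the posting lists
    // (Boolean AND), without rescanning the documents themselves.
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs =
                postings.getOrDefault(term, Collections.<Integer>emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Lucene uses an inverted index");
        index.addDocument("grep scans every line of every file");
        index.addDocument("An index maps each term to the documents containing it");
        System.out.println(index.search("index"));          // [0, 2]
        System.out.println(index.search("inverted index")); // [0]
        System.out.println(index.search("grep index"));     // []
    }
}
```

Building the index costs one pass over each document. After that, a query touches only the posting lists of its own terms, which is why lookups stay fast even as the collection grows.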

Nowadays, almost any commercially available hard drive can store more text than a whole library. In the digital world, traditional aids such as a card catalog or a knowledgeable librarian are no longer enough to find the right shelf. Even software equivalents such as find or zgrep are not always fast enough to track down a particular piece of information among gigabytes or terabytes of data.

The science that deals with this type of search problem is called information retrieval. Computer scientists have developed sophisticated methods for tracking down files that users don't even know exist. The free Java library Lucene [1] implements some of these methods. Doug Cutting published an early version of Lucene in 1999. Two years later, the project, which carries the middle name of Cutting's wife, came under the auspices of the Apache Software Foundation when it joined the Apache Jakarta Project.

Version 4.0 of Lucene has been available since October 2012. The index file format is backward compatible, so indexes created with version 3.6 remain usable after the upgrade to 4.0. Over the years, Lucene has become one of the most widely used solutions for indexing and searching text. (See the box titled "Lucene In All Its Facets.")

[...]

