Full-text search with Solr, Xapian, and Sphinx
Full-text search engines like Solr, Xapian, and Sphinx make the daily data chaos on your hard disk searchable – and they even cooperate with relational databases.
Creating a list of 10 websites that discuss the latest Ubuntu release is simple: just use Google or another one of the popular web search engines. But if you host an information-packed website yourself and want to offer your own search function for it, you need a full-text search tool. Full-text search engines have other benefits for the user and developer. If you are building a custom application or DVD, for instance, you might want to include a full-text search tool to put important information at the user's fingertips. Full-text search delves the depths of random or systematically arranged data for one or more search terms. You will want the search results sorted by relevance, and you will want the results in a split second.
Luckily, admins and developers need not reinvent the wheel: Solr, Xapian, and Sphinx are open source projects that index and analyze data. But how do you define data? You can roughly distinguish two states in which the search engines find information: structured and unstructured.
Structured data has a fixed, predefined structure that allows it to be easily recognized, categorized, and processed with the help of applications. The most common form of structured data is a relational database, with data organized in rows and columns that, in turn, are connected in the form of tables. In contrast to this, unstructured data lacks a data model. Such data sets are often so ambiguous that a program cannot simply process them because the data, facts, and figures are totally mixed. Unstructured data is the domain of search engines that can at least arrange the chaotic data semantically.
Read full article as PDF:
Version 16 of the popular Linux desktop reveals new tools, edge-snapping, and performance improvements.
Symantec says Linux-Darlioz burrows in through PHP.
Dell renews its quest for the ultimate developer machine.
Innovative back door looks like normal SSH traffic.
One of CeBITs most successful forums opens the new year with a new name. The popular Open Source Forum continues in 2014 under the name Special Conference: Open Source. This year, the forum will be bigger and offer a wider range of possibilities for sponsors.
New release offers better graphics drivers and expands filesystem support.
New mail protocol will shut out the NSA and prevent snooping on metadata.
A new web application helps users visualize distributed denial-of-service attacks.
Ubuntu 13.10 takes a step toward convergence, with lots of mobility, but Mir only partly here.
Galileo board is targeted to embedded developers and educational institutions.