Big Data search engine for full-text strings and photos with radius search

Nice Scenery

The find() function also digs recursively through subdirectories. For the search engine to store the geodata in a way that optimizes query performance, I need to add a mappings directive: The create() command starting in line 15 defines a geo_point property named Location for the photo document type used in the photos index. The documentation for this [8] is out of date, by the way; the mapping it describes no longer works. I have, however, successfully tested Listing 4 with Elasticsearch release 1.0.0 RC2.
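As a rough illustration of what such a mappings directive looks like on the wire, the following Python sketch builds a request body of the kind Elasticsearch 1.x accepts when creating an index. It is not the Perl code from Listing 4. The names photo and Location come from the text above; the helper name photo_index_body is made up for this example.

```python
import json


def photo_index_body(doc_type="photo", field="Location"):
    """Build an index-creation body that declares one geo_point field.

    Assumption: Elasticsearch 1.x layout, where mappings are keyed by
    document type. The helper name is hypothetical.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # geo_point tells the server to index the field as
                    # a latitude/longitude pair for distance queries.
                    field: {"type": "geo_point"},
                }
            }
        }
    }


body = photo_index_body()
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of the request that creates the photos index, before any documents are added.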

For each JPEG image that the search finds, line 32 in Listing 4 uses the IPhonePicGeo module to extract the geodata. The index() call starting in line 35 then passes the geodata, along with the file name, to Elasticsearch in its body section.
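To make the shape of such a document concrete, here is a minimal Python sketch of a body that carries a file name and a geo_point value. It is an illustration under assumptions, not the article's Perl listing: the field name file, the helper photo_document, and the sample coordinates are invented for the example; only Location is taken from the text.

```python
def photo_document(path, lat, lon):
    """Build a document body with a file name and a geo_point value.

    Assumptions: the "file" field name and this helper are hypothetical;
    "Location" matches the geo_point property named in the article.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    return {
        "file": path,
        # One of the formats geo_point accepts: an object with lat/lon keys.
        "Location": {"lat": lat, "lon": lon},
    }


# Illustrative coordinates only, not taken from the article.
doc = photo_document("bridge.jpg", 37.7983, -122.3778)
print(doc)
```

Checking the coordinate ranges before indexing keeps photos with broken or missing GPS tags from ending up in the index with nonsensical positions.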

Once all the photos have been indexed in this way, the script in Listing 5 retrieves all the snapshots that I took within 1km of the reference photo passed in at the command line. To do this, it determines the geodata of the reference image and then sends a match_all() query, which on its own would return all stored images. Line 23 adds a geo_distance filter that limits the results to photos within 1km of the reference location. Additionally, the size parameter raises the maximum number of hits to 100.
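The query described above can be sketched as a request body. The Python below is an assumption about its general shape in Elasticsearch 1.x, where a filtered query wraps match_all together with a geo_distance filter; it is not the Perl code of Listing 5, and the helper name nearby_query and the sample coordinates are made up.

```python
def nearby_query(lat, lon, distance="1km", size=100):
    """Build a search body: match everything, keep hits within a radius.

    Assumptions: Elasticsearch 1.x "filtered" query syntax and a
    geo_point field named "Location"; the helper name is hypothetical.
    """
    return {
        # Raise the maximum number of returned hits.
        "size": size,
        "query": {
            "filtered": {
                # On its own, match_all selects every stored document.
                "query": {"match_all": {}},
                # The filter discards everything outside the radius
                # around the reference point.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "Location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
    }


# Illustrative reference point only, not taken from the article.
query = nearby_query(37.7983, -122.3778)
print(query)
```

Because the radius and hit limit are plain parameters, widening the search to, say, 5km only requires passing distance="5km".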

Listing 5: photo-gps-match

This returns a list of photo objects; line 37 extracts the original file name from each and appends it to the array @files. Finally, the system() function in line 40 calls eog (the Eye of GNOME image viewer), which displays all the results as thumbnails (Figure 5). You can now click your way through them to explore the vicinity.
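The step from search response to viewer command can be sketched in a few lines of Python. This is an illustration rather than the article's Perl code: the response below is a hand-written sample in the general hits.hits[]._source layout that Elasticsearch uses, and the field name file and the helper files_from_hits are assumptions.

```python
def files_from_hits(response, field="file"):
    """Collect one field from the _source of every hit in a response.

    Assumption: the "file" field name is hypothetical; the nesting
    hits -> hits -> _source follows the usual search response layout.
    """
    return [hit["_source"][field] for hit in response["hits"]["hits"]]


# Hand-written sample response, not real server output.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_source": {"file": "bridge-1.jpg"}},
            {"_source": {"file": "bridge-2.jpg"}},
        ],
    }
}

files = files_from_hits(sample_response)

# Argument list for the image viewer; it could be handed to
# subprocess.call(command) on a machine where eog is installed.
command = ["eog"] + files
print(command)
```

Passing the file names as a list rather than as one shell string avoids quoting problems with spaces in photo names.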

Figure 5: In a 1km radius of a reference photo of the Bay Bridge, Elasticsearch finds more pictures of the Bay Bridge, served up by the Eye of Gnome application as thumbnails.

No Limits

The geo-function is just one of many plugin-like extensions of the Elasticsearch server, a useful tool that is easy to install and operate. It also scales almost without limit: As the volume of data grows, the administrator can distribute the indexes across a sufficiently large number of additional shards, each of which is an Apache Lucene index, so that all queries again run with the required level of performance.
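How many shards an index is split into is an index setting. As a small sketch, the Python below builds a settings fragment with the number_of_shards and number_of_replicas keys; the values and the helper name index_settings are illustrative assumptions, not recommendations from the article.

```python
def index_settings(shards, replicas=1):
    """Build an index settings fragment controlling shard layout.

    Assumption: the values passed in are purely illustrative; the
    helper name is hypothetical.
    """
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            # Primary shards: the pieces the index data is split into.
            "number_of_shards": shards,
            # Extra copies of each primary shard on other nodes.
            "number_of_replicas": replicas,
        }
    }


settings = index_settings(shards=8, replicas=1)
print(settings)
```

In the 1.x releases discussed here, the number of primary shards is chosen when the index is created, so it pays to plan for growth up front.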

Books on Elasticsearch exist in both print and electronic form, but unfortunately, I can't really recommend any of them. That said, the tutorial [10] can be a help, and volunteers will answer questions on Stackoverflow.com.

Mike Schilli

Mike Schilli works as a software engineer with Yahoo! in Sunnyvale, California. He can be contacted at mschilli@perlmeister.com. Mike's homepage can be found at http://perlmeister.com.
