Rewriting a photo tagger in Go

Programming Snapshot – Go Photo Tagger

© Lead Image © Erik Reis, 123RF.com

Article from Issue 264/2022

In honor of the 25th anniversary of his Programming Snapshot column, Mike Schilli revisits an old problem and solves it with Go instead of Perl.

Hurray! This issue marks the 25th anniversary of my "Programming Snapshot" column, which first appeared in the German edition of Linux Magazine back in October 1997 (originally under the "Perl Snapshot" banner). Times have changed: Now the featured programs in this column mainly use Go, but you might also see Ruby, Python, or even TeX, as was the case recently.

For this dinosaur birthday party, I thought I might rewrite a tool I put together in Perl back in the dot-com era, but looking at it from today's perspective in Go. The photo tagger from 2003 (it was called Image Database [1] or idb for short) is something I've been wanting to use again for a long time.

The idb tool assigns one or more tags to a set of photo files, distributed over arbitrary subdirectories somewhere on the hard drive. Once the photos are tagged, the same program can retrieve them if you provide the name of the desired tag. The problem with the old Perl code, though, is that you need both the time and the inclination to go through the installation and dependency hell of all the Perl modules it uses. Moreover, many years have passed since then, and some CPAN module developers have broken backward compatibility by changing the original programming interfaces. Luckily, it's 2022, and Go has solved these kinds of installation problems for all time, as you can compile static binaries that run on similar architectures.

Also, the old tagger script used a separate standalone MySQL server back in the day, but today – at least for tools that only run locally – I prefer to have everything bundled into a single binary, for example with an embedded SQLite flat-file database engine. Somewhat surprisingly, the rewrite with newfangled technology was remarkably quick.

SQLosaurus Rex

SQL databases are a bit out of fashion these days. If you only need a key-value store for your data, you are more likely to use a persistent cache or a server solution like Redis. However, for local data not exceeding a few megabytes, running an external process is unlikely to be worthwhile. Also, I tend to be suspicious of binary data in caches or key-value stores such as Berkeley DB. Instead, I prefer to take a direct look at the data myself from time to time. SQLite is an ideal database because it stores the data in a single file that a command-line tool such as sqlite3 lets you browse. Plus, backing up a single file is easier than creating and backing up a dump of a running database.

On top of this, SQLite is one of the few open source tools that is truly in the public domain. This is why a Go module on GitHub such as mattn/go-sqlite3 can legally include the SQLite source code in any program you write and distribute. The Go compiler then turns SQLite, the library, and the application code into a single binary that can be copied to other computers with a similar architecture and will run there without any installation hassles. It's the end of dependency hell as we know it – I never thought I'd get to see that! That applies to installation at least; recompiling legacy code can be a different story and is subject to problems arising from non-backward-compatible changes.

Three Tables

So which relational data model is suitable for a photo tagger application? The idb tool assigns one or more tags to one or more files. Since the Stone Age of data processing, the three-table model has proven useful for many-to-many relations like this: two tables to assign index numbers to tag names and file paths, and then a third, two-column table that maps the index numbers to each other if a particular tag is attached to a particular file.

In this way, the database only needs to store the full tag or file name once in each case, a basic requirement for a normalized database. This has advantages beyond saving the disk space that duplicate storage would waste: If the user corrects a typo in a tag, the database only has to correct it in one place, even if the tag is attached to thousands of files.
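A minimal sketch of what such a three-table schema could look like, held as SQL strings in a Go program. The table names (tag, file, tagmap) and column names are assumptions for illustration; the article's Figure 1 only establishes that there is a tag table, a file table, and a tag map table. Executing the statements would need database/sql plus an SQLite driver such as mattn/go-sqlite3, which is left out here to keep the example dependency-free.

```go
package main

import "fmt"

// schema holds one CREATE TABLE statement per table.
// All table and column names are hypothetical.
var schema = []string{
	// Lookup table: each tag name is stored exactly once.
	`CREATE TABLE IF NOT EXISTS tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
)`,
	// Lookup table: each file path is stored exactly once.
	`CREATE TABLE IF NOT EXISTS file (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
)`,
	// Two-column map table: one row per (tag, file) assignment.
	`CREATE TABLE IF NOT EXISTS tagmap (
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    file_id INTEGER NOT NULL REFERENCES file(id),
    UNIQUE (tag_id, file_id)
)`,
}

func main() {
	// Print the statements; a real tool would pass each one to db.Exec().
	for _, stmt := range schema {
		fmt.Println(stmt + ";")
	}
}
```

The UNIQUE constraints are one way to enforce the "store each name once" rule at the database level, so that tagging the same file twice with the same tag cannot create duplicate rows.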

For example, to tag the dsc13.jpg photo file with the surfing tag (Figure 1), the tool first creates a new entry for the surfing tag in the tag table (on the left of Figure 1) if the entry does not already exist. SQLite automatically assigns the associated sequential index number, 2 in this case, to the entry because entries start at an index of 0 and surfing is the third entry in the name column. In addition, the file name dsc13.jpg, if not already present, needs to be inserted into the file table – in Figure 1 it ends up in the third row and has an index number of 2 (again, an ascending index starting at 0).

Figure 1: The SQL database schema for the photo tagger.

That takes care of the two lookup tables for tags and file names. Now you need the actual assignment of the tag to the photo. This is handled by an entry in the tag map table (center, Figure 1), which assigns a tag ID of 2 to the file ID 2. All done! Using typical SQL joins, it is then easy for the database to respond to the question as to which photos were tagged with surfing. An SQL query to this effect quickly yields dsc13.jpg and possibly others. In the opposite case, the query engine can also easily discover which tags are attached to the dsc13.jpg image file, again by joining the tables.
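The bookkeeping described above can be sketched without a database at all. The following in-memory model is not the article's implementation; it only mimics the three tables with Go slices so the steps (look up or add the tag, look up or add the file, record the pair, then "join") are visible. The type and function names, and the beach and sunset sample entries, are made up for illustration.

```go
package main

import "fmt"

// store mimics the three tables in memory.
type store struct {
	tags   []string // tag table: slice index plays the role of the tag ID
	files  []string // file table: slice index plays the role of the file ID
	tagMap [][2]int // tag map table: pairs of {tag ID, file ID}
}

// lookupOrAdd returns the ID of name in table, appending it if missing.
func lookupOrAdd(table *[]string, name string) int {
	for id, n := range *table {
		if n == name {
			return id
		}
	}
	*table = append(*table, name)
	return len(*table) - 1
}

// Tag attaches tag to file, creating lookup entries only when needed.
func (s *store) Tag(tag, file string) (tagID, fileID int) {
	tagID = lookupOrAdd(&s.tags, tag)
	fileID = lookupOrAdd(&s.files, file)
	for _, m := range s.tagMap {
		if m[0] == tagID && m[1] == fileID {
			return tagID, fileID // pair already recorded
		}
	}
	s.tagMap = append(s.tagMap, [2]int{tagID, fileID})
	return tagID, fileID
}

// FilesFor resolves a tag name to file names via the map table,
// which is what an SQL join across the three tables would do.
func (s *store) FilesFor(tag string) []string {
	var out []string
	for _, m := range s.tagMap {
		if s.tags[m[0]] == tag {
			out = append(out, s.files[m[1]])
		}
	}
	return out
}

// TagsFor is the reverse lookup: file name to tag names.
func (s *store) TagsFor(file string) []string {
	var out []string
	for _, m := range s.tagMap {
		if s.files[m[1]] == file {
			out = append(out, s.tags[m[0]])
		}
	}
	return out
}

func main() {
	s := &store{}
	s.Tag("beach", "dsc11.jpg")  // hypothetical earlier entries
	s.Tag("sunset", "dsc12.jpg") // occupying IDs 0 and 1
	tagID, fileID := s.Tag("surfing", "dsc13.jpg")
	fmt.Println(tagID, fileID)          // prints "2 2"
	fmt.Println(s.FilesFor("surfing"))  // prints "[dsc13.jpg]"
	fmt.Println(s.TagsFor("dsc13.jpg")) // prints "[surfing]"
}
```

In SQL, FilesFor would correspond to a query that joins the file table to the tag map table and the tag table and filters on the tag name; the linear scans here stand in for what the database does with indexes.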

Homemade

The finished idb binary, linked together from the Go sources for this article, can carry out the commands listed in Table 1. The binary supports tagging files, searching for files with a specific tag, and listing all tags assigned so far. As a special treat, the --xlink option generates a directory full of symlinks pointing to the original photos for files found for a given tag. With a tool such as iNuke [2], featured in a recent column, the photos can then be viewed, and the best ones selected.

Table 1: Commands

Command                      Action
idb --tag=foo image.jpg …    Tag photos with foo
idb --tag=foo                Find photos with the foo tag
idb --tag=foo --xlink        Find photos with the foo tag and create a local symlink
idb --tags                   List all tags
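One plausible way to tell the four invocations in Table 1 apart is Go's standard flag package, which accepts both -tag and --tag spellings. The command struct, the mode names, and the parseArgs function below are assumptions made for this sketch; the article's actual option handling may differ.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// command describes what the user asked for (hypothetical structure).
type command struct {
	mode  string   // "tag", "find", "xlink", or "list"
	tag   string   // value of --tag, if any
	files []string // remaining arguments, used when tagging
}

// parseArgs maps a command line (without the program name) to a command.
func parseArgs(args []string) (command, error) {
	fs := flag.NewFlagSet("idb", flag.ContinueOnError)
	tag := fs.String("tag", "", "tag to assign or search for")
	tags := fs.Bool("tags", false, "list all tags")
	xlink := fs.Bool("xlink", false, "create symlinks to matching photos")
	if err := fs.Parse(args); err != nil {
		return command{}, err
	}

	switch {
	case *tags:
		return command{mode: "list"}, nil
	case *tag == "":
		return command{}, errors.New("need --tag=name or --tags")
	case *xlink:
		return command{mode: "xlink", tag: *tag}, nil
	case fs.NArg() > 0:
		// File arguments present: assign the tag to them.
		return command{mode: "tag", tag: *tag, files: fs.Args()}, nil
	default:
		// No file arguments: search for the tag.
		return command{mode: "find", tag: *tag}, nil
	}
}

func main() {
	// The four invocations from Table 1.
	examples := [][]string{
		{"--tag=foo", "image.jpg"},
		{"--tag=foo"},
		{"--tag=foo", "--xlink"},
		{"--tags"},
	}
	for _, args := range examples {
		cmd, err := parseArgs(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> %s\n", args, cmd.mode)
	}
}
```

Note that the flag package stops parsing at the first non-flag argument, so in this sketch the options have to come before any file names, as they do in Table 1.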
