Finding problems using unsupervised image categorization
Needle in a Haystack

The most tedious part of supervised machine learning is providing sufficient supervision. If the samples come from a restricted sample space, however, unsupervised learning may be enough for the task.
In any classification project, you can always ask someone to review a set of images and build a classification list. However, when entering a new domain, it can be difficult to identify domain experts or to develop a ground truth for classification on which all experts can agree. This is true whether you are looking at the backside of a silicon wafer for the first time or trying to identify the presence of volcanoes in radar images from the surface of Venus [1].
Alternatively, you can bypass all these problems and kick-start a classification project with unsupervised machine learning. Unsupervised machine learning is particularly applicable to environments where the typical images are largely identical, much like the pieces of hay in the haystack that you need to ignore when looking for needles.
In this article, I examine the potential for using unsupervised machine learning in Python (version 3.8.3 64-bit) to identify image categories for a restricted image space without resorting to training neural networks. This technique follows the long engineering tradition of finding the simplest solution to a problem. In this particular case, the solution relies on functions in the OpenCV and mahotas computer vision libraries that generate parameters describing the texture and form within an image.
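The general idea of reducing each image to a few numeric parameters and then looking for the images that do not fit can be illustrated with a minimal sketch. This is not the article's implementation: the excerpt does not show which OpenCV or mahotas functions are used, so the sketch substitutes three simple hand-rolled statistics (mean, standard deviation, and mean absolute horizontal gradient) and relies only on the Python standard library. The synthetic images, the function names `make_image`, `features`, and `find_outliers`, and the threshold value are all assumptions chosen for illustration.

```python
import random
import statistics


def make_image(size=16, defect=False, seed=0):
    """Build a synthetic grayscale image as a list of rows (hypothetical data)."""
    rng = random.Random(seed)
    img = [[128 + rng.randint(-5, 5) for _ in range(size)] for _ in range(size)]
    if defect:
        # Paint a bright 4x4 blob to stand in for the "needle".
        for r in range(4, 8):
            for c in range(4, 8):
                img[r][c] = 250
    return img


def features(img):
    """Return (mean, standard deviation, mean absolute horizontal gradient).

    These are simple stand-ins for library-generated texture and form
    parameters; they are an assumption, not the article's feature set.
    """
    pixels = [p for row in img for p in row]
    grad = [abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)]
    return (
        statistics.fmean(pixels),
        statistics.pstdev(pixels),
        statistics.fmean(grad),
    )


def find_outliers(vectors, threshold=15.0):
    """Flag indices whose value in any feature lies more than `threshold`
    median absolute deviations (MADs) from that feature's median."""
    flagged = set()
    for column in zip(*vectors):
        med = statistics.median(column)
        # Guard against a zero MAD when a feature barely varies.
        mad = statistics.median([abs(v - med) for v in column]) or 1e-9
        for idx, value in enumerate(column):
            if abs(value - med) / mad > threshold:
                flagged.add(idx)
    return sorted(flagged)


# Twenty nearly identical "hay" images, one of which carries a defect.
images = [make_image(seed=i) for i in range(20)]
images[13] = make_image(seed=13, defect=True)

outliers = find_outliers([features(img) for img in images])
print(outliers)
```

No labels are supplied at any point: the defective image stands out only because its parameters sit far from the median of an otherwise uniform population. The median and MAD are used rather than the mean and standard deviation so that the outlier itself does not distort the baseline it is measured against.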
[...]