Living with Statistics
Off the Beat: Bruce Byfield's Blog
One of the prices of software freedom is the impossibility of getting accurate figures for usage. As a user, I consider that a small price to pay for not having to register or activate software. However, as a journalist I'm often frustrated, because accurate figures can be useful for establishing a point or debunking rumors.
The questions for which I would like accurate stats include: how many GNU/Linux users are there? Has Linux Mint really overtaken Ubuntu as the most popular distribution? Has GNOME gained or lost users with the start of its third release series? All these questions and more would benefit from reliable figures, yet we don't have any. Instead, we have a series of indicators that are approximate at best, and completely unreliable at worst.
One problem is external biases. For example, when NetApplications places Linux usage at 1.6%, that total is derived "from the browsers of site visitors to our exclusive on-demand network of live stats customers." But when I consider that the same methodology based on visits to my personal blog would suggest a figure of 19% for Linux, I have to wonder if NetApplications' figures aren't as skewed as mine, but in the opposite direction.
Similarly, since NetApplications' headquarters are in California, American companies are probably the most likely to use its services. Unofficially, I am always told that free software usage is lightest in North America, Microsoft's home, and higher in Europe or in developing countries.
However, other problems arise when I rely on sources that are more friendly to free software, such as Distrowatch's page views for distributions. My guess is that most people who visit Distrowatch are already familiar with free and open source software (FOSS), so that its figures reflect only the tastes of relatively experienced users.
Yet even that assumption may be questionable. Page views might tell what distributions people are curious about, but that might be only a rough indicator of what people are downloading and using.
Moreover, Distrowatch's numbers are small enough that a new release or a lively discussion elsewhere online can skew results for days or weeks at a time. A handful of fans might easily distort results, although nothing indicates that such an effort has ever been made. Armed with such doubts, you can easily dismiss Distrowatch figures altogether, as Canonical employee Michael Hall did when Distrowatch reported Linux Mint as receiving more views than Ubuntu.
User surveys share some of the problems of Distrowatch's figures, but also come with their own problems. For instance, FLOSSPOLS' survey of gender in the community frames all discussions of women's under-representation in FOSS. Yet the FLOSSPOLS data was collected seven to eight years ago, making it decidedly obsolete, especially in a field that changes as rapidly as FOSS. Today, we have no idea whether the situation in the community is better than the survey reports (it could hardly be worse).
Still, at least the FLOSSPOLS survey was designed according to research standards. Community surveys, such as the Linux Journal's Readers' Choice Awards or the LinuxQuestions' Members Choice Awards, can't even claim that. In both, participants are self-selected and answers are open-ended. The number of participants may or may not be given, and margins of error never are -- although, if they were, they might be as high as five percent. If so, then in many cases where GNOME was declared the most popular desktop environment over KDE, or Mozilla the most popular web browser over Chrome, a more accurate result would probably be to declare a tie.
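To make the arithmetic behind that "tie" concrete, here is a minimal Python sketch of the standard margin-of-error formula for a survey proportion. The function names and the sample figures are hypothetical, chosen only for illustration; they are not taken from any of the surveys mentioned. The formula also assumes a random sample, which self-selected community polls are not, so the real uncertainty is probably larger than what this calculates.

```python
import math


def margin_of_error(share, respondents, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample, which self-selected polls do not
    provide, so treat the result as a best-case figure.
    """
    return z * math.sqrt(share * (1 - share) / respondents)


def is_probable_tie(share_a, share_b, respondents):
    """Rule of thumb: call it a tie when the two intervals overlap."""
    moe_a = margin_of_error(share_a, respondents)
    moe_b = margin_of_error(share_b, respondents)
    low_of_leader = max(share_a, share_b) - max(moe_a, moe_b)
    high_of_runner_up = min(share_a, share_b) + min(moe_a, moe_b)
    return high_of_runner_up >= low_of_leader


# Hypothetical poll: 400 respondents, 38% for one desktop, 34% for another.
print(round(margin_of_error(0.38, 400), 3))
print(is_probable_tie(0.38, 0.34, 400))
```

With about 400 respondents, the margin comes out close to five percentage points, so a four-point lead in a poll of that size is not much of a lead at all.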
None of what I am saying is meant to be a reflection upon those who collect the data. With the exception of FLOSSPOLS and NetApplications, none of these sources has ever claimed to be providing scientifically reliable information. In some cases, entertainment is probably more of a motivation than anything else.
But for those of us in search of accurate information, the shortcomings of what is available are annoying, to say the least.
Living with Imperfection
So what's a writer to do? The high road would be to ignore such sources of information, and learn to live with uncertainty. As much as I want accurate information about FOSS, I might have to accept that it just doesn't exist.
However, that is hardly a solution. Even if I ignore these figures, others don't. Such sources as are available are always being cited to support various arguments, and, if nothing else, I might want to debunk the argument with something more than the reasonable doubt of meta-arguments.
Besides, the issues that such sources touch upon are ones that I -- and many other people -- want to talk about. As limited as these information sources may be, they at least give some context to discussions that would otherwise be even less informed.
As a result, the way I use these figures is an uneasy compromise. I try to indicate, however briefly, that they're not reliable. I try not to make arguments that depend on a couple of percentage points of difference.
Most of all, I try not to base an argument on any single set of results. If a survey gets the same results several years running, I'm more likely to trust the figures than if they appear in a single year. Better yet are times when more than one source shows similar results over several years.
Of course, if I were paranoid enough, I might worry about whether all surveys were being manipulated by a small group of users or corporate employees. Realistically, though, I think that, under the conditions I describe, these statistical sources can indicate general trends to a degree that no other sources of information can. But I try not to forget that these sources are tentative, and can never be used with any precision.
Comments
better Linux usage stats
http://stats.wikimedia.org/...quidReportOperatingSystems.htm