Living with Statistics
Off the Beat: Bruce Byfield's Blog
One of the prices of software freedom is the impossibility of getting accurate figures for usage. As a user, I consider that a small price to pay for not having to register or activate software. However, as a journalist I'm often frustrated, because accurate figures can be useful for establishing a point or debunking rumors.
The questions for which I would like accurate stats include: how many GNU/Linux users are there? Has Linux Mint really overtaken Ubuntu as the most popular distribution? Has GNOME gained or lost users with the start of its third release series? All these questions and more would benefit from reliable figures, yet we don't have any. Instead, we have a series of indicators that are approximate at best, and completely unreliable at worst.
One problem is external biases. For example, when NetApplications places Linux usage at 1.6%, that total is derived "from the browsers of site visitors to our exclusive on-demand network of live stats customers." But when I consider that the same methodology based on visits to my personal blog would suggest a figure of 19% for Linux, I have to wonder if NetApplications' figures aren't as skewed as mine, but in the opposite direction.
Similarly, since NetApplications' headquarters are in California, American companies are probably the most likely to use its services. Unofficially, I am always told that free software usage is lightest in North America, Microsoft's home, and higher in Europe or in developing countries.
However, other problems arise when I rely on sources that are more friendly to free software, such as Distrowatch's page views for distributions. My guess is that most people who visit Distrowatch are already familiar with free and open source software (FOSS), so that their figures reflect only the tastes of relatively experienced users.
Yet even that assumption may be questionable. Page views might tell what distributions people are curious about, but that might be only a rough indicator of what people are downloading and using.
Moreover, Distrowatch's numbers are small enough that a new release or a lively discussion elsewhere online can skew results for days or weeks at a time. A handful of fans might easily distort results, although nothing indicates that such an effort has ever been made. Armed with such doubts, you can easily dismiss Distrowatch figures altogether, as Canonical employee Michael Hall did when Distrowatch reported Linux Mint as receiving more views than Ubuntu.
User surveys share some of the problems of Distrowatch's figures, but also come with their own problems. For instance, FLOSSPOLS' survey of gender in the community frames all discussions of women's under-representation in FOSS. Yet the FLOSSPOLS data was collected seven to eight years ago, making it decidedly obsolete, especially in a field that changes as rapidly as FOSS. Today, we have no idea whether the situation in the community is better than the survey reports (it could hardly be worse).
Still, at least the FLOSSPOLS survey was designed according to research standards. Community surveys, such as the Linux Journal's Readers' Choice Awards or the LinuxQuestions' Members Choice Awards, can't even claim that. In both, participants are self-selected and answers are open-ended. The number of participants may or may not be given, and margins of error never are -- although, if they were, they might be as high as five percent. If so, then in many cases where GNOME was declared the most popular desktop environment over KDE, or Mozilla the most popular web browser over Chrome, a more accurate result would probably be to declare a tie.
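To make the point about ties concrete, here is a minimal sketch in Python of the standard margin-of-error formula for a survey share. It assumes a simple random sample, which self-selected community surveys are not, so the real uncertainty is likely larger. The function names, the 95 percent confidence level (z = 1.96), and the figures in the usage note are all my own illustrative assumptions, not numbers from any of the surveys mentioned.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion.

    share: fraction of respondents choosing an option (0.0 to 1.0).
    sample_size: number of respondents.
    z: z-score for the confidence level (1.96 is roughly 95 percent).
    Assumes a simple random sample, which is an idealization here.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)


def is_statistical_tie(share_a, share_b, sample_size, z=1.96):
    """Return True when the two confidence intervals overlap.

    This is a deliberately conservative rule of thumb, not a formal
    significance test.
    """
    moe_a = margin_of_error(share_a, sample_size, z)
    moe_b = margin_of_error(share_b, sample_size, z)
    return abs(share_a - share_b) <= moe_a + moe_b
```

Under these assumptions, a margin of about five percent corresponds to roughly 400 respondents when opinion is evenly split. With that hypothetical sample size, calling is_statistical_tie(0.45, 0.42, 400) treats a three-point gap between two desktops as a tie, while a gap of thirty points would not be.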
None of what I am saying is meant to be a reflection upon those who collect the data. With the exception of FLOSSPOLS and NetApplications, none of these sources has ever claimed to be providing scientifically reliable information. In some cases, entertainment is probably more of a motivation than anything else.
But for those of us in search of accurate information, the shortcomings of what is available are annoying, to say the least.
Living with Imperfection
So what's a writer to do? The high road would be to ignore such sources of information, and learn to live with uncertainty. As much as I want accurate information about FOSS, I might have to accept that it just doesn't exist.
However, that is hardly a solution. Even if I ignore these figures, others don't. Such sources as are available are always being cited to support various arguments, and, if nothing else, I might want to debunk the argument with something more than the reasonable doubt of meta-arguments.
Besides, the issues that such sources touch upon are ones that I -- and many other people -- want to talk about. As limited as these information sources may be, they at least give some context to discussions that would otherwise be even less informed.
As a result, the way I use these figures is an uneasy compromise. However briefly, I try to indicate that they're not reliable. I try not to make arguments that depend on a couple of percentage points of difference.
Most of all, I try not to base an argument on any single set of results. If a survey gets the same results several years running, I'm more likely to trust the figures than if they appear in a single year. Better yet are times when more than one source shows similar results over several years.
Of course, if I were paranoid enough, I might worry about whether all surveys were being manipulated by a small group of users or corporate employees. Realistically, though, I think that, under the conditions I describe, these statistical sources can indicate general trends to a degree that no other sources of information can. But I try not to forget that these sources are tentative, and can never be used with any precision.
Comments
better Linux usage stats
http://stats.wikimedia.org/...quidReportOperatingSystems.htm