Traffic analysis tools for websites
Data for Breakfast
If you are looking for an alternative to Google Analytics for studying website data, several free tools are available. In this article, we look at Piwik, Open Web Analytics, and eAnalytics.
Admins who wanted details of the visitors to their websites in the early years of the Internet had to laboriously read the web server's logs. The first log file analysis applications appeared 20 years ago. Analog [1], Webalizer [2] and AWStats [3], which date from this period, are still occasionally in use (see the "Simple Web Analytics Tools" box).
Simple Web Analytics Tools
Many system administrators are quite happy with the simpler, resource-friendly log evaluations provided by statistics tools.
The oldest open source tools include Analog, developed in 1995, and Webalizer, first released in 1997. Both applications are still regularly updated today. The tools evaluate the logs several times a day, when run by the admin or a cron job. AWStats is also a simple analysis program. It has generated statistics about web page visits since 2000 and is still under active development. The script, implemented entirely in Perl, analyzes the logfiles of web, mail, and FTP servers and produces its reports as HTML pages. Simple bar charts graphically enhance the results.
GoAccess [6] (Figure 1) gives the admin the ability to output and continuously update analyses in real time in a terminal or in a browser. GoAccess can handle virtually any log format used by Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, and others.
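The kind of log evaluation these tools perform can be illustrated with a minimal Python sketch. It assumes the widely used "combined" log format and made-up sample entries; the pattern, the `summarize` function, and the choice to count only 2xx responses are illustrative assumptions and do not reflect how any of the tools named here works internally.

```python
import re
from collections import Counter

# Pattern for the "combined" log format (an assumption; other
# formats need a different expression).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count successful requests, distinct client IPs, and referrers."""
    pages = Counter()
    referrers = Counter()
    client_ips = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if not match["status"].startswith("2"):
            continue  # ignore errors and redirects
        pages[match["path"]] += 1
        client_ips.add(match["ip"])
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return {
        "page_impressions": sum(pages.values()),
        "unique_ips": len(client_ips),
        "top_pages": pages.most_common(3),
        "top_referrers": referrers.most_common(3),
    }


# Made-up sample entries that use documentation IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2016:13:55:36 +0200] "GET /index.html HTTP/1.1" '
    '200 2326 "https://example.org/" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2016:13:56:02 +0200] "GET /about.html HTTP/1.1" '
    '200 1540 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2016:14:01:11 +0200] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2016:14:01:15 +0200] "GET /missing HTTP/1.1" '
    '404 312 "-" "Mozilla/5.0"',
    'this line is not a valid log entry',
]

if __name__ == "__main__":
    for key, value in summarize(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

Counting distinct IP addresses is only a rough stand-in for unique visitors, because several people can share one address and one person can appear under several.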
In 2005, Google launched Google Analytics (GA) [4], a website analysis service that is hugely popular today. Open source tools such as Piwik [5] picked up on this trend towards graphical web analytics, but moved the focus to the customer's own server.
With the help of web analytics, site operators collect and evaluate data on the surfing habits of their visitors. The access data are of interest not only for commercial reasons; the companies behind the sites also often seek to better understand their customers and their interests. The better an operator knows its visitors and their preferences, the better it can tailor its offerings to the target group.
Good to Know
Site operators are often interested in where the visitors come from, what they are looking for, what items they click on, and how long they remain on the site. It can also be useful to know when they leave the site. Admins want to know what browsers and operating systems visitors to the site use, which files and documents they download and with what bandwidth, and how many visitors subscribe to newsletters or RSS feeds.
Web shop operators are interested in how many visitors add goods to their shopping carts and how many of them then go on to complete the purchase. If a website hosts advertising for third parties, web analysis is essential, because access figures and similar factors determine the prices for advertisers.
Open Access
The market offers many different web analytics tools. They include around 150 commercial, typically proprietary applications, aimed at larger corporate websites. There are also some free tools, some of which are open source. This article looks at Piwik, Open Web Analytics [7], and eAnalytics [8] (Table 1).
Table 1
Three Web Statistics Tools at a Glance
| | Piwik | Open Web Analytics (OWA) | eAnalytics |
|---|---|---|---|
| Platforms | Cross-Platform | Cross-Platform | Debian/Ubuntu |
| License | GPLv3 and others | GPLv2 | AGPLv3 |
| Under development since | 2009 | 2009 | 2011 |
| Language | PHP | PHP | Java and others |
| Methods | JavaScript tags, log analysis, tracking pixels | JavaScript tags, log analysis, tracking pixels | eAnalytics tag, tracking pixels |
| Functions | Visitors (visitors, unique visitors), operating system, browser version, downloads, IP address (pseudonymization capable), geolocation by city, page impressions, referrer, plugins | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address, geo location by country, page impressions, referrer, heat maps | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address (pseudonymization capable, can be switched off), geolocation by city, page impressions, referrer, plugins |
From a technical point of view, web analytics tools either evaluate web server logfiles or rely on special tags integrated into the HTML pages. Either way, they give the admin statistics and graphics for a quick overview and access to all the necessary key indicators. Whereas the server-based method analyzes the logfiles of the web server, the client-based variant requires developers to embed tracking pixels in the source code of the web page to determine the key indicators.
Although neither of the two methods fully represents the actual traffic of a website, the client-based approach of counting pixels, combined with the controversial use of cookies, currently has a slight edge in accuracy.
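To make the client-based method more concrete, the following Python sketch shows the data flow behind a tracking pixel: The tag in the page requests a tiny image and passes the key indicators as query parameters, which the analytics server then reads back from the request URL. The endpoint `stats.example.org/pixel.gif` and the parameter names `page`, `ref`, and `vid` are hypothetical and were chosen for illustration only; they do not correspond to the request format of Piwik, Open Web Analytics, or eAnalytics.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_pixel_url(endpoint, page, referrer, visitor_id):
    """Build the image URL a page tag would request (hypothetical format)."""
    query = urlencode({"page": page, "ref": referrer, "vid": visitor_id})
    return f"{endpoint}?{query}"


def parse_pixel_request(url):
    """Recover the tracked values on the server side from the request URL."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each key occurs only once here.
    return {key: values[0] for key, values in params.items()}


if __name__ == "__main__":
    url = build_pixel_url(
        "https://stats.example.org/pixel.gif",  # hypothetical endpoint
        page="/shop/cart",
        referrer="https://example.org/",
        visitor_id="abc123",  # would typically come from a cookie
    )
    print(url)
    print(parse_pixel_request(url))
```

Because the request is triggered by the visitor's browser rather than read from the server log, this approach also registers page views that are served from a cache, but it misses visitors who block the tag.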
Privacy Issues
Because they evaluate cookies and store the visitors' IP addresses, web analytics tools always face a difficult legal situation. For example, Germany's Telemedia Act (TMG) [9] allows you to create user profiles for the purposes of advertising and market research, provided the user does not object. Such a profile is only allowed to contain an anonymized IP address in addition to the data on the use of the website. To this end, IP addresses are typically truncated automatically.
The TMG also requires the service provider to inform the user in a privacy statement on the website of whether, to what extent, and for what purpose it processes the IP address. In addition, the TMG stipulates that users must have an option to object to the creation of user profiles.
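The truncation of IP addresses mentioned above can be sketched in a few lines of Python using the standard `ipaddress` module. The function name `truncate_ip` and the prefix lengths (keeping 24 bits of an IPv4 address and 48 bits of an IPv6 address) are assumptions made for this example; how many bits actually have to be removed is not specified here and depends on the legal requirements and the tool's settings.

```python
import ipaddress


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero out the host part of an IP address.

    The default prefix lengths are illustrative assumptions, not
    values prescribed by any law or tool.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and masks them away.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("203.0.113.77"))
    print(truncate_ip("203.0.113.77", v4_prefix=16))
    print(truncate_ip("2001:db8:abcd:1234::1"))
```

With the defaults, the last octet of an IPv4 address is replaced by zero, so all visitors from the same /24 network become indistinguishable by address alone.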
Probably the most controversial and, at the same time, most successful tool for website traffic analysis is the Google Analytics online service, which was launched in 2005 and is used by 50 percent of all websites. It is clearly the top dog. In contrast to the applications covered in this article, the data collected by GA leaves users' computers and heads to the United States, where data protection provisions are not as stringent as in Germany and the rest of Europe.
For example, GA delivers the unabridged IP address to the parent company. Also, website visitors are not necessarily informed that Google is collecting their data. Browser add-ons like Ghostery or NoScript can disable GA [10] to protect against unwanted data collection. GA is free up to a traffic volume of 10 million hits a month, but it delivers certain data only after a 24-hour delay. Also, the user has to agree to Google using the data for its own purposes.