Traffic analysis tools for websites

Data for Breakfast

Lead Image © rawpixel, 123RF.com

Article from Issue 197/2017
Author(s):

If you are looking for an alternative to Google Analytics for studying website data, you can choose from several free tools. In this article, we look at Piwik, Open Web Analytics, and eAnalytics.

Admins who wanted details of the visitors to their websites in the early years of the Internet had to laboriously read the web server's logs. The first log file analysis applications appeared 20 years ago. Analog [1], Webalizer [2] and AWStats [3], which date from this period, are still occasionally in use (see the "Simple Web Analytics Tools" box).

Simple Web Analytics Tools

Many system administrators are quite happy with the simpler, resource-friendly log evaluations provided by statistics tools.

The oldest open source tools include Analog, developed in 1995, and Webalizer, first released in 1997. Both applications are still regularly updated today. The tools evaluate the logs several times a day, when run by the admin or a cron job. AWStats is also a simple analysis program. It has generated statistics about web page visits since 2000 and is still under active development. The script, implemented entirely in Perl, analyzes the logfiles of web, mail, and FTP servers and produces its reports as HTML pages. Simple bar charts graphically enhance the results.

GoAccess [6] (Figure 1) gives the admin the ability to output and continuously update analyses in real time in a terminal or in a browser. GoAccess can handle virtually any log format used by Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, and others.

Figure 1: GoAccess demo application in the browser.

In 2005, Google launched Google Analytics (GA) [4], a website analysis service that is hugely popular today. Open source tools such as Piwik [5] picked up on this trend towards graphical web analytics, but moved their focus to the customer's own server.

With the help of web analytics, site operators collect and evaluate data on the surfing habits of their visitors. The access data are of interest not only for commercial reasons; the companies behind the sites also often seek to better understand their customers and their interests. As a rule, the better an operator knows the visitors and their preferences, the better the operator can optimize its offerings to suit the target group.

Good to Know

Site operators are often interested in where the visitors come from, what they are looking for, what items they click on, and how long they remain on the site. It can also be useful to know when they leave the site. Admins want to know what browsers and operating systems visitors to the site use, which files and documents they download and with what bandwidth, and how many visitors subscribe to newsletters or RSS feeds.

Web shop operators are interested in how many visitors add goods to their shopping carts and how many of them then go on to complete the purchase. If a website hosts advertising for third parties, web analysis is essential, because access figures and similar factors determine the prices for advertisers.

Open Access

The market offers many different web analytics tools. They include around 150 commercial, typically proprietary applications, aimed at larger corporate websites. There are also some free tools, some of which are open source. This article looks at Piwik, Open Web Analytics [7], and eAnalytics [8] (Table 1).

Table 1: Three Web Statistics Tools at a Glance

| | Piwik | Open Web Analytics (OWA) | eAnalytics |
| Platforms | Cross-platform | Cross-platform | Debian/Ubuntu |
| License | GPLv3 and others | GPLv2 | AGPLv3 |
| Under development since | 2009 | 2009 | 2011 |
| Language | PHP | PHP | Java and others |
| Methods | JavaScript tags, log analysis, tracking pixels | JavaScript tags, log analysis, tracking pixels | eAnalytics tag, tracking pixels |
| Functions | Visitors (visitors, unique visitors), operating system, browser version, downloads, IP address (pseudonymization capable), geolocation by city, page impressions, referrer, plugins | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address, geolocation by country, page impressions, referrer, heat maps | Visitors (visitors, unique visitors), operating system, downloads, browser version, IP address (pseudonymization capable, can be switched off), geolocation by city, page impressions, referrer, plugins |

From a technical point of view, web analytics tools evaluate either web server logfiles or special tags integrated into the HTML pages, and they give the admin statistics and graphics for a quick overview and access to all necessary key indicators. Whereas the server-based method analyzes the logfiles of the web server, the client-based variant relies on tracking pixels that developers add to the source code of the web page to determine the key indicators.
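To give a feel for the server-based method, the following Python sketch counts page impressions, unique client addresses, and referrers from a few log lines. It assumes the logs use Apache's combined log format; the sample lines, the `summarize` function, and the decision to count only requests with status 200 are illustrative assumptions and do not reflect how any of the tools covered here work internally.

```python
import re
from collections import Counter

# Assumed input: Apache "combined" log format, i.e.
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count page impressions per path, unique client hosts, and referrers."""
    impressions = Counter()
    referrers = Counter()
    hosts = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["status"] != "200":
            continue  # simplification: only successful requests count
        impressions[match["path"]] += 1
        hosts.add(match["host"])
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return {
        "impressions": impressions,
        "unique_hosts": len(hosts),
        "referrers": referrers,
    }


# Hypothetical sample data using documentation-only IP addresses
SAMPLE = [
    '203.0.113.5 - - [10/Mar/2017:09:15:02 +0100] "GET /index.html HTTP/1.1" 200 5120 "https://example.org/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Mar/2017:09:15:40 +0100] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2017:09:16:11 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2017:09:16:12 +0100] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    report = summarize(SAMPLE)
    print(report["impressions"].most_common())
    print("unique hosts:", report["unique_hosts"])
```

Counting client addresses is only a rough stand-in for unique visitors, because several people can share one address behind a proxy and one person can appear under several addresses.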

Although neither of the two methods fully represents the actual traffic of a website, the client-based system of counting pixels, combined with the controversial use of cookies, currently has a slight edge in accuracy.
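The idea behind a counting pixel can be sketched in a few lines of Python: The page embeds a tiny image whose URL carries the details of the page view as query parameters, and the statistics server reads them back when the browser requests the image. The endpoint `https://stats.example.org/pixel.gif` and the parameter names `site`, `url`, and `ref` are hypothetical and are not the request format of Piwik, Open Web Analytics, or eAnalytics; delivering the image itself and setting cookies are left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_pixel_url(endpoint, site_id, page_url, referrer=""):
    """Return the URL an <img> tag would request to report one page view."""
    # Hypothetical parameter names, chosen for this illustration only
    params = {"site": site_id, "url": page_url, "ref": referrer}
    return f"{endpoint}?{urlencode(params)}"


def parse_pixel_request(request_url):
    """Recover the reported fields from the pixel request on the server side."""
    query = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each key occurs once in this sketch
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    pixel = build_pixel_url(
        "https://stats.example.org/pixel.gif",
        1,
        "https://example.org/shop?item=42",
        "https://search.example.com/?q=linux magazine",
    )
    print(pixel)
    print(parse_pixel_request(pixel))
```

Because the browser, not the web server, triggers the request, this approach can also record page views that were served from a cache, which is one reason the client-based method tends to come closer to the real traffic.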

Privacy Issues

Because they evaluate cookies and store the visitors' IP addresses, web analytics tools always face a difficult legal situation. For example, Germany's Telemedia Act (TMG) [9] allows you to create user profiles for the purposes of advertising and market research if the user does not object. Such a profile is only allowed to contain an anonymized IP address in addition to the data on the use of the website. To this end, IP addresses are typically truncated automatically.
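Truncating an address can be sketched with Python's standard `ipaddress` module. The function name `anonymize_ip` and the default prefix lengths (keeping 24 bits of an IPv4 address and 48 bits of an IPv6 address) are assumptions made for this illustration; how many bits have to be removed to satisfy the law is a legal question that this sketch does not answer.

```python
import ipaddress


def anonymize_ip(address, ipv4_prefix=24, ipv6_prefix=48):
    """Zero the trailing bits of an address so it no longer names one host."""
    ip = ipaddress.ip_address(address)
    # Assumed defaults: keep 24 bits for IPv4 and 48 bits for IPv6
    prefix = ipv4_prefix if ip.version == 4 else ipv6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("192.0.2.123"))
    print(anonymize_ip("192.0.2.123", ipv4_prefix=16))
    print(anonymize_ip("2001:db8:abcd:12:34:56:78:9a"))
```

The truncation has to happen before the address is written to the database or a logfile; shortening it only in the reports would leave the full address in storage.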

The TMG also requires the service provider to inform the user in a privacy statement on the website of whether, to what extent, and for what purpose it processes the IP address. And, the TMG stipulates that users must have an option to object to the creation of user profiles.

Probably the most controversial and at the same time most successful tool for website traffic analysis is the Google Analytics online service, which was launched in 2005 and which 50 percent of all websites employ. It is clearly the top dog. In contrast to the applications covered in this article, the data collected by GA leaves users' computers and heads to the United States, where data protection provisions are not as stringent as in Germany and the rest of Europe.

For example, GA delivers the unabridged IP address to the parent company. Also, website visitors are not necessarily informed that Google is collecting their data. Browser add-ons like Ghostery or NoScript can disable GA [10] to provide protection against unwanted data collection. GA doesn't cost anything up to a traffic volume of 10 million hits a month, but it only delivers certain data following a 24-hour delay. Also, the user has to agree to Google's using the data for its own purposes.
