Setting up a data analytics environment in Linux with Python

Down in the Mine

© Photo by Luca Maffeis on Unsplash

© Photo by Luca Maffeis on Unsplash

Article from Issue 259/2022

The Knowledge Discovery in Data Mining (KDD) method breaks the business of data analytics into easy-to-understand steps. We'll show you how to get started with KDD and Python.

Data analytics is a major force in the current zeitgeist. Analytics are the eyes and ears on a very wide variety of domains (society, climate, health, etc.) to perform an even wider variety of tasks (such as understanding commercial trends, the spread of COVID-19, and finding exoplanets). In this article, I will discuss some fundamentals of data analytics and show how to get started with analytics in Python. Finally, I will show the whole process at work on a simple data analytics problem.

A Primer on Data Analytics

Data analytics uses tools from statistics and computer science (CS), such as artificial intelligence (AI) and machine learning (ML), to extract information from collected data. The collected data is usually very complex and voluminous, and it cannot be interpreted easily (or at all) by humans. Therefore, the data on its own is useless. Information lies hidden within the data, and it takes many forms: repeating patterns, trends, classifications, or even predictive models. You can use this data to uncover insights and build knowledge of the problem you are studying. For example, suppose you wish to measure the traffic in a parking lot that is monitored by a network of IoT sensors covering the whole city. Reading a single occupancy sensor doesn't say anything about the traffic on its own. Neither do the readings of all the parking sensors of the city without any more context. But the timestamped percentage of occupied places within the monitored parking lot does tell us something, and we use this information to derive insights, such as the times of day with maximum traffic.

Learning the mathematical background and analytics tools is only half the journey. Field expertise (experience on the problem that is being studied) is equally important. Some data scientists come from a statistics background, others are computer scientists who pick up the statistics as they go, and many are people starting from a field of expertise who need to learn both the statistics and the computing tools.


Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Data Visualization in Python

    Python's powerful Matplotlib, Bokeh, PyQtGraph, and Pandas libraries lend programmers a helping hand when visualizing complex data and their relationships.

  • JSON Deep Dive

    JSON data format is a standard feature of today's Internet – and a common option for mobile and desktop apps – but many users still regard it as something of a mystery. We'll take a close look at JSON format and some of the free tools you can use for reading and manipulating JSON data.

  • CircuitPython

    The CircuitPython run-time environment runs on almost all microcomputers and microcontrollers, making it perfect for cross-platform programming.

  • Panda3D

    Several free game engines are available for Linux users, but programming with them is often less than intuitive. Panda3D is an easy-to-use engine that is accessible enough for newcomers but still powerful enough for the pros at Disney Studios.

  • PyPy and Nuitka

    PyPy and Nuitka improve the performance of Python on a Raspberry Pi.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95