Artificial intelligence detects mileage patterns
Staying Normal
The regression only works if the training data has first been normalized to a constrained value range. If the script feeds the optimizer the unmodified Unix seconds as the mileage date, the algorithm goes haywire and produces increasingly nonsensical values, until it finally exceeds the limits of the hardware's floating-point math and sets all parameters to nan (Not a Number).
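The effect is easy to reproduce outside of TensorFlow. The following sketch uses a hand-written batch gradient descent as a stand-in for the optimizer in Listing 1, and three made-up mileage readings; the function name, the learning rate, and the data are assumptions for illustration only.

```python
import math

def gradient_descent(xs, ys, learning_rate, steps):
    """Fit y = w * x + b by plain batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up readings (assumption): three days, 30 miles per day
epochs = [1486972800.0, 1487059200.0, 1487145600.0]
miles = [80000.0, 80030.0, 80060.0]

# Raw epoch seconds: each update overshoots, overflows to infinity, then turns into nan
w_raw, b_raw = gradient_descent(epochs, miles, learning_rate=0.1, steps=100)
print(math.isnan(w_raw), math.isnan(b_raw))  # True True

# Same data with the dates scaled to 0..1: the fit approaches w = 60, b = 80000
lo, hi = min(epochs), max(epochs)
scaled = [(e - lo) / (hi - lo) for e in epochs]
w_ok, b_ok = gradient_descent(scaled, miles, learning_rate=0.1, steps=5000)
print(round(w_ok, 3), round(b_ok, 3))
```

With inputs in the billions, every step multiplies the error by an enormous factor, so the parameters run out of floating-point range within a couple of dozen iterations.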
Lines 31 to 37 in Listing 1 therefore normalize the training data by using pandas' min() and max() methods to find the minimum and the maximum timestamps, then subtract the minimum from all training values as an offset, and finally divide by the difference between maximum and minimum.
This process results in training values between 0 and 1, which the optimizer can process more efficiently. One caveat: If min = max, the divisor is zero, and the normalization fails.
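A minimal sketch of this min-max normalization in pandas might look as follows. It is not the code from Listing 1; the normalize() helper and the sample timestamps are assumptions, and only the names norm_off and norm_mult are taken from the article.

```python
import pandas as pd

def normalize(series):
    """Scale a pandas Series to the range 0..1; returns (scaled, norm_off, norm_mult)."""
    norm_off = series.min()
    norm_mult = series.max() - norm_off
    if norm_mult == 0:
        # min == max: dividing would produce nan/inf values
        raise ValueError("all values are identical; cannot min-max normalize")
    return (series - norm_off) / norm_mult, norm_off, norm_mult

# Made-up epoch timestamps (assumption), one day apart
dates = pd.Series([1486972800, 1487059200, 1487145600, 1487232000])
scaled, norm_off, norm_mult = normalize(dates)
print(scaled.tolist())
```

Returning the offset and the scaling factor along with the scaled values matters, because any date fed to the model later has to be transformed with exactly the same two numbers.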
With the learned parameters, it is now possible to reproduce historical values within the model's framework or predict the future. What mileage will the car have on June 2, 2019? The date has an epoch value of 1559516400, which the model has to normalize just as in the training case. The offset of 1486972800, found as norm_off in Figure 3, gets subtracted, and the result is divided by the scaling factor norm_mult of 7686000.
This results in an X value of about 9.44, which is substituted into the formula Y = X * W + b to predict a mileage of 94,115 for June 2, 2019. All of this assumes, of course, that the model is accurate (i.e., that the increase is indeed linear) and that the three months of training data are sufficient to determine the slope of the curve more or less accurately.
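The arithmetic for the normalization step can be checked in a few lines. The offset and the scaling factor are the values quoted in the text; the function names are made up for this sketch, and the learned W and b are left as arguments because their values appear only in Figure 3.

```python
NORM_OFF = 1486972800   # norm_off as quoted in the text
NORM_MULT = 7686000     # norm_mult as quoted in the text

def normalize_date(epoch, norm_off=NORM_OFF, norm_mult=NORM_MULT):
    """Apply the same offset and scaling that were used on the training data."""
    return (epoch - norm_off) / norm_mult

def predict_mileage(epoch, w, b):
    """Evaluate Y = X * W + b for a date given in epoch seconds."""
    return normalize_date(epoch) * w + b

x = normalize_date(1559516400)
print(round(x, 4))  # 9.4384
```

An X value well above 1 simply reflects that the requested date lies far outside the training window, roughly nine and a half times its length past the first reading.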
Keeping Back Data
To ensure that the model not only simulates the training data but also predicts the real future, AI specialists often break down the available data into a training and a test set. They train the model only with data from the training set; otherwise, the risk is that it will mimic the training data perfectly, including replicating any temporary outliers that do not occur later in production, causing the system to predict artifacts that are out of touch with reality.
If the test set remains untouched up to the end of the training runs and the model later also correctly predicts the test data, the AI system will most likely behave as expected later in a production environment.
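As a rough sketch of the idea, the following example holds back the last 20 percent of a series and checks the fitted line against it. The data is synthetic, and the chronological split (rather than a random one) is an assumption that seems natural when the goal is to forecast later dates.

```python
import numpy as np

# Synthetic data (assumption): 20 days, about 30 miles per day plus noise
rng = np.random.default_rng(0)
days = np.arange(20, dtype=float)
miles = 80000 + 30 * days + rng.normal(0, 5, size=days.size)

# Keep the last 20 percent of the readings out of the training run
split = int(len(days) * 0.8)
x_train, x_test = days[:split], days[split:]
y_train, y_test = miles[:split], miles[split:]

# Fit a straight line on the training set only
w, b = np.polyfit(x_train, y_train, deg=1)

# Judge the model on readings it has never seen
test_error = np.mean(np.abs(w * x_test + b - y_test))
print(f"slope={w:.2f}, mean absolute test error={test_error:.2f} miles")
```

If the error on the held-back readings is in the same ballpark as the error on the training readings, the model has probably learned the trend rather than the noise.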
Now, my 30-year-old HP-41CV pocket calculator was already able to determine the parameters W and b from a collection of X/Y values by fitting a linear regression. However, TensorFlow can do much more, because it also supports neural networks and decision trees, as well as more complex regression techniques.
No Simple Pattern
If you look at the daily mileage numbers closely, you will note that the increase is by no means precisely linear over time. Figure 4 shows the higher resolution mileage growth per day and illustrates that the rise is subject to huge fluctuations. For example, the car travels between 16 and 50 miles on most days, interrupted by a pause of two consecutive days every so often, with no increase in mileage at all.
A person simply looking at the graph in Figure 4 will immediately see that the car is driven less on weekends than on workdays. For an AI system to offer the same kind of intuitive performance, the programmer needs to take it by the hand and guide it in the right direction.
If the dates are, for example, stated in epoch seconds, as is common on Unix, the AI system will never in its lifetime find out that the weekend happens every seven days, with less driving as a result. A linear regression would only stretch the last few data points into the future; a polynomial regression would produce completely insane patterns in a mad bout of overfitting.
The learning algorithms are also bad at handling incomplete data. If there are no measured values for certain X values, for example, on days when the car was only parked in the garage, the conscientious teacher needs to fill them with meaningful values (e.g., with zeros). Also, you need to add what machine learning practitioners call "expert knowledge": Because the weekday of each date is known and will hopefully help the algorithm, a new CSV file (miles-per-day-wday.csv) simply provides the sequence number of the weekday (neural networks do not like strings, only numbers) for each daily mileage reading (Figure 5).
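Both preparation steps can be sketched with pandas. The column names, the sample readings, and the weekday numbering (pandas counts Monday as 0 and Sunday as 6) are assumptions; the actual layout of miles-per-day-wday.csv is shown only in Figure 5.

```python
import pandas as pd

# Hypothetical readings: miles driven per day, with parked days missing
readings = pd.DataFrame(
    {
        "date": pd.to_datetime(["2017-02-13", "2017-02-14", "2017-02-17"]),
        "miles": [32.0, 18.5, 41.0],
    }
)

# Insert a row for every calendar day; days without a reading get 0 miles
daily = readings.set_index("date").asfreq("D", fill_value=0.0)

# Add the weekday as a number: Monday=0 ... Sunday=6 (pandas convention)
daily["wday"] = daily.index.dayofweek

print(daily)
```

The resulting frame has one row per day, which gives the learning algorithm a gap-free series with the weekday available as a plain numeric column.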
Listing 2 then uses the sklearn framework to construct a neural network that it teaches to guess the associated day of the week based on the mileage. To do so, it first reads the CSV file and forms the data frame X with the mileage numbers from it, and y as a vector containing the associated weekday numbers.
Listing 2: neuro.py
The train_test_split() function splits the existing data into a training set and a test set, which the standard scaler normalizes in lines 19 to 22 because neural networks are extremely picky about the value range of their inputs.
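The split-and-scale step can be sketched with scikit-learn as follows. This is not a reproduction of neuro.py; the sample data, the test_size, and the random_state are assumptions. The scaler is fitted on the training set only, so that nothing about the test data leaks into the training run.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up daily mileage (X) and weekday numbers (y), assumption
X = np.array([[32.0], [18.5], [0.0], [0.0], [41.0], [27.0], [45.5], [22.0]])
y = np.array([0, 1, 5, 6, 2, 3, 4, 0])

# Hold back a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn mean and standard deviation from the training data only ...
scaler = StandardScaler()
scaler.fit(X_train)

# ... then apply the same transformation to both sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(), X_train_scaled.std())
```

After scaling, the training inputs have a mean of zero and a standard deviation of one, which is the value range that gradient-based training handles best.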
The multilayer perceptron of type MLPClassifier generated in lines 24 and 25 creates a neural network with two layers and stipulates that the training phase will run for 1,000 steps at the most. Calling the fit() method then triggers the training, during which the optimizer adjusts the internal weights in a bout of supervised learning until the error between the value predicted for the training inputs and the expected value in y_train is minimized.
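A comparable setup, sketched here as a scikit-learn pipeline on synthetic data, might look like this. The two hidden layers of 10 neurons each are an assumption about what "two layers" means, and the data generator (16 to 50 miles on workdays, none on weekends) only mimics the pattern described for Figure 4.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 12 weeks, weekday numbers 0..6, no driving on days 5 and 6
rng = np.random.default_rng(0)
wday = np.tile(np.arange(7), 12)
miles = np.where(wday < 5, rng.uniform(16, 50, size=wday.size), 0.0)

X = miles.reshape(-1, 1)  # one feature: miles driven that day
y = wday                  # target: weekday number

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling and the network in one object; max_iter caps training at 1,000 iterations
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

predicted = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(predicted, accuracy)
```

With a single input feature, the network has little to go on beyond separating driving days from parked days, so a modest accuracy is to be expected.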
The results were not all that exciting in the experiment, in part because the predicted values varied greatly from call to call, and the precision left something to be desired; yet, the neural network predicted the weekday from a given mileage in most cases. A variety of different input parameters would lead to better results.
With TensorFlow and scikit-learn, curious users have two sophisticated frameworks for experimenting with AI applications at their disposal. Getting started is anything but child's play because the literature [3] [4] on the latest features is still fairly recent and not very mature; also, a number of works are still in the development stage. However, it is worth exploring the matter, because this area of computer science undoubtedly has a bright future ahead of it.
Infos
[1] "Programming Snapshot – Driving Data" by Mike Schilli, Linux Pro Magazine, issue 202, September 2017, p. 50, http://www.linuxpromagazine.com/Issues/2017/202/Programming-Snapshot-Driving-Data
[2] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/linux-magazine.com/<issue no.>/
[3] Guido, Sarah, and Andreas C. Müller. Introduction to Machine Learning with Python. O'Reilly Media, 2016
[4] Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media, 2017