AI learning with decision trees
Horses and Riders

© Lead Image © Outdoorsman, fotolia.com
Decision trees help machines emulate human behavior. Thanks to artificial intelligence, programs can autonomously build decision trees by learning from existing real-life data. Can an AI system determine who was operating a particular vehicle by analyzing driving style?
In the 1950s, TV stations were broadcasting howlers that probably wouldn't interest anyone today. One was "What's My Line?" with presenter John Daly, in which a blindfolded team of four panelists had to guess the identity of a star guest using only yes or no questions.
A popular strategy was to pose general questions first ("Are you male?"), in order to ask more specific questions in the final round, until the panelists began to close in on the guest star and their identity was finally revealed.
In machine learning, programmers employ similar techniques to teach computers to imitate learned behaviors. As a somewhat contrived example, take the behavior of an AND gate (Table 1), which always shows the 0
value at output until a 1
is applied to both inputs. The gate is normally implemented by binary operators, but for the purpose of this investigation, I'll be using the decision tree in Figure 1.
Table 1
x/y AND Connection
Input |
| Output |
---|---|---|
x |
y |
x AND y |
0 |
0 |
0 |
0 |
1 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
The algorithm starts at the head node and checks whether the X
input variable is set to 1
. If this is not the case, further testing is unnecessary, and the result of 0
is final. If X
is 1
, on the other hand, the algorithm proceeds to the lower left to the Y==1?
node. After evaluation, the end result is clear: If Y
is also equal to 1
, the AND gate displays 1
, but if Y
is 0
, the gate value is 0
.
Supervised Learning
Now, of course, artificial intelligence (AI) can capture and simulate far more complex relationships than simple AND gates. However, the decision trees used there follow exactly the same procedure as in our simple example. In supervised learning, they receive input values such as X
and Y
and the expected output values, like the output values of the AND gate. From this, decision trees are automatically being built internally and gradually honed until, in a fully trained state in production, the system generates exact results from new live input values – or at least values close to the ideal result.
AI kits [1] like the Python module scikit-learn
help automatically create these decision trees, which are based on live input data; it is easily installed using
pip3 install scikit-learn scipy
preferably in a virtualenv
environment, to avoid clutter on your computer:
virtualenv -p /usr/local/bin/python3 dt source dt/bin/activate pip3 install [...]
Listing 1 [2] additionally imports the pydotplus
module to print the decision tree for study purposes.
Listing 1
decision-tree.py
Mandatory Algorithm
The X
Python list contains the previously known input values of the gate as a series of x/y values, also in list form. The Y
variable contains the corresponding output values in the same order. The tree
class from the scikit-learn
module (the abbreviated name sklearn
is also valid) now offers the DecisionTreeClassifier
, which approximates relationships by fitting with the fit()
method; internally, it builds a tree and later predicts the results for new input values using predict()
.
To take a look at the internal processes in the production line of this procedure, the export_graphviz()
method collects the generated tree and outputs it in text notation for the graphviz
chart tool. The pydotplus
Python module (installed with pip3
) creates the PNG file in Figure 2, which requires the graphviz
package. You can install the graphviz
package on Ubuntu by typing:
sudo apt-get install graphviz
The output from the script in Figure 3 shows that, after the fitting phase, perfect results are obtained from all possible input values.
At first glance, the machine-generated decision tree in Figure 2 looks a bit unorthodox, but machines think differently from humans; they just rigidly follow their algorithms. For example, the classifier first checks whether X
is less than 0.5, and if that is the case, it outputs class=0
as a result in the True node below left. If X
is greater than 0.5, however, the False node below right checks to see whether Y
is less than 0.5 and displays 0
if yes and 1
if no; that is, it's equal to 1
in real life.
The procedure performs as optimally as the tree in Figure 1 but – typical of a machine – somewhat non-humanly, just like the AlphaGo mastermind [3], which now ranks among the world's top Go board players, with some seemingly absurd moves that no one has ever tried before.
From Insecure to Secure
The method used by scikit-learn
[4] is mathematically proven and builds the tree by splitting the total of all possible input tuples and their preexisting results into subsets. Each division aims to minimize uncertainty in the system – its entropy. At the head node, the entropy is still high because nothing is yet known about the result, so the algorithm would have to select either 0
or 1
as a result. Because the output is 0
in 75 percent of all cases, and 1
in 25 percent of cases (Table 1), the entropy of the system is initially:
-[(1/4) x ln(1/4)]-[(3/4) x ln(3/4)] = 0.56
If the first node separates the possible input values into two subsets by the X<=0.5
comparison,
[[0,0],[0,1]]
and
[[1,0],[1,1]]
then the entropy of the first subset is equal to 0 (the result is 0 in any case; that is, it's certain), and the second stays 0.56, so the total entropy is:
-(1/2 x 0)+(1/2 x 0.56) = 0.28
With only one node in the tree, the system's entropy has decreased from 0.56 to 0.28. An additional node splits the remaining two result tuples into two individual results. In the case of X>0.5
where Y<=0.5
, the entropy of the whole system drops to 0. Therefore, with two nodes, the tree can correctly predict the result for each arbitrary input tuple with 100 percent probability.
In this case, it is of academic interest at most, because an AND gate can be set up in hardware or firmware much more efficiently. However, an algorithm that knows nothing of the internal structure at first and only learns from combinations of input and output values, mimicking the behavior of the unknown system, can be used to tackle arbitrarily complex problems. As a bonus, it even supports non-permitted input values such as [0.9,0.8]
, which it transforms into useful results (in this case, 1
).
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Direct Download
Read full article as PDF:
Price $2.95
News
-
KaOS 2022.06 Now Available With KDE Plasma 5.25
The newest iteration of KaOS Linux not only adds the latest KDE Plasma desktop but sets LibreOffice as the default.
-
Manjaro 21.3.0 Is Now Available
Manjaro “Ruah” has been released and includes the latest Calamares installer, GNOME 42, and much more.
-
SpiralLinux is a New Linux Distribution Focused on Simplicity
A new Linux distribution, from the creator of GeckoLinux, is a Debian-based operating system with a focus on simplicity and ease of use.
-
HP Dev One Linux Laptop is Now Available for Pre-Order
The System76/HP collaboration Dev One laptop, geared toward developers, is now available for pre-order.
-
NixOS 22.5 Is Now Available
The latest release of NixOS with a much-improved package manager and a user-friendly graphical installer.
-
System76 Teams up with HP to Create the Dev One Laptop
HP and System76 have come together to develop a new laptop, powered by Pop!_OS and aimed toward developers.
-
Titan Linux is a New KDE Linux Based on Debian Stable
Titan Linux is a new Debian-based Linux distribution that features the KDE Plasma desktop with a focus on usability and performance.
-
Danielle Foré Has an Update for elementary OS 7
Now that Ubuntu 22.04 has been released, the team behind elementary OS is preparing for the upcoming 7.0 release.
-
Linux New Media Launches Open Source JobHub
New job website focuses on connecting technical and non-technical professionals with organizations in open source.
-
Ubuntu Cinnamon 22.04 Now Available
Ubuntu Cinnamon 22.04 has been released with all the additions from upstream as well as other features and improvements.