Car NOT Goat

Programming Snapshot – Neural Networks

Lead Image © Frantisek Hojdysz, 123RF.com

The well-known Monty Hall game show problem can be a rewarding maiden voyage for prospective statisticians. But is it possible to teach a neural network to choose between goats and cars with a few practice sessions?

Here's the problem: In a game show, a candidate has to choose from three closed doors; behind them wait the main prize, a car, and two goats (Figure 1). The candidate first picks a door, and the presenter then opens another one, behind which a goat is bleating. How does the candidate maximize the chance of winning the grand prize: by sticking with the original choice or by switching to the remaining closed door?

Figure 1: Monty Hall problem on Wikipedia.

As has been shown [1] [2], the candidate is better off switching, because doing so doubles their chances of winning: The first pick hits the car with a probability of only 1/3, so in the remaining 2/3 of all cases the car sits behind the other closed door. But how does a neural network, rewarded for wins and punished for losses, learn the optimal game strategy?
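Before any neural network gets involved, a quick Monte Carlo simulation confirms the math. This snippet is not one of the article's listings, just a sanity check; it exploits the fact that switching wins exactly when the first pick missed the car:

import random

# Car position and candidate pick are independent uniform choices;
# switching wins whenever they differ, which happens 2/3 of the time.
trials = 100_000
wins = sum(random.randrange(3) != random.randrange(3) for _ in range(trials))
print(wins / trials)  # prints roughly 0.667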

Human Model

The input and output data must be carefully prepared – as is always the case with machine learning. An artificial intelligence (AI) system is not a cauldron into which you toss problems, with ready-made solutions simply bubbling up. In fact, AI algorithms only solve a small number of precisely defined problems.

To solve the problem, the algorithm used in this article is a multilayer neural network that takes three input parameters: the door the candidate selected, the door the presenter opened, and the remaining closed door.

In the perceptron's hidden middle layer, each artificial neuron is connected to every input neuron of the first layer. Even though a neural network does not work quite like a human brain, you can still interpret this dense wiring as a reflection of its human model. In turn, each of the hidden layer's neurons fires pulses to all the neurons of the output layer.

Carrot and Stick

In the training phase, the network is meant to learn a strategy from previously played game shows, so that, for a given door constellation, it can predict on its output neurons which door wins the prize. In a few thousand rounds, the script feeds the three input parameters into the AI system and compares the door value at the output with the actual door leading to the car. If the system has predicted the correct door, it is rewarded. If it is wrong, it has to adjust its neurons' parameters via a feedback mechanism.

In AI jargon, the training runs are called episodes. It is often helpful not to adjust the neuron parameters after every single dataset, but only after a whole batch of input values. This saves computing time and prevents the system from rebalancing the weights in wild swings, which often leads to unstable states that never converge to a solution.

1,000 Shows Recorded

Listing 1 records the results of 1,000 game shows in which the prize is placed behind a random door, and the presenter then opens a door that neither leads to the main prize nor has already been picked by the candidate. Game results go to a file in CSV format (Figure 2). The script numbers the doors from 0 to 2 and logs four door indices per line, in this order: the door chosen by the game show candidate, the door the presenter opens, the remaining door, and the one leading to the prize.

Listing 1

monty

 
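The full listing is available online [3]. The following is a minimal sketch of how the generator might look, using the class, method, and attribute names from the discussion below; the line numbers cited in the text refer to the original listing, not to this sketch:

import csv
import random
import sys

class Door:
    def __init__(self, prize=False):
        self.prize = prize  # True if the car waits behind this door

class Show:
    def __init__(self):
        # Place the prize behind the first door, then shuffle the doors
        self.doors = [Door(prize=True), Door(), Door()]
        random.shuffle(self.doors)

    def pick(self):
        # The candidate picks a random door
        self.picked = random.randrange(3)
        # The presenter opens a door that is neither picked nor the winner
        for i in range(3):
            if i != self.picked and not self.doors[i].prize:
                self.revealed = i
                break
        # The remaining closed door and the index of the winning door
        self.alternate = ({0, 1, 2} - {self.picked, self.revealed}).pop()
        self.winner = next(i for i, d in enumerate(self.doors) if d.prize)

writer = csv.writer(sys.stdout)
for _ in range(1000):
    show = Show()
    show.pick()
    writer.writerow([show.picked, show.revealed, show.alternate, show.winner])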

Figure 2: A random generator generates results of game shows and outputs them in CSV format for a subsequent training session with a neural network.

For example, if the neural network encounters the [1,2,0,0] combination, such as in the first line in the file displayed in Figure 2, it knows that the candidate has chosen the second door (index 1), the presenter has then opened the third (index 2), and the first is still closed (index 0). The main prize was randomly hidden and ended up behind the first door (index 0). With these parameters, to win the game, the neural network must pick the first door.

Listing 1 defines two classes: Door for individual doors and Show for a world with three doors and the rules of the TV show. Door objects are initialized either with or without a main prize; line 15 places the prize behind the first door, and line 16 then shuffles the doors so that the prize randomly ends up somewhere. The pick() method starting in line 22 uses randrange() to simulate the candidate picking a random door. The Show object remembers the selected door's index in the picked instance variable.

Presenter in a Bind

The presenter then has to open another door in the for loop starting in line 28, but must not reveal the main prize. The object stores this door's index in the revealed attribute, and the remaining third door's index in the alternate attribute. The for loop starting in line 50 iterates over 1,000 game shows, and the print() statement in line 55 outputs their results in CSV format: one line per show with the indices of the candidate's door, the presenter's door, the remaining door, and the winning door.

One-Hot Encoding

If the AI apprentice employs a neural network and feeds in individual shows as 3-tuples, each paired with a one-part result tuple in the training phase, it won't produce satisfying results, because the door indices are not really meaningful as numerical values; instead, they stand for categories, with each door representing a different category. Before the training run, the AI expert therefore transforms such datasets into categories using one-hot encoding. If a dataset provides values for n categories, the one-hot encoder shapes individual records as n-tuples, each of which has one element set to 1 and all remaining elements set to 0.

Figure 3 shows an example of how an input series like [2,1,2,0,1,0] is converted into six one-hot-encoded matrix rows. The code in Listing 2 uses the to_categorical() function from the np_utils module of the keras.utils package to accomplish this. To return from one-hot encoding back to the original value later, use the argmax() method provided by numpy arrays.

Listing 2

onehot

 
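As a sketch of what Listing 2 does (assuming a working Keras installation; newer Keras versions export to_categorical() directly from keras.utils rather than from the np_utils module):

import numpy as np
from keras.utils import to_categorical

values = np.array([2, 1, 2, 0, 1, 0])
onehot = to_categorical(values, num_classes=3)
print(onehot)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [1. 0. 0.]]

# argmax() recovers the original indices from the one-hot rows
print(onehot.argmax(axis=1))  # [2 1 2 0 1 0]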

Figure 3: One-hot converts values into categories that set one value per tuple to 1.

Machine Learning

Armed with the input values in one-hot format, the three-layer neural network defined in Listing 3 can now be fed with learning data. Important: The network also encodes the output values using the one-hot method and therefore needs not just a single output neuron, but three of them, because both the training values and, later, the predicted values are 3-tuples, each indicating the winning door as a 1 in a sea of zeros.

Listing 3

learn

 
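The original listing is available online [3]; the following sketch reconstructs its essential steps from the description below. Line numbers in the text refer to the original, and details such as the activation functions and the CSV reading method are assumptions; since the text mentions input_dim=3, the sketch feeds the three door indices directly:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Read the recorded shows: picked, revealed, alternate, winner per line
data = np.loadtxt("shows.csv", delimiter=",", dtype=int)
X = data[:, :3]                    # candidate, presenter, alternative door
y = to_categorical(data[:, 3], 3)  # winning door, one-hot encoded

model = Sequential()
model.add(Dense(10, input_dim=3, activation="relu"))  # input layer, 10 neurons
model.add(Dense(3, activation="relu"))                # hidden layer
model.add(Dense(3, activation="sigmoid"))             # one-hot output layer
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

model.fit(X, y, epochs=100, batch_size=100)

# Query the trained network for every possible door constellation
for picked in range(3):
    for revealed in range(3):
        if revealed == picked:
            continue
        alternate = ({0, 1, 2} - {picked, revealed}).pop()
        probs = model.predict(np.array([[picked, revealed, alternate]]))
        print(picked, revealed, alternate, "->", probs.argmax())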

The saved training data generated by Listing 1 in shows.csv is then read by Listing 3 in line 7. The first three elements of each line are the input data of the network (candidate door, presenter door, alternative door), and the last item indicates the index of the door to the main prize.

Line 12 transforms the desired output values into one-hot-encoded categories; lines 14 to 18 build the neural network with an entry layer, a hidden layer, and an output layer. All layers are of the Dense type; that is, in brain-like style, each neuron connects to all neurons of the adjacent layers. The Sequential class of the Keras package holds the layers together. Line 20 compiles the neural network model; Listing 3 specifies binary_crossentropy, an error function suited to categorization problems, and selects the adam algorithm as the optimizer.

In the three-layer model, 10 neurons receive the input data in the input layer, and input_dim=3 sets the data width to 3, since it consists of 3-tuples (values for three doors). The middle layer has three neurons, and the output layer also has three; the latter, as mentioned above, represent the one-hot encoding of the results as categories.

Acid Test

The training phase starts in line 22 with a call to model.fit(). It specifies 100 iterations (epochs) and a batch size of 100, meaning that the neural network only adjusts its inner weights with the collected information after every 100 training values. Starting in line 25, the script visualizes whether the training was successful: For all possible door combinations, line 30 calls the predict() method to pick a door according to what the network has learned so far. Lo and behold, Figure 4 shows that the computer indeed selects the alternative door every time, thus letting the candidate switch and optimally increasing their chances of winning, just as the mathematical proof prescribes.

Figure 4: The network has learned that the alternative door offers the most lucrative chance of winning.

This is remarkable, because the network does not know the mathematical correlations, but only learns from empirically obtained data. The input data is even somewhat ambiguous, because switching only leads to success in two thirds of all cases. If you rig the input data and hide the prize behind the alternate door every time, the network's internal success metrics rise all the way to 100 percent, and the network becomes absolutely sure.
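With the generator sketch above, producing such rigged data would be a matter of writing the alternate door's index into the winner column as well (a hypothetical tweak, not part of the article's listings):

# Always declare the alternate door the winner: unambiguous training data
writer.writerow([show.picked, show.revealed, show.alternate, show.alternate])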

Even if we feed real-world data into the network, where the candidate loses in a third of all cases despite switching, the network optimizes its approach and ends up switching most of the time, with the occasional outlier. If you vary the network's parameters, for example, the number of epochs or the number of neurons per layer, results might also vary. As always with these kinds of problems, successfully training an artificially intelligent system is just as much art as science, with lots of wiggle room.

Infos

  1. "Calculating Probability" by Michael Schilli, Linux Magazine, issue 165, August 2014, pg. 60,http://www.linux-magazine.com/Issues/2014/165/Calculating-Probability
  2. Monty Hall problem: https://en.wikipedia.org/wiki/Monty_Hall_problem
  3. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/207/

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area, California. Each month in his column, which has been running since 1997, he researches practical applications of various programming languages. If you email him at mailto:mschilli@perlmeister.com he will gladly answer any questions.