11.3 PERCEPTRONS AND TWO-VALUED ALGEBRA
The purpose of the exercises in this chapter is to provide an introduction to training perceptrons using the Rosenblatt program. We will do this by training a perceptron to compute the truth table for the four logic gates given above. The perceptron will have two input units, one for the variable A and the other for B. If a variable is true, its input unit is activated with a value of 1; if it is false, its input unit is activated with a value of 0. Four training patterns will be presented to the perceptron, one for each possible combination of the values of A and B (i.e., one for each row in the table above).
The perceptron will have four output units. Each output unit will be trained to generate an activation of 1 when its logical combination of A and B is true, and an activation of 0 when that combination is false, as indicated in the table above. The first output unit will be trained to compute AND, the second to compute OR, the third to compute ~A (i.e., to invert A), and the fourth to compute ~B.
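To make the training set concrete, here is a minimal Python sketch of the four input patterns and the desired activations of the four output units. The names `INPUTS`, `TARGETS`, `OUTPUT_NAMES`, and `targets_for` are hypothetical; the sketch does not reproduce the format of the “Boole4.net” file, only the patterns and targets described above.

```python
# Input patterns: one unit for A, one for B; true -> 1, false -> 0.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# One output unit per logic gate, in the order used in the text.
OUTPUT_NAMES = ["AND", "OR", "~A", "~B"]


def targets_for(a, b):
    """Desired activations of the four output units for inputs A=a, B=b."""
    return (
        a & b,  # AND
        a | b,  # OR
        1 - a,  # ~A (invert A)
        1 - b,  # ~B (invert B)
    )


TARGETS = [targets_for(a, b) for a, b in INPUTS]

if __name__ == "__main__":
    # Print the truth table that the perceptron must learn.
    print("A B | " + " ".join(f"{n:>3}" for n in OUTPUT_NAMES))
    for (a, b), t in zip(INPUTS, TARGETS):
        print(f"{a} {b} | " + " ".join(f"{v:>3}" for v in t))
```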
In our previous exercises with the James program, when network error served as the criterion for stopping training, training stopped when the total sum of squared error for the network dropped below some minimum value. The Rosenblatt program uses a different approach. In the spirit of two-valued logic, when a pattern is presented to the network, each output unit is either right or wrong. When an output unit is right, we count it as a “hit”. When an output unit is wrong, we count it as a “miss”. The maximum possible number of hits for a training set equals the number of training patterns times the number of output units. For example, the network that we will be training below has four training patterns and four output units, so we want training to stop when there are 16 hits (and 0 misses).
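The hit-counting stopping rule can be sketched in Python as follows, using the squared-error definition of a hit given in the next paragraph and its default criterion of 0.01. This is an illustration, not the Rosenblatt program's own code, and the function names are hypothetical. `hit_band` inverts the criterion to show how active a unit must be to count as “on” or “off”.

```python
import math


def is_hit(desired, actual, criterion=0.01):
    """An output is a hit when its squared error is smaller than the criterion."""
    return (desired - actual) ** 2 < criterion


def count_hits(targets, outputs, criterion=0.01):
    """Count hits and misses over every pattern and every output unit.

    targets and outputs hold one tuple per pattern, one value per output unit.
    """
    hits = misses = 0
    for t_row, o_row in zip(targets, outputs):
        for t, o in zip(t_row, o_row):
            if is_hit(t, o, criterion):
                hits += 1
            else:
                misses += 1
    return hits, misses


def training_done(targets, outputs, criterion=0.01):
    """Stop when hits equal (number of patterns) x (number of output units)."""
    max_hits = len(targets) * len(targets[0])
    hits, _ = count_hits(targets, outputs, criterion)
    return hits == max_hits


def hit_band(criterion):
    """Activity a unit must exceed to be 'on', and stay below to be 'off'."""
    margin = math.sqrt(criterion)
    return 1 - margin, margin
```

For example, `hit_band(0.01)` returns approximately (0.9, 0.1), and `hit_band(0.0025)` returns approximately (0.95, 0.05), matching the ranges described below.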
How is a hit defined? For each pattern presented during a training epoch, we compute the squared difference between the desired activity of an output unit and its actual activity. When this squared difference is smaller than a minimum value that is set before training, we count a hit. Otherwise, we count a miss. The default setting for this minimum value is 0.01. This means that if the desired output activity for a unit is 1, any value higher than 0.9 counts as a hit, because $(1 - 0.9)^2 = 0.1^2 = 0.01$, which is our minimum value. By similar logic, if the desired output activity for a unit is 0, any value smaller than 0.1 counts as a hit; otherwise, it counts as a miss. If we want to increase the accuracy of what a network learns, we can decrease this hit criterion. For instance, by setting it to 0.0025, the network will have to generate activity higher than 0.95 to turn “on”, and lower than 0.05 to turn “off”. If we want a more liberal definition of a hit, we can increase this criterion value.
11.3.1 PROCEDURE FOR THE DELTA RULE
Install and run the Rosenblatt program, referring back to Chapter 10 if necessary. Load
the file “Boole4.net”. On the setup page, choose the Delta rule, and keep the remaining settings
at their default values:
• End after a maximum number of training epochs
• End when there are all “hits” and no “misses”
• Randomize patterns each epoch
• Train thresholds during learning
• Default starts for weights
• Default starts for thresholds
• Maximum number of epochs = 1000
• Number of epochs between printouts = 100
• Learning rate = 0.5
• Minimum level of squared error to define a “hit” = 0.01
Press the “Start Training” button to begin training. When the network converges to a solution to the problem, press the “Test Recall” button. Then, have the program build an Excel
spreadsheet to summarize the results. This spreadsheet will contain all of the information required to answer the following questions. If you are using a version of the Rosenblatt program
that does not use Excel, then save the results of training to a file that you can examine later to
answer the questions.
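For readers who want to see the moving parts, here is a minimal sketch of Delta rule training with the settings listed above: learning rate 0.5, randomized pattern order, trained thresholds, at most 1000 epochs, and stopping when every output is a hit. It is an illustration rather than the Rosenblatt program's implementation. The starting weights (uniform between -0.1 and 0.1), the starting thresholds of 0, and the way thresholds are updated are assumptions, and all names are hypothetical.

```python
import random

# Boole4 patterns (A, B) and targets for AND, OR, ~A, ~B, as described in the text.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = [(a & b, a | b, 1 - a, 1 - b) for a, b in INPUTS]


def step(net, threshold):
    """Step activation: on (1) only when the net input exceeds the threshold."""
    return 1 if net > threshold else 0


def net_input(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train_delta(inputs, targets, rate=0.5, max_epochs=1000, criterion=0.01,
                train_thresholds=True, seed=0):
    """Delta rule training of step units; stops when every output is a hit."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    # Assumed "default starts": small random weights, thresholds of 0.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    thresholds = [0.0] * n_out
    order = list(range(len(inputs)))
    hits = 0
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(order)  # randomize pattern order each epoch
        for p in order:
            x, t = inputs[p], targets[p]
            for j in range(n_out):
                error = t[j] - step(net_input(weights[j], x), thresholds[j])
                for i in range(n_in):
                    weights[j][i] += rate * error * x[i]
                if train_thresholds:
                    # A threshold behaves like a weight on an input fixed at -1.
                    thresholds[j] -= rate * error
        # After each epoch, count outputs whose squared error is below the criterion.
        hits = sum(
            (t[j] - step(net_input(weights[j], x), thresholds[j])) ** 2 < criterion
            for x, t in zip(inputs, targets)
            for j in range(n_out)
        )
        if hits == len(inputs) * n_out:  # all hits, no misses
            break
    return weights, thresholds, epoch, hits
```

Because a step unit's output is exactly 0 or 1, a hit here simply means that the output equals its target. The `weights` and `thresholds` returned are analogous to the values asked about in Exercise 11.1, although they will not match the numbers produced by the program.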
11.3.2 EXERCISE 11.1
1. What is the total SSE for the network after training has finished?
2. How many epochs of training occurred before the program stopped training the
network?
3. When the Delta rule is used in the Rosenblatt program, the step activation function
is being used in the output units. The unit will only turn on when its net input exceeds the unit’s threshold or bias. Armed with this knowledge, look at the two
connection weights that feed into the first output unit, and look at the bias of this
output unit. Explain how this output unit computes the AND of A and B.
4. Look at the two connection weights that feed into the second output unit, and look
at the bias of this output unit. Explain how this output unit computes the OR of A
and B.
5. Look at the two connection weights that feed into the third output unit, and look at
the bias of this output unit. Explain how this output unit INVERTS the signal from
A.
6. Look at the two connection weights that feed into the fourth output unit, and look
at the bias of this output unit. Explain how this output unit INVERTS the signal
from B.
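When answering questions 3 to 6, it can help to check your reading of the weights by computing a unit's full truth table. The helper below is a hypothetical sketch that assumes the step rule stated in question 3: the unit turns on only when its net input exceeds its threshold. The example values in the usage block are hand-picked for illustration and are not taken from a trained network.

```python
from itertools import product


def step_unit_table(w_a, w_b, threshold):
    """Truth table of a step unit with input weights w_a, w_b and a threshold.

    The unit outputs 1 only when its net input w_a*A + w_b*B exceeds the threshold.
    """
    table = {}
    for a, b in product((0, 1), repeat=2):
        net = w_a * a + w_b * b
        table[(a, b)] = 1 if net > threshold else 0
    return table


if __name__ == "__main__":
    # Hypothetical hand-picked values, not weights produced by the Rosenblatt program.
    for (a, b), out in step_unit_table(1.0, 1.0, 1.5).items():
        print(a, b, "->", out)
```

Substitute the two weights and the threshold reported for any output unit. If your version of the program reports a bias that is added to the net input rather than a threshold, pass its negative as `threshold`; this sign convention is an assumption worth checking against the program's output.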
11.3.3 PROCEDURE FOR THE GRADIENT DESCENT RULE
Run the Rosenblatt program once again on the file “Boole4.net”. On the setup page,
choose the gradient descent rule, and keep the remaining settings at their default values, which
should be the same as in the previous exercise:
• End after a maximum number of training epochs
• End when there are all “hits” and no “misses”
• Randomize patterns each epoch
• Train thresholds during learning
• Default starts for weights
• Default starts for thresholds
• Maximum number of epochs = 1000
• Number of epochs between printouts = 100
• Learning rate = 0.5
• Minimum level of squared error to define a “hit” = 0.01
Press the “Start Training” button to begin training. If the network has not converged to a
solution after 1000 epochs, press the “Continue Training” button. When the network converges to
a solution to the problem, press the “Test Recall” button. Then, have the program build an Excel
spreadsheet to summarize the results. This spreadsheet will contain all of the information required to answer the following questions. If you are using a version of the Rosenblatt program
that does not use Excel, then save the results of training to a file that you can examine later to
answer the questions.
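Gradient descent training can be sketched the same way, again as an illustration under assumptions rather than the program's own code. Here the output units are logistic, the bias is added to the net input, each weight change is the output error scaled by the derivative of the logistic, a * (1 - a), and the starting weights are small random values; all names are hypothetical. Passing a larger `max_epochs` stands in for pressing “Continue Training”, although this sketch restarts from new starting weights rather than resuming.

```python
import math
import random

# Boole4 patterns (A, B) and targets for AND, OR, ~A, ~B, as described in the text.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = [(a & b, a | b, 1 - a, 1 - b) for a, b in INPUTS]


def logistic(net, bias):
    """Logistic activation; in this sketch the bias is added to the net input."""
    return 1.0 / (1.0 + math.exp(-(net + bias)))


def net_input(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train_gradient_descent(inputs, targets, rate=0.5, max_epochs=1000,
                           criterion=0.01, seed=0):
    """Gradient descent on squared error for logistic output units."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    # Assumed starting values: small random weights, biases of 0.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    order = list(range(len(inputs)))
    hits, sse = 0, 0.0
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(order)  # randomize pattern order each epoch
        for p in order:
            x, t = inputs[p], targets[p]
            for j in range(n_out):
                a = logistic(net_input(weights[j], x), biases[j])
                # Error scaled by the slope of the logistic at the current activity.
                delta = (t[j] - a) * a * (1.0 - a)
                for i in range(n_in):
                    weights[j][i] += rate * delta * x[i]
                biases[j] += rate * delta
        # After each epoch, total the squared error and count hits.
        hits, sse = 0, 0.0
        for x, t in zip(inputs, targets):
            for j in range(n_out):
                a = logistic(net_input(weights[j], x), biases[j])
                squared_error = (t[j] - a) ** 2
                sse += squared_error
                if squared_error < criterion:
                    hits += 1
        if hits == len(inputs) * n_out:  # all hits, no misses
            break
    return weights, biases, epoch, hits, sse
```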
11.3.4 EXERCISE 11.2
1. What is the total SSE for the network after training has finished?
2. How many epochs of training occurred before the program stopped training the
network?
3. How do your answers to questions 1 and 2 above compare to your answers to
questions 1 and 2 in Exercise 11.1? If the answers are different, provide a brief explanation of why this is to be expected.
4. When the gradient descent rule is used in the Rosenblatt program, the logistic
activation function is being used in the output units. Armed with this knowledge,
look at the two connection weights that feed into the first output unit, and look at
the bias of this output unit. Explain how this output unit computes the AND of A
and B. How does this explanation compare to the explanation of AND that you
provided for the perceptron that was trained with the Delta rule?
5. Look at the two connection weights that feed into the second output unit, and look
at the bias of this output unit. Explain how this output unit computes the OR of A
and B. How does this explanation compare to the explanation of OR that you provided for the perceptron that was trained with the Delta rule?
6. Look at the two connection weights that feed into the third output unit, and look at
the bias of this output unit. Explain how this output unit INVERTS the signal from
A. How does this explanation compare to the explanation of INVERT that you provided for the perceptron that was trained with the Delta rule?
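For questions 4 to 6, it may help to translate the default hit criterion into conditions on a logistic unit's total input. Writing $z$ for the net input combined with the bias (the sign convention that the Rosenblatt program uses for its bias is an assumption here), the logistic activation and the two hit conditions are:

$$
f(z) = \frac{1}{1 + e^{-z}}, \qquad
f(z) > 0.9 \iff e^{-z} < \tfrac{1}{9} \iff z > \ln 9 \approx 2.197, \qquad
f(z) < 0.1 \iff e^{-z} > 9 \iff z < -\ln 9 \approx -2.197.
$$

So a logistic unit counts as “on” only when its total input is comfortably above 0, and as “off” only when it is comfortably below 0, rather than merely on the correct side of a threshold.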
11.3.5 PROCEDURE FOR EXPLORING BIAS
Run the Rosenblatt program once again on the file “Boole4.net”. On the setup page, choose the Delta rule, and set the option to hold thresholds constant during training. Keep the remaining settings at their default values, which should be the same as in the previous exercise apart from holding thresholds constant:
• End after a maximum number of training epochs
• End when there are all “hits” and no “misses”
• Randomize patterns each epoch
• Hold thresholds constant during learning
• Default starts for weights
• Default starts for thresholds
• Maximum number of epochs = 1000
• Number of epochs between printouts = 100
• Learning rate = 0.5
• Minimum level of squared error to define a “hit” = 0.01
Press the “Start Training” button to begin training. If the network has not converged to a solution after 1000 epochs, press the “Continue Training” button. If the network has not solved the problem after 3000 epochs, do not train any further. You should observe that, after this amount of training, the network generates only 13 hits and 3 misses, and that its performance does not improve. Have the program build an Excel spreadsheet to summarize the results. This spreadsheet will contain all of the information required to answer the following questions. If you are using a version of the Rosenblatt program that does not use Excel, then save the results of training to a file that you can examine later to answer the questions.
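The held-threshold run can be mimicked with a variant of a Delta rule sketch in which the thresholds are never updated and stay at 0. As before, this is an illustration under assumptions rather than the program's own code: the starting weights (uniform between -0.1 and 0.1) and all names are hypothetical, and the sketch simply runs a fixed 3000 epochs. Because step outputs are 0 or 1, a miss is just an output that differs from its target.

```python
import random

# Boole4 patterns (A, B) and targets for AND, OR, ~A, ~B, as described in the text.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = [(a & b, a | b, 1 - a, 1 - b) for a, b in INPUTS]
NAMES = ["AND", "OR", "~A", "~B"]


def step(net, threshold=0.0):
    """Step activation with the threshold held at 0 by default."""
    return 1 if net > threshold else 0


def net_input(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train_fixed_thresholds(inputs, targets, rate=0.5, epochs=3000, seed=0):
    """Delta rule applied to the weights only; every threshold stays at 0."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    order = list(range(len(inputs)))
    for _ in range(epochs):
        rng.shuffle(order)  # randomize pattern order each epoch
        for p in order:
            x, t = inputs[p], targets[p]
            for j in range(n_out):
                error = t[j] - step(net_input(weights[j], x))
                for i in range(n_in):
                    weights[j][i] += rate * error * x[i]
    return weights


def misses_per_unit(weights, inputs, targets):
    """Number of patterns on which each output unit gives the wrong answer."""
    misses = [0] * len(weights)
    for x, t in zip(inputs, targets):
        for j, w_row in enumerate(weights):
            if step(net_input(w_row, x)) != t[j]:
                misses[j] += 1
    return misses


if __name__ == "__main__":
    w = train_fixed_thresholds(INPUTS, TARGETS)
    for name, m, row in zip(NAMES, misses_per_unit(w, INPUTS, TARGETS), w):
        print(f"{name:>3}: misses={m} weights={row}")
```

Comparing the misses for each output unit, and inspecting the corresponding rows of `weights`, is one way to explore questions 3 and 4 of the exercise that follows.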
11.3.6 EXERCISE 11.3
1. What is the total SSE for the network after training has finished?
2. How many epochs of training occurred before the program stopped training the
network?
3. Examine the responses of the network to each pattern, and the errors computed
for each output unit for each pattern. In what way is the network behaving correctly? In what way is the network making mistakes?
4. With the default settings, and with thresholds held constant during training, every
output unit always has a threshold of 0. Armed with this knowledge, examine the
connection weights that feed into any output unit that is generating errors. Explain
why any errors are being made.
5. Given your answer to question 4, speculate on the role of the threshold in the perceptron, and speculate on why it might be important to let the learning rule train
thresholds in addition to training the connection weights.