Tuesday, November 15, 2011

An Overview of Neural Networks


I. INTRODUCTORY REMARKS

This essay was originally part of the series "The Anatomical Basis of Mind", available on this website as The Anatomical Basis of Mind (Neurophysiology & Consciousness). The purpose of that series was to learn & explain what is known about how neurophysiology results in the phenomena of mind & self; to understand which brain/neuronal structures must be preserved to preserve mind & self; and, further, to use that knowledge to suggest & evaluate cryonics and other preservation methods.
This installment addresses the subject of computer models of neural networks and the relevance of those models to the functioning brain. The computer field of Artificial Intelligence is a vast, bottomless pit which would lead this series too far from biological reality -- and too far into speculation -- to be included. Neural network theory will be the one exception, because the model is so persuasive and so important that it cannot be ignored.
Neurobiology provides a great deal of information about the physiology of individual neurons as well as about the function of nuclei and other gross neuroanatomical structures. But understanding the behavior of networks of neurons is exceedingly challenging for neurophysiology, given current methods. Nonetheless, network behavior is important, especially in light of evidence for so-called "emergent properties", ie, properties of networks that are not obvious from an understanding of neuron physiology. Although neural networks as they are implemented on computers were inspired by the function of biological neurons, many of the designs have become far removed from biological reality. Moreover, many of the designers have lost all interest in simulating neurophysiology -- they are more interested in using their new tools to solve problems. The theory of computation of artificial neural networks can be highly mathematical, with some networks existing entirely as mathematical models. My exposition will attempt to minimize mathematics, using only verbal descriptions and some simple arithmetic. 

II. MODEL NEURONS: NEURODES

The building block of computer-model neural networks is a processing unit called a neurode, which captures many of the essential features of biological neurons.
[NEURODES REPRESENT AN AND GATE]In the diagram, three neurodes are shown which can perform the logical operation "AND", ie, the output neurode will fire only if the two input neurodes are both firing. The output neurode has a "threshold" value (T) of 3/2 (ie, 1.5). If neither or only one input neurode is firing, the total input to the output neurode will be less than 1.5, and the output neurode will not fire. However, if both input neurodes are firing, the total input of 1+1=2 will be greater than the threshold value of 1.5, and the output neurode will fire. Similarly, an "OR" operation can be implemented using the same architecture, but with the threshold value changed to 0.5. In this case, the output neurode fires whenever either or both input neurodes are firing, since even a single input of 1 exceeds the 0.5 threshold.
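Expressed in code, the arithmetic of these two gates amounts to nothing more than a weighted sum followed by a threshold comparison. Here is a minimal sketch in Python (the function name neurode and the exact structure are my own illustrative choices):

    # A minimal sketch of the threshold neurode described above.
    def neurode(inputs, weights, threshold):
        """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
        total = sum(i * w for i, w in zip(inputs, weights))
        return 1 if total > threshold else 0

    for a in (0, 1):
        for b in (0, 1):
            # AND: threshold 1.5 -- fires only when both inputs fire (1+1=2 > 1.5)
            # OR:  same architecture, threshold lowered to 0.5
            print(a, b, "AND:", neurode([a, b], [1, 1], 1.5),
                        "OR:",  neurode([a, b], [1, 1], 0.5))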
[NEURODES REPRESENT AN OR GATE]The values in parentheses (1) on the connections between the neurodes are the weights of the connections, corresponding to the synaptic strength of neuron connections. In biological neural networks the firing of a neuron can result in varying amounts of neurotransmitter being released at the synapses of that neuron. Imagine, for example, a neuron with 3 axons leading to 3 presynaptic terminals. One terminal releases neurotransmitter from 20 vesicles, another from 100 vesicles and the third from 900 vesicles. The synaptic strength (the weight) of the second terminal is 5 times as great as that of the first, everything else being equal. In the neurodes of computer models, weights tend to be values between -1 and +1. Notice that in the examples shown, the weights could have been (0.8) rather than (1) and the results would be the same.
Now consider a more complex network, one designed to do the logical operation "EXCLUSIVE-OR" (XOR). The threshold values are shown inside the neurode circles and the weights are shown alongside the connections. Note the addition of a neurode (the hidden neurode) between the input and output neurodes.
[NETWORK WITH HIDDEN NEURODE]In an XOR operation, the output neurode fires only if one (but not both) of the input neurodes fires. If only one input neurode fires, the hidden neurode will not fire, and the output neurode receives a total input of +1 -- greater than its 0.5 threshold -- so the output neurode fires. But if both input neurodes fire, the hidden neurode fires as well, and its weight of -2 results in a total input of 1+1-2=0 to the output neurode. Since 0 is less than the 0.5 threshold of the output neurode, the output neurode will not fire.
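The same kind of threshold neurode can be composed into the three-neurode XOR network just described. In this sketch the weights and thresholds are taken from the analysis above, with the -2 weight being the hidden neurode's inhibitory connection:

    def neurode(inputs, weights, threshold):
        total = sum(i * w for i, w in zip(inputs, weights))
        return 1 if total > threshold else 0

    def xor_net(a, b):
        hidden = neurode([a, b], [1, 1], 1.5)           # fires only for input (1, 1)
        return neurode([a, b, hidden], [1, 1, -2], 0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, "XOR", b, "=", xor_net(a, b))      # prints 0, 1, 1, 0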
The solution shown is not the only possible solution to the XOR problem in a simple neurode network. There are, in fact, infinitely many possible solutions. Two more example solutions are shown. Negative connection weights represent inhibitory rather than excitatory weights (synapses). Note that threshold values can also be less than zero.
[NETWORK WITH HIDDEN NEURODE][NETWORK WITH HIDDEN NEURODE]
In these examples the relationships between the thresholds, weights, inputs and outputs can be analyzed in detail. But in neural networks (both computer and biological) with large numbers of inputs, outputs and hidden neurodes (neurons), the task of determining weights and threshold values required to achieve desired outputs from given inputs becomes practically impossible. Computer models therefore attempt to train networks to adjust their weights to give desired outputs from given inputs. If biological memory and learning are the result of synapse strengths -- and modifications of synapse strengths -- then the computer models can be very instructive.
Computer neural network models are described in terms of their architecture (patterns of connection) and in terms of the way they are trained (rules for modifying weights). I will therefore classify my descriptions into four categories: (1) Perceptrons & Backpropagation, (2) Competitive Learning, (3) Attractor Networks and (4) Other Neural Network Models. 

III. PERCEPTRONS & BACKPROPAGATION

[PERCEPTRON NETWORK]The architecture of a Perceptron consists of a single input layer of many neurodes and a single output layer of many neurodes. The simple "networks" illustrated at the beginning to produce the logical "AND" and "OR" operations have a Perceptron architecture. But to be called a Perceptron, the network must also implement the Perceptron learning rule for weight adjustment. This learning rule compares the actual network output to the desired network output to determine the new weights. For example, if the network illustrated gives a "0 1 0" output when "0 1 1" is the desired output for some input, all of the weights leading to the third output neurode would be adjusted by some factor, as sketched below.
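A minimal Python sketch of this learning rule for a single output neurode follows. The learning rate, the epoch count and the treatment of the threshold as an adjustable bias are illustrative choices, not canonical values:

    def train_perceptron(samples, n_inputs, rate=0.1, epochs=100):
        """Adjust weights by a factor of the error whenever actual output differs from desired."""
        weights = [0.0] * n_inputs
        threshold = 0.0
        for _ in range(epochs):
            for inputs, desired in samples:
                total = sum(i * w for i, w in zip(inputs, weights))
                actual = 1 if total > threshold else 0
                error = desired - actual                  # -1, 0, or +1
                for j in range(n_inputs):
                    weights[j] += rate * error * inputs[j]
                threshold -= rate * error                 # threshold moves opposite to weights
        return weights, threshold

    # Learning AND from examples, rather than from hand-assigned weights:
    and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    print(train_perceptron(and_samples, 2))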
The Adaline is a modification of the Perceptron which substitutes bipolar (-1/+1) inputs for binary (0/1) inputs, and adds "bias". But the most important modification is the use of a delta learning rule. As with the Perceptron, the delta rule compares desired output to actual output to compute the weight adjustment. But the delta rule squares the errors and averages them, to avoid negative errors cancelling out positive ones. Adalines have been used to eliminate echoes in phone lines for nearly 30 years.
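A sketch of a single delta-rule (Widrow-Hoff) update may make the difference clear: the error is taken against the raw weighted sum, before any thresholding, so the update performs gradient descent on the squared error. The learning rate here is an illustrative choice:

    def delta_update(weights, inputs, desired, rate=0.05):
        net = sum(i * w for i, w in zip(inputs, weights))
        error = desired - net                        # a real number, not just -1, 0, or +1
        return [w + rate * error * i for w, i in zip(weights, inputs)]

    # Bipolar Adaline-style inputs (-1/+1) rather than binary (0/1):
    w = [0.0, 0.0]
    w = delta_update(w, [1, -1], desired=1)
    print(w)   # the weights move so as to shrink the squared error (1 - net)**2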
[BACKPROPAGATION NETWORK]
Neural network research went through many years of stagnation after Marvin Minsky and Seymour Papert showed that Perceptrons could not solve problems such as the EXCLUSIVE-OR problem. Several modifications of the Perceptron model, however, produced the Backpropagation model -- a model which can solve XOR and many more difficult problems. Backpropagation has proven to be so powerful that it currently accounts for 80% of all neural network applications. In Backprop, a third neurode layer is added (the hidden layer) and the discrete thresholding function is replaced with a continuous (sigmoid) one. But the most important modification for Backprop is the generalized delta rule, which allows for adjustment of the weights leading to the hidden-layer neurodes in addition to the usual adjustments to the weights leading to the output-layer neurodes. Using the generalized delta rule to adjust the weights leading to the hidden units is what backpropagates the error adjustment.
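The following Python sketch puts the three modifications together -- a hidden layer, a sigmoid in place of the discrete threshold, and the generalized delta rule pushing error adjustments back to the hidden-layer weights -- to learn XOR. The layer sizes, learning rate and epoch count are arbitrary illustrative choices (and with an unlucky random start, training can stall in a local minimum; rerunning with another seed then helps):

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    n_in, n_hid = 2, 2
    w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]  # +1 bias weight each
    w_out = [random.uniform(-1, 1) for _ in range(n_hid + 1)]                         # +1 bias weight

    def forward(inputs):
        x = inputs + [1.0]                                       # append the bias input
        h = [sigmoid(sum(a * w for a, w in zip(x, row))) for row in w_hid]
        out = sigmoid(sum(a * w for a, w in zip(h + [1.0], w_out)))
        return x, h, out

    samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR

    for _ in range(20000):
        for inputs, target in samples:
            x, h, out = forward(inputs)
            d_out = (target - out) * out * (1 - out)             # output-layer delta
            d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(n_hid)]
            for j, hj in enumerate(h + [1.0]):
                w_out[j] += 0.5 * d_out * hj                     # adjust output-layer weights
            for j in range(n_hid):
                for i in range(n_in + 1):
                    w_hid[j][i] += 0.5 * d_hid[j] * x[i]         # backpropagated adjustment

    for inputs, target in samples:
        print(inputs, "->", round(forward(inputs)[2], 2))        # approaches 0, 1, 1, 0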

IV. COMPETITIVE LEARNING

The prototypic competitive learning ("self-organizing") model is the Kohonen network (named after the Finnish researcher who pioneered the research). A Kohonen network is a two-layered network, much like the Perceptron. But the output layer -- known as the "competitive layer" -- can be represented as a two-dimensional grid fed by a two-neurode input layer. The input values are continuous, typically normalized to values between -1 and +1. Training of the Kohonen network does not involve comparing the actual output with a desired output. Instead, the input vector is compared with the weight vectors leading to the competitive layer. The neurode with the weight vector most closely matching the input vector is called the winning neurode.
[KOHONEN NETWORK]For example, if the input vector is (0.35, 0.8), the winning neurode might have weight vector (0.4, 0.78). The learning rule would adjust the weight vector to make it even closer to the input vector. Only the winning neurode produces output, and only the winning neurode gets its weights adjusted. In more sophisticated models, only the weights of the winning neurode and its immediate neighbors are updated.
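Using the numbers from this example, a single training step might be sketched as follows (the learning rate of 0.5 and the competing weight vectors are illustrative):

    def kohonen_step(weight_vectors, input_vec, rate=0.5):
        """Find the winning neurode and pull only its weight vector toward the input."""
        def dist2(w):   # squared distance between a weight vector and the input vector
            return sum((wi - xi) ** 2 for wi, xi in zip(w, input_vec))
        winner = min(range(len(weight_vectors)), key=lambda k: dist2(weight_vectors[k]))
        weight_vectors[winner] = [wi + rate * (xi - wi)
                                  for wi, xi in zip(weight_vectors[winner], input_vec)]
        return winner

    weights = [[0.4, 0.78], [-0.3, 0.2], [0.9, -0.5]]
    win = kohonen_step(weights, [0.35, 0.8])
    print(win, weights[win])   # neurode 0 wins; its weights move to (0.375, 0.79)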
After training, a limited number of input vectors will map to activation of distinct output neurodes. Because the weights are modified in response to the inputs, rather than in response to desired outputs, competitive learning is called unsupervised learning, to distinguish it from the supervised learning of Perceptrons, Adalines and Backpropagation. In supervised learning, comparison is made between actual outputs and desired outputs supplied by an external supervisor. There is no external supervisor in competitive learning. 

V. ATTRACTOR NETWORKS

The most notable attractor networks are the Hopfield Network, the Boltzmann Machine and the Bidirectional Associative Memory (BAM). The Hopfield Network can be represented in a number of ways, all of which are essentially equivalent.
[HOPFIELD NETWORK]
[CROSSBAR NETWORK]
The diagram on the left indicates that every neurode has a connection with every other neurode in two directions, but it omits the detail that each neurode is also an input neurode and an output neurode, as is shown in the middle diagram. The diagram on the right is called a Crossbar Network representation of a Hopfield Network, and it is a convenient tool when analyzing connection weights as a matrix of numbers.
The Hopfield Network is presented with an input vector, and the input vector remains active as the neurodes update their outputs one by one in sequence (usually more than once for each neurode) until the output is constant. Each neurode's output is updated on the basis of the total weighted input that neurode receives from the other neurodes. This process of arriving at the output is called relaxation or annealing, and it can be expressed as an energy equation -- which is exactly what was done by the physicist John Hopfield, who conceived of this network.
[BALL ROLLING ON HILLS]The lower energy states are the "attractors" of the network. The settling of the network into its lowest energy state can be compared to a ball rolling to the bottom of a hill. If the hill has a hump, however, the ball may not fall to its lowest energy state, but be caught in a local minimum. The Boltzmann Machine is a modified Hopfield Network that adds a "Boltzmann temperature term" ("noise") to jostle the ball out of the local minimum.
The Hopfield Network is an associative memory because it can "recognize" patterns. For example, a fully trained network might give the three outputs (1,1,1,-1,-1,-1), (1,1,-1,-1,1,1) or (-1,1,-1,1,-1,1). If given the input (1,1,1,1,-1,-1) it would most likely give as output (1,1,1,-1,-1,-1) -- the first output -- since that is the pattern closest to the one that the network recognizes. In practice, to avoid errors, a Hopfield Network should not be expected to recognize a number of patterns that is more than 15% of the number of neurodes. That is, a 100 neurode network should not be expected to recognize more than 15 patterns.
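This storage-and-recall behavior can be sketched in a few lines, using the three patterns above. The weights are fixed once the patterns are stored (here by a simple Hebbian sum of products); it is the neurode outputs that settle during recall. The probe below is one bit away from the first pattern; note that which attractor is reached can depend on the order in which the neurodes are updated:

    patterns = [(1, 1, 1, -1, -1, -1), (1, 1, -1, -1, 1, 1), (-1, 1, -1, 1, -1, 1)]
    n = 6

    # Hebbian storage: the weight between neurodes i and j is the sum over all
    # stored patterns of the product of their two states (no self-connections).
    W = [[0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
         for i in range(n)]

    def recall(probe, sweeps=5):
        state = list(probe)
        for _ in range(sweeps):
            for i in range(n):                  # update each neurode's output in turn
                total = sum(W[i][j] * state[j] for j in range(n))
                state[i] = 1 if total >= 0 else -1
        return tuple(state)

    print(recall((1, -1, 1, -1, -1, -1)))       # settles to (1, 1, 1, -1, -1, -1)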
Bidirectional Associative Memories (BAMs) consist of two layers of neurodes, fully connected to each other. For an autoassociative memory, the two layers will have the same number of neurodes and will output patterns similar to the input. For a heteroassociative memory, the two layers can have different numbers of neurodes, as would be the case in mapping between ASCII codes and alphabetic letters.
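A rough sketch of a heteroassociative BAM follows, with two layers of different sizes and weights formed as a sum of outer products of the stored pattern pairs. The bipolar pattern pairs are arbitrary illustrative vectors, not real ASCII codes:

    pairs = [((1, 1, -1, -1), (1, -1, 1)),
             ((1, -1, 1, -1), (-1, 1, 1))]
    nx, ny = 4, 3

    # W[i][j] connects x-layer neurode i to y-layer neurode j
    W = [[sum(x[i] * y[j] for x, y in pairs) for j in range(ny)] for i in range(nx)]

    def sign(value, old):
        return 1 if value > 0 else -1 if value < 0 else old

    def bam_recall(x, steps=5):
        """Bounce activations back and forth between the layers until they stabilize."""
        x, y = list(x), [1] * ny
        for _ in range(steps):
            y = [sign(sum(W[i][j] * x[i] for i in range(nx)), y[j]) for j in range(ny)]
            x = [sign(sum(W[i][j] * y[j] for j in range(ny)), x[i]) for i in range(nx)]
        return x, y

    print(bam_recall((1, 1, -1, -1)))   # recovers the paired pattern (1, -1, 1)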

VI. OTHER NEURAL NETWORK MODELS

Counterpropagation Networks are three-layered networks in which the hidden layer is a Kohonen layer. This model eliminates the need for backpropagation, thereby reducing training time, but performance is worse than with backpropagation.
Recurrent networks take some of the outputs and feed them back to the inputs or to hidden layer neurodes. (Hopfield Networks are totally recurrent.) Adaptive Resonance Theory (ART) networks attempt to simulate biological reality by the use of time-varying inputs rather than simultaneous inputs. Weights may be allowed to decay with time when they are not being continuously updated.
There are other models, but the ones already mentioned are the most prominent in the current field of neural network application and research.

VII. PSYCHOLOGICAL CONSIDERATIONS

The learning and memory properties of neural networks resemble the properties of human learning and memory. Associative memory is so-called content-addressable memory. For example, to remember the bird that reputedly puts its head in the sand, the description may be adequate to retrieve the name "ostrich" and a visual image of the bird -- comparable to the associative memory retrieval of the Hopfield Network.
[DEPICTION OF WORD MAKE OBSCURED]Similarly, associative memory can allow one to decipher the word "MAKE" when some of the letters are partly obscured.
Neural networks also have a capacity to generalize from particulars. They can recognize handwritten letters, despite a wide variability in form that is anathema to algorithm-bound von Neumann computers. And neural networks learn by being presented with examples, rather than by being given algorithms. Implicitly, neural networks create their own algorithms.

VIII. BIOLOGICAL CONSIDERATIONS

Neurophysiologists spent many years searching for the "engram", ie, the precise location in the brain for specific memories. The engram proved to be elusive. The idea that memories are stored in a distributed fashion -- as synaptic strengths (weights) in a neural network -- now seems very compelling. Neural networks embody the integration of "software" and "hardware". Biological and artificial neural networks demonstrate the property of "graceful degradation", ie, destruction of individual neurons or of small groups of neurons reduces performance, but does not have the devastating effect that destroying the contents of a computer memory location would have.
This is not to say that localization does not exist in the brain. Neurons in the superior temporal sulcus of the cerebral cortex, for example, respond selectively to faces. But there is no "grandmother cell", ie, no cell that responds specifically to the face of someone's grandmother. Instead, each neuron has a different response pattern to a set of faces. Ensembles of neurons encode the response to identify a particular face. And an overlapping ensemble may identify another face.
A very real difficulty of correlating artificial neural networks with biological ones lies in the way weights are modified in the former and synaptic strengths are modified in the latter. Weights are altered mathematically in a computer network, based on differences in values. Synaptic strengths, on the other hand, are modified in response to synaptic activity. The backpropagation model, in particular, is held to be biologically unrealistic insofar as it would require a supervisor and a violation of the unidirectional flow of information seen in axons. Some researchers have postulated parallel, backward-directed axons to return error information, but the modification of synaptic strength by these axons is still very hypothetical.
Many researchers feel that competitive (unsupervised) learning is a more persuasive model for brain neural networks than any supervised learning model. The kind of learning that occurs in the visual cortex shortly after birth seems to correlate very well with the pattern discrimination that emerges from Kohonen Networks. Nonetheless, the mechanisms of synaptic strength modification remain a sticking point.
The CA3 region of the hippocampus receives inputs from diverse regions of the association cortex via the entorhinal cortex and the dentate gyrus. Of the roughly 16,000 synapses seen on a typical CA3 neuron, approximately 12,000 of those synapses will be inputs from other CA3 neurons. This suggests that the CA3 cells are a recurrent collateral system -- specifically an autoassociation matrix (Hopfield Network). It has been hypothesized that the CA3 neurons autoassociate the impressions from diverse sensations of an event into a single episodic memory. Ensembles of CA3 neurons associated with the episode would further congeal the memory by provoking competitive learning in the CA1 neurons, with the winning CA1 neurons returning impressions of the crystallized episode for storage in the cerebral cortex. Although circumstantial evidence lends support to this theory, it is still very much a theory.

IX. IMPLICATIONS FOR CRYONICS

The idea that memory & identity are distributed & redundantly stored, rather than localized & unique, has positive implications for cryonics. It implies that perfectly precise reconstruction of the 100 trillion synapses of the brain may not be necessary to restore memory & identity.
Neural networks are "black boxes" of memory. By this I mean that a researcher may know the precise values of the inputs, the precise values of the outputs and the precise values of the connection weights without understanding the relationships between them -- because such understanding is awesomely difficult with complex networks. Researchers do not program neural networks by assigning weights -- they train the networks to give desired output for given input, and then (perhaps) record the weights. The implication of this approach is that near-term reconstruction of the human mind may take place by deducing and reconstructing synaptic strengths, without any understanding of the direct relationship between those weights and specific memories. For persons concerned about their "mental privacy" this might be reassuring, but for persons hoping for a reconstruction of the brain based on written memoirs, it is not reassuring. On the other hand, far-future reconstructions may be possible by assigning synaptic strengths based on written memoirs. In that case, complete destruction of the original synapses may prove not to be an ultimate disaster.