I. INTRODUCTORY REMARKS
This essay was originally part of the series "The Anatomical Basis of Mind", available on this website as The Anatomical Basis of Mind (Neurophysiology & Consciousness). The purpose of that series was to learn & explain what is known about how neurophysiology results in the phenomena of mind & self; further, to understand which brain/neuronal structures must be preserved to preserve mind & self; and further still, to use that knowledge to suggest & evaluate cryonics and other preservation methods.
This installment addresses the subject of computer models of neural networks and the relevance of those models to the functioning brain. The computer field of Artificial Intelligence is a vast, bottomless pit which would lead this series too far from biological reality -- and too far into speculation -- to be included. Neural network theory will be the single exception, because the model is so persuasive and so important that it cannot be ignored.
Neurobiology provides a great deal of information about the physiology of individual neurons as well as about the function of nuclei and other gross neuroanatomical structures. But understanding the behavior of networks of neurons is exceedingly challenging for neurophysiology, given current methods. Nonetheless, network behavior is important, especially in light of evidence for so-called "emergent properties", ie, properties of networks that are not obvious from an understanding of neuron physiology. Although neural networks as they are implemented on computers were inspired by the function of biological neurons, many of the designs have become far removed from biological reality. Moreover, many of the designers have lost all interest in simulating neurophysiology -- they are more interested in using their new tools to solve problems. The theory of computation of artificial neural networks can be highly mathematical, with some networks existing entirely as mathematical models. My exposition will attempt to minimize mathematics, using only verbal descriptions and some simple arithmetic.
II. MODEL NEURONS: NEURODES
The building-block of computer-model neural networks is a processing unit called a neurode, which captures many essential features of biological neurons: it sums its weighted inputs and fires (outputs 1) when that sum exceeds a threshold value. Now consider a more complex network, one designed to perform the logical operation "EXCLUSIVE-OR" (XOR). In diagrams of such networks, the threshold values are shown inside the neurode circles and the weights are shown alongside the connections. Note the addition of a neurode (the hidden neurode) between the input and output neurodes.
The solution shown is not the only possible solution to the XOR problem in a simple neurode network; there are, in fact, infinitely many possible solutions. Other solutions use negative connection weights, which represent inhibitory rather than excitatory weights (synapses). Note that threshold values can also be less than zero.
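To make the arithmetic concrete, here is a minimal sketch of such a network in Python. The particular weights and thresholds (a hidden neurode computing AND with a threshold of 1.5, inhibiting the output through a weight of -2) are one illustrative solution among the infinitely many, not necessarily the values used in the diagrams described above.

```python
# A minimal sketch of the XOR network described above: two input
# neurodes, one hidden neurode and one output neurode.  The weights
# and thresholds are one illustrative solution.

def step(activation, threshold):
    """Discrete thresholding: fire (output 1) if the weighted sum
    of inputs exceeds the threshold, otherwise output 0."""
    return 1 if activation > threshold else 0

def xor_network(x1, x2):
    # Hidden neurode: fires only when BOTH inputs are active (AND),
    # using a weight of +1 from each input and a threshold of 1.5.
    hidden = step(1.0 * x1 + 1.0 * x2, 1.5)
    # Output neurode: a weight of +1 from each input, an inhibitory
    # weight of -2 from the hidden neurode, and a threshold of 0.5.
    return step(1.0 * x1 + 1.0 * x2 - 2.0 * hidden, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))  # prints 0, 1, 1, 0
```

Tracing the four input pairs through these two neurodes shows why the hidden neurode is needed: it vetoes the output precisely in the (1,1) case that would otherwise fire.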
In these examples the relationships between the thresholds, weights, inputs and outputs can be analyzed in detail. But in neural networks (both computer and biological) with large numbers of inputs, outputs and hidden neurodes (neurons), the task of determining weights and threshold values required to achieve desired outputs from given inputs becomes practically impossible. Computer models therefore attempt to train networks to adjust their weights to give desired outputs from given inputs. If biological memory and learning are the result of synapse strengths -- and modifications of synapse strengths -- then the computer models can be very instructive.
Computer neural network models are described in terms of their architecture (patterns of connection) and in terms of the way they are trained (rules for modifying weights). I will therefore classify my descriptions into four categories: (1) Perceptrons & Backpropagation, (2) Competitive Learning, (3) Attractor Networks and (4) Other Neural Network Models.
III. PERCEPTRONS & BACKPROPAGATION
The Adaline is a modification of the Perceptron which substitutes bipolar (-1/+1) inputs for binary (0/1) inputs, and adds "bias". But the most important modification is the use of a delta learning rule. As with the Perceptron, the delta rule compares desired output to actual output to compute the weight adjustment. But the delta rule squares the errors and averages them to avoid negative errors cancelling out positive ones. Adalines have been used to eliminate echoes in phone lines for nearly 30 years.
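The following sketch shows delta-rule training of an Adaline-style unit on the logical AND problem in bipolar form. The learning rate, epoch count and the AND task itself are illustrative choices, not taken from the text; the bias is implemented, as is conventional, as a weight on a constant +1 input.

```python
# A sketch of delta-rule training for an Adaline-style unit, assuming
# bipolar (-1/+1) inputs and targets and a learned bias weight.
import random

random.seed(0)

def train_adaline(samples, learning_rate=0.1, epochs=50):
    """samples: list of ((x1, x2), target) pairs with bipolar values."""
    weights = [random.uniform(-0.5, 0.5) for _ in range(3)]  # bias, w1, w2
    for _ in range(epochs):
        for (x1, x2), target in samples:
            # Linear activation: bias plus weighted sum of the inputs.
            output = weights[0] + weights[1] * x1 + weights[2] * x2
            error = target - output
            # Delta rule: adjust each weight in proportion to its input
            # and to the (signed) error on this sample.
            weights[0] += learning_rate * error          # bias input is +1
            weights[1] += learning_rate * error * x1
            weights[2] += learning_rate * error * x2
    return weights

# Logical AND in bipolar form: output +1 only when both inputs are +1.
and_samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_adaline(and_samples)
for (x1, x2), target in and_samples:
    output = w[0] + w[1] * x1 + w[2] * x2
    print((x1, x2), "->", 1 if output >= 0 else -1)      # matches targets
```

Although the per-sample update uses the signed error, minimizing the squared error averaged over the samples (as described above) is exactly what this repeated adjustment accomplishes.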
Neural network research went through many years of stagnation after Marvin Minsky and Seymour Papert showed that Perceptrons could not solve problems such as the EXCLUSIVE-OR problem. Several modifications of the Perceptron model, however, produced the Backpropagation model -- a model which can solve XOR and many more difficult problems. Backpropagation has proven to be so powerful that it currently accounts for 80% of all neural network applications. In Backprop, a third neurode layer is added (the hidden layer) and the discrete thresholding function is replaced with a continuous (sigmoid) one. But the most important modification for Backprop is the generalized delta rule, which allows for adjustment of weights leading to the hidden-layer neurodes in addition to the usual adjustments to the weights leading to the output-layer neurodes. Using the generalized delta rule to adjust the weights leading to the hidden units is what backpropagates the error-adjustment.
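A minimal sketch of Backprop solving XOR follows. The layer sizes (two hidden neurodes), learning rate and epoch count are illustrative choices; with only two hidden neurodes the error surface has local minima, so an occasional run may fail to converge, in which case a different random seed usually succeeds.

```python
# A minimal backpropagation sketch for the XOR problem: two inputs,
# a hidden layer of two sigmoid neurodes, and one sigmoid output.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# w_hidden[j][i]: weight from input i (index 0 is the bias) to hidden j.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_output = [random.uniform(-1, 1) for _ in range(3)]

samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
rate = 0.5

for _ in range(20000):
    for (x1, x2), target in samples:
        inputs = [1.0, x1, x2]                    # leading 1.0 is the bias
        hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                  for row in w_hidden]
        h_vec = [1.0] + hidden
        output = sigmoid(sum(w * v for w, v in zip(w_output, h_vec)))

        # Output-layer delta: error times the sigmoid derivative.
        d_out = (target - output) * output * (1 - output)
        # Generalized delta rule: propagate the error back through the
        # output weights to get each hidden neurode's share of the blame.
        d_hid = [d_out * w_output[j + 1] * hidden[j] * (1 - hidden[j])
                 for j in range(2)]

        for i in range(3):
            w_output[i] += rate * d_out * h_vec[i]
        for j in range(2):
            for i in range(3):
                w_hidden[j][i] += rate * d_hid[j] * inputs[i]

for (x1, x2), target in samples:
    inputs = [1.0, x1, x2]
    h_vec = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                     for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_output, h_vec)))
    print(x1, x2, "->", round(out, 2))            # should approach 0,1,1,0
```

The line computing `d_hid` is the backpropagation step itself: the hidden weights could not otherwise be trained, because no "desired output" exists for a hidden neurode.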
IV. COMPETITIVE LEARNING
The prototypic competitive learning ("self-organizing") model is the Kohonen network (named after the Finnish researcher who pioneered the research). A Kohonen network is a two-layered network, much like the Perceptron, but, for a two-neurode input layer, the output layer can be represented as a two-dimensional grid, known as the "competitive layer". The input values are continuous, typically normalized to values between -1 and +1. Training of the Kohonen network does not involve comparing the actual output with a desired output. Instead, the input vector is compared with the weight vectors leading to the competitive layer. The neurode with a weight vector most closely matching the input vector is called the winning neurode, and its weights (along with those of its grid neighbors) are nudged closer to the input vector. After training, a limited number of input vectors will map to activation of distinct output neurodes. Because the weights are modified in response to the inputs, rather than in response to desired outputs, competitive learning is called unsupervised learning, to distinguish it from the supervised learning of Perceptrons, Adalines and Backpropagation. In supervised learning, comparison is made between actual outputs and desired outputs supplied by an external supervisor. There is no external supervisor in competitive learning.
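The sketch below shows the winner-take-all core of this process. The four competitive neurodes and the clustered inputs are illustrative choices, and the neighborhood update of a full Kohonen network is omitted: only the winner's weights are moved.

```python
# A sketch of unsupervised competitive (Kohonen-style) learning: each
# competitive-layer neurode has a weight vector, the neurode whose
# weights best match the input wins, and only the winner's weights are
# nudged toward that input.  (A full Kohonen network would also update
# the winner's grid neighbors; that refinement is omitted here.)
import math, random

random.seed(1)
n_competitive = 4
weights = [[random.uniform(-1, 1), random.uniform(-1, 1)]
           for _ in range(n_competitive)]

def winner(x):
    """Index of the neurode whose weight vector is closest to input x."""
    return min(range(n_competitive),
               key=lambda j: math.dist(weights[j], x))

# Two clusters of normalized inputs; no desired outputs are supplied.
inputs = [(0.9, 0.8), (0.8, 0.9), (-0.9, -0.7), (-0.8, -0.9)]
rate = 0.2
for _ in range(100):
    for x in inputs:
        j = winner(x)
        # Move only the winning neurode's weights toward the input.
        weights[j][0] += rate * (x[0] - weights[j][0])
        weights[j][1] += rate * (x[1] - weights[j][1])

for x in inputs:
    print(x, "-> neurode", winner(x))   # each cluster claims a neurode
```

Note that no supervisor appears anywhere in the loop: the weight vectors organize themselves around whatever structure the inputs happen to contain.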
V. ATTRACTOR NETWORKS
The most notable attractor networks are the Hopfield Network, the Boltzmann Machine and the Bidirectional Associative Memory (BAM). The Hopfield Network can be represented in a number of ways, all of which are essentially equivalent. The diagram on the left indicates that every neurode has a connection with every other neurode in two directions, but it omits the detail that each neurode is also an input neurode and an output neurode, as is shown in the middle diagram. The diagram on the right is called a Crossbar Network representation of a Hopfield Network, and it is a convenient tool when analyzing connection weights as a matrix of numbers.
The Hopfield Network is presented with an input vector, and the input vector remains active as the neurodes update their outputs one-by-one in sequence (usually more than once for each neurode) until the output is constant. Each neurode's new output is determined by the weighted sum of the outputs of the other neurodes. This process of arriving at the output is called relaxation or annealing, and it can be expressed as an energy equation -- which is exactly what was done by the physicist John Hopfield, who conceived of this network.
The Hopfield Network is an associative memory because it can "recognize" patterns. For example, a fully trained network might give the three outputs (1,1,1,-1,-1,-1), (1,1,-1,-1,1,1) or (-1,1,-1,1,-1,1). If given the input (1,1,1,1,-1,-1) it would most likely give as output (1,1,1,-1,-1,-1) -- the first output -- since that is the stored pattern closest to the input. In practice, to avoid errors, a Hopfield Network should not be expected to recognize a number of patterns that is more than 15% of the number of neurodes. That is, a 100-neurode network should not be expected to recognize more than 15 patterns.
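The sketch below demonstrates this recall behavior using the first two example patterns above. Respecting the ~15% capacity rule just mentioned, only two patterns are stored in this six-neurode network (storing all three would exceed its reliable capacity). The Hebbian outer-product storage rule used here is the standard one for Hopfield networks, though the text does not specify it.

```python
# A minimal sketch of Hopfield-style recall.  Weights are set by the
# standard Hebbian (outer-product) storage rule; recall then updates
# neurode OUTPUTS one by one until the state stops changing.
patterns = [(1, 1, 1, -1, -1, -1),
            (1, 1, -1, -1, 1, 1)]
n = 6

# Symmetric weights: for each pair of neurodes, sum the products of
# their values across the stored patterns; no self-connections.
w = [[0 if i == j else sum(p[i] * p[j] for p in patterns)
      for j in range(n)] for i in range(n)]

def recall(state):
    state = list(state)
    changed = True
    while changed:                      # relax until the output is constant
        changed = False
        for i in range(n):
            # A neurode's new output is the sign of its weighted input.
            new = 1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
    return tuple(state)

# One bit of the first pattern has been corrupted; relaxation repairs it.
print(recall((1, 1, 1, 1, -1, -1)))    # -> (1, 1, 1, -1, -1, -1)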
Bidirectional Associative Memories consist of two layers of neurodes, fully connected to each other. For an autoassociative memory, the two layers will have the same number of neurodes and will output patterns similar to the input. For a heteroassociative memory, the two layers can have a different number of neurodes, as would be the case in mapping between ASCII codes and alphabetic letters.
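Here is a sketch of a heteroassociative BAM with layers of different sizes (four neurodes on one side, three on the other). The two stored pattern pairs are illustrative choices, not drawn from the text, and for simplicity a single forward or backward pass is shown; a full BAM would bounce the signal back and forth until the pair of layer outputs stabilizes.

```python
# A sketch of a heteroassociative BAM: weights are the summed outer
# products of the associated pattern pairs, and recall runs through
# the same weight matrix in either direction.
x_patterns = [(1, 1, -1, -1), (1, -1, 1, -1)]   # 4-neurode layer
y_patterns = [(1, -1, 1), (-1, 1, 1)]           # 3-neurode layer

W = [[sum(x[i] * y[j] for x, y in zip(x_patterns, y_patterns))
      for j in range(3)] for i in range(4)]

def sign(v):
    return 1 if v >= 0 else -1

def forward(x):
    """Recall a y-pattern from an x-pattern."""
    return tuple(sign(sum(x[i] * W[i][j] for i in range(4)))
                 for j in range(3))

def backward(y):
    """Recall an x-pattern from a y-pattern."""
    return tuple(sign(sum(y[j] * W[i][j] for j in range(3)))
                 for i in range(4))

print(forward((1, 1, -1, -1)))    # -> (1, -1, 1)
print(backward((1, -1, 1)))       # -> (1, 1, -1, -1)
```

The same weight matrix serves both directions, which is the "bidirectional" property: each layer can act as the input to retrieve the pattern associated with it in the other layer.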
VI. OTHER NEURAL NETWORK MODELS
Counterpropagation Networks are three-layered networks in which the hidden layer is a Kohonen layer. This model eliminates the need for backpropagation, thereby reducing training time, but performance is worse than with backpropagation. Recurrent networks take some of the outputs and feed them back to the inputs or to hidden-layer neurodes. (Hopfield Networks are fully recurrent.) Adaptive Resonance Theory (ART) networks attempt to simulate biological reality by the use of time-varying inputs rather than simultaneous inputs. Weights may be allowed to decay with time when they are not being continuously updated.
There are other models, but the ones already mentioned are the most prominent in the current field of neural network application and research.
VII. PSYCHOLOGICAL CONSIDERATIONS
The learning and memory properties of neural networks resemble the properties of human learning and memory. Associative memory is so-called content-addressable memory. For example, to remember the bird that reputedly puts its head in the sand, the description may be adequate to retrieve the name "ostrich" and a visual image of the bird -- comparable to the associative memory retrieval of the Hopfield Network. Neural networks also have a capacity to generalize from particulars. They can recognize handwritten letters, despite a wide variability in form that is anathema to algorithm-bound von Neumann computers. And neural networks learn by being presented with examples, rather than by being given algorithms. Implicitly, neural networks create their own algorithms.
VIII. BIOLOGICAL CONSIDERATIONS
Neurophysiologists spent many years searching for the "engram", ie, the precise location in the brain of specific memories. The engram proved to be elusive. The idea that memories are stored in a distributed fashion -- as synaptic strengths (weights) in a neural network -- now seems very compelling. Neural networks embody the integration of "software" and "hardware". Biological and artificial neural networks demonstrate the property of "graceful degradation", ie, destruction of individual neurons or of small groups of neurons reduces performance, but does not have the devastating effect that destroying the contents of a computer memory location would have. This is not to say that localization does not exist in the brain. Neurons in the superior temporal sulcus of the cerebral cortex, for example, respond selectively to faces. But there is no "grandmother cell", ie, no cell that responds specifically to the face of someone's grandmother. Instead, each neuron has a different response pattern to a set of faces. An ensemble of neurons encodes the response that identifies a particular face, and an overlapping ensemble may identify another face.
A very real difficulty of correlating artificial neural networks with biological ones lies in the way weights are modified in the former and synaptic strengths are modified in the latter. Weights are altered mathematically in a computer network, based on differences in values. Synaptic strengths, on the other hand, are modified in response to synaptic activity. The backpropagation model, in particular, is held to be biologically unrealistic insofar as it would require a supervisor and a violation of the unidirectional flow of information seen in axons. Some researchers have postulated parallel, backward-directed axons to return error information, but the modification of synaptic strength by these axons is still very hypothetical.
Many researchers feel that competitive (unsupervised) learning is a more persuasive model for brain neural networks than any supervised learning model. The kind of learning that occurs in the visual cortex shortly after birth seems to correlate very well with the pattern discrimination that emerges from Kohonen Networks. Nonetheless, the mechanisms of synaptic strength modification remain a sticking point.
The CA3 region of the hippocampus receives inputs from diverse regions of the association cortex via the entorhinal cortex and the dentate gyrus. Of the roughly 16,000 synapses seen on a typical CA3 neuron, approximately 12,000 of those synapses will be inputs from other CA3 neurons. This suggests that the CA3 cells are a recurrent collateral system -- specifically an autoassociation matrix (Hopfield Network). It has been hypothesized that the CA3 neurons autoassociate the impressions from diverse sensations of an event into a single episodic memory. Ensembles of CA3 neurons associated with the episode would further congeal the memory by provoking competitive learning in the CA1 neurons, with the winning CA1 neurons returning impressions of the crystallized episode for storage in the cerebral cortex. Although circumstantial evidence lends support to this theory, it is still very much a theory.
IX. IMPLICATIONS FOR CRYONICS
The idea that memory & identity are distributed & redundantly stored, rather than localized & unique, has positive implications for cryonics. It implies that precise reconstruction of the 100 trillion synapses of the brain may not be necessary to restore memory & identity.
Neural networks are "black boxes" of memory. By this I mean that a researcher may know the precise values of the inputs, the precise values of the outputs and the precise values of the connection weights without understanding the relationships among them -- because such understanding is awesomely difficult with complex networks. Researchers do not program neural networks by assigning weights -- they train the networks to give desired output for given input, and then (perhaps) record the weights. The implication of this approach is that near-term reconstruction of the human mind may take place by deducing and reconstructing synaptic strengths, without any understanding of the direct relationship between those weights and specific memories. For persons concerned about their "mental privacy" this might be reassuring, but for persons hoping for a reconstruction of the brain based on written memoirs, it is not. On the other hand, far-future reconstructions may be possible by assigning synaptic strengths based on written memoirs. In that case, complete destruction of the original synapses may prove not to be an ultimate disaster.