ALTERNATIVE NEURAL ARCHITECTURES
by Will Dwinnell, contributing editor, The Gordian Institute
INTRODUCTION
Of the "new" modeling technologies in use today, neural networks are probably the most popular. They are probably also the most diverse. Literally dozens of different neural architectures have been devised by enterprising researchers. [3] Taking into account variations on these architectures and the multiplicity of training algorithms, this number easily climbs into the hundreds. Among this expanse of architectures are many different behavioral properties. Some neural networks are quick to train but slow to recall. Others are quick to recall but slow to train. Still others display different characteristics in between.
Given all of this, it should be obvious that blanket statements regarding neural networks are problematic at best. Most such comments are directed at multi-layer perceptrons (MLP, also commonly referred to as feedforward neural networks, FFNN), the most commonly implemented artificial neural network. One typical assertion is that "neural networks are slow". More precisely, MLP networks tend to be slow to train. Importantly, though, note that MLP networks are relatively quick to recall. Just as importantly, other sorts of neural architectures (such as Generalized Regression Neural Networks, GRNN) provide very fast training
There is no reason to limit one's work to MLPs, despite their popularity. Understanding the available alternatives is important. Neural networks, taken as a broad class of modeling methods, provide a rich set of design trade-offs. Fast training, fast recall, multiple outputs, low storage requirements and easy update are qualities that neural networks can provide. (For a more comprehensive list of important modeling algorithm characteristics, see "Modeling Methodology X: Algorithm Selection", by Dwinnell.) [1] No single modeling algorithm provides all of these qualities and neural networks are no exception. Real-world applications require selecting a subset of desirable qualities and making trade-offs. As a class of algorithms, neural networks are quite competitive in providing different mixtures of such qualities. Here, attention will be restricted to two particular neural architectures: CMAC and SDM. A complete explanation of both algorithms requires too much space, so a brief description of each will be given followed by analysis.
CMAC
CMAC (the abbreviation has multiple published meanings) was invented by James Albus as a model of brain function. [2] It is popular in some robotics circles and several interesting variants have appeared. Simplistically, CMAC's operation can be broken down into a series of steps:
CMAC is very fast to train and recall. The linear model at the end is the only part of the system which requires training. CMAC's nonlinearity comes from the way it breaks up the space of possible examples into little cells. This is also a weakness: inputs are binned so that similar examples will sometimes produce identical outputs. The number of these cells also grows exponentially with the number of inputs and geometrically with the resolution of the bins, which for many problems can mean large storage requirements. Hashing can often reduce this as a concern. CMAC trains so fast that it can be used for some real-time applications, depending on the time constraint. CMACs can easily be updated, unlike MLPs. CMACs can even be configured to forget: older information will be over-written by new information as new examples become available. CMAC resembles local models like k-Nearest Neighbors or GRNN, but its hashing function improves its efficiency.
SDM
Many modeling algorithms suffer from limited scalability. With small data sets (examples numbering in the thousands), they perform satisfactorily, but with larger data sets (hundreds of thousands of examples or more), their performance deteriorates so much as to make them unusable.
Developed by Pentii Kanerva while in the United States, SDM (Sparse Distributed Memory) provides the ability to handle very large data sets. The basic SDM accepts only binary values as inputs and produces binary outputs. Typically, numeric values are represented using binned values (as with the CMAC). SDM neurons contain binary address vectors (which are matched to inputs) and have integer output counters. Multiple outputs are possible by using multiple output counters, but the description here will be confined to a single-output system. SDM's operation can be summarized as follows:
Neurons are updated by incrementing its output counter if the example output is 1 or decrementing if the example output is 0.
3. RECALL: Input variables are run through the same process to produce an address vector. All neurons are checked for proximity to this address vector and their output counters are combined to produce the SDM output.
4. SDM is binary by nature and thus maps very naturally to conventional binary computer hardware. Most other neural architectures (including MLP), in contrast, employ numeric values. Such numbers are most often represented as floating-point numbers, which require considerable compute power. A related neural architecture which also maps well to binary hardware is the RAM neuron (and the associated RAM neuron discriminator and multi-discriminator).
The most obvious disadvantage of the SDM is the need to cycle through all the neurons during training or recall. Although the SDM scales up very well to large numbers of neurons, the need for compute cycles scales directly with the number of neurons. The literature records plans for SDMs with as many as a million neurons. This is remarkable on two counts:
In terms of advantages, though, the SDM presents several intriguing behavioral qualities (which to this author's knowledge have not been explored nor even recorded in the literature). First, training of neurons in SDMs is independent: the learning which takes place in any neuron does not affect learning in any other neuron. Importantly, if an SDM with 5000 neurons was constructed and performance was deemed unsatisfactory, 5000 more neurons could be added to the SDM and trained without having to update (or disturb) the original 5000 neurons. Notice, too, the prospect of constructing the SDM in parallel: 5000 neurons might be trained on one computer and another 5000 neurons trained on another computer. The resulting total SDM would be composed by simply appending one set of neurons to the other. Independence of neurons from one another also allows a software implementation to deal with sections of the SDM incrementally, training only as many neurons as will fit in RAM at one time.
Independence of neuron activity also has a significant implication for recall. SDMs may be partially fired to increase speed. Instead of cycling through the entire set of neurons, some fraction of them may be used instead, at the (possible) expense of model accuracy. This is not possible at all with MLPs. Notice, too, that the SDM may begin recall and stopped at any point, allowing for hard real-time cut-off of its activity. Such a process would allow the SDM to provide the best possible answer in whatever time was available, even without knowing the time frame at the beginning of recall. SDM recall, like SDM training, can be parallelized. Different portions of the SDM may be located on different computers and their partial outputs may be combined simply and quickly.
With SDM, learned examples are also independent of one another. If an SDM is trained on 5000 examples and 5000 new examples arrive later, the SDM may be updated with them without concern for the initial 5000 examples. In contrast, an MLP would likely need to be retrained from scratch.
CONCLUSIONS
Non-mainstream neural architectures provide advantages and disadvantages. Their basic advantage is that they offer distinct design trade-offs from those available in more popular neural architectures. Their most fundamental disadvantage is that, being less popular, they are less supported to a lesser degree in readily available commercial software. To the best of this author's knowledge, neither of these architectures is supported in commercial software. This necessitates some level of effort on the modeler's part in the form of writing code, but one redeeming quality of some alternative architectures is that they are conceptually simple and thus easy to code.
REFERENCES
[1] Criteria For Modeling Algorithm Selection: "Modeling Methodology 3:
Algorithm Selection" by W. Dwinnell, published in "PC AI", Mar./Apr., 1998
[2] CMAC: "Brain, Behavior, and Robotics" by J. Albus published by Byte
Books
[3] "Advanced Methods in Neural Computing" by Wasserman, published by VNR,
ISBN 0-442-00451-3
---
For more information, see http://www.gordianknot.com