[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]

ALTERNATIVE NEURAL ARCHITECTURES
by Will Dwinnell, contributing editor, The Gordian Institute


INTRODUCTION

Of the "new" modeling technologies in use today, neural networks are probably the most popular. They are probably also the most diverse. Literally dozens of different neural architectures have been devised by enterprising researchers. [3] Taking into account variations on these architectures and the multiplicity of training algorithms, this number easily climbs into the hundreds. Among this expanse of architectures are many different behavioral properties. Some neural networks are quick to train but slow to recall. Others are quick to recall but slow to train. Still others display different characteristics in between.

Given all of this, it should be obvious that blanket statements regarding neural networks are problematic at best. Most such comments are directed at multi-layer perceptrons (MLP, also commonly referred to as feedforward neural networks, FFNN), the most commonly implemented artificial neural network. One typical assertion is that "neural networks are slow". More precisely, MLP networks tend to be slow to train. Importantly, though, note that MLP networks are relatively quick to recall. Just as importantly, other sorts of neural architectures (such as Generalized Regression Neural Networks, GRNN) provide very fast training

There is no reason to limit one's work to MLPs, despite their popularity. Understanding the available alternatives is important. Neural networks, taken as a broad class of modeling methods, provide a rich set of design trade-offs. Fast training, fast recall, multiple outputs, low storage requirements and easy update are qualities that neural networks can provide. (For a more comprehensive list of important modeling algorithm characteristics, see "Modeling Methodology X: Algorithm Selection", by Dwinnell.) [1] No single modeling algorithm provides all of these qualities and neural networks are no exception. Real-world applications require selecting a subset of desirable qualities and making trade-offs. As a class of algorithms, neural networks are quite competitive in providing different mixtures of such qualities. Here, attention will be restricted to two particular neural architectures: CMAC and SDM. A complete explanation of both algorithms requires too much space, so a brief description of each will be given followed by analysis.

CMAC

CMAC (the abbreviation has multiple published meanings) was invented by James Albus as a model of brain function. [2] It is popular in some robotics circles and several interesting variants have appeared. Simplistically, CMAC's operation can be broken down into a series of steps:

  1. BIN NUMERIC INPUTS: CMAC accepts numeric inputs and converts them to vectors of 0s and 1s (like [0 0 1 0 0]), with each position in the vector representing a single bin. If an input variable ranges from 0 to 100 and the vector is constructed with 10 bins, each element of the vector represents one tenth of the total range. The first element of the vector represents 0 to 10, the second element represents 10 to 20 and so on. For example, if the input value for this variable is 13 (which is between 10 and 20), the second element of the vector would be active (1) and all the rest would be inactive (0): [0 1 0 0 0 0 0 0 0 0].
  2. GENERALIZE THE VECTORS: The bin vectors which represent the inputs are generalized, meaning that some number of elements (bins) adjacent to active elements are set to 1. This has the affect of "spreading" the input activation in the vector. In our example using 13 as an input above, the vector might be generalized to [1 1 1 0 0 0 0 0 0 0]. Taken together, these vectors define a multidimensional table. For any given example, some small region of the table will be active, while the rest will be inactive.
  3. HASH THE VECTORS INTO A HASH TABLE (optional): This step is only necessary if the required storage for the resulting table becomes too large. Hashing can often reduce the size of the CMAC memory without significantly degrading model accuracy. The end result is either a single-dimensional array of 0s and 1s (with hashing) or table of 0s and 1s (without hashing).
  4. BUILD A SIMPLE LINEAR MODEL OFF OF THE HASH TABLE: This can be a simple linear regression or a single linear neuron trained in the usual fashion, either of which are very fast to build. The literature records that gradient descent methods can train the CMAC to convergence in as few as 4 passes, with an practical upper limit being around 10 passes (the author's experience confirms this).
  5. RECALL: Recall involves running input variables through the same process and feeding the active cells in the hash table to the linear model.

CMAC is very fast to train and recall. The linear model at the end is the only part of the system which requires training. CMAC's nonlinearity comes from the way it breaks up the space of possible examples into little cells. This is also a weakness: inputs are binned so that similar examples will sometimes produce identical outputs. The number of these cells also grows exponentially with the number of inputs and geometrically with the resolution of the bins, which for many problems can mean large storage requirements. Hashing can often reduce this as a concern. CMAC trains so fast that it can be used for some real-time applications, depending on the time constraint. CMACs can easily be updated, unlike MLPs. CMACs can even be configured to forget: older information will be over-written by new information as new examples become available. CMAC resembles local models like k-Nearest Neighbors or GRNN, but its hashing function improves its efficiency.

SDM

Many modeling algorithms suffer from limited scalability. With small data sets (examples numbering in the thousands), they perform satisfactorily, but with larger data sets (hundreds of thousands of examples or more), their performance deteriorates so much as to make them unusable.

Developed by Pentii Kanerva while in the United States, SDM (Sparse Distributed Memory) provides the ability to handle very large data sets. The basic SDM accepts only binary values as inputs and produces binary outputs. Typically, numeric values are represented using binned values (as with the CMAC). SDM neurons contain binary address vectors (which are matched to inputs) and have integer output counters. Multiple outputs are possible by using multiple output counters, but the description here will be confined to a single-output system. SDM's operation can be summarized as follows:

  1. GENERATE A (LARGE) SET OF NEURONS WITH RANDOM ADDRESS VECTORS: The neurons start with a random address vector and an output counter of 0.
  2. TRAIN THE NETWORK BY STORING DATA IN THE MEMORY: With the SDM, new examples are written to memory by cycling through all neurons. Any neuron whose address vector is sufficiently similar to the example input is updated.

Neurons are updated by incrementing its output counter if the example output is 1 or decrementing if the example output is 0.

3. RECALL: Input variables are run through the same process to produce an address vector. All neurons are checked for proximity to this address vector and their output counters are combined to produce the SDM output.

4. SDM is binary by nature and thus maps very naturally to conventional binary computer hardware. Most other neural architectures (including MLP), in contrast, employ numeric values. Such numbers are most often represented as floating-point numbers, which require considerable compute power. A related neural architecture which also maps well to binary hardware is the RAM neuron (and the associated RAM neuron discriminator and multi-discriminator).

The most obvious disadvantage of the SDM is the need to cycle through all the neurons during training or recall. Although the SDM scales up very well to large numbers of neurons, the need for compute cycles scales directly with the number of neurons. The literature records plans for SDMs with as many as a million neurons. This is remarkable on two counts:

  1. that it is at all feasible (the SDM learning mechanism can be implemented at this scale, whereas it would be extremely difficult to implement a backpropagation-trained million-neuron MLP, for example)
  2. it will still require accessing all 1 million neurons once for any single training pass or recall.

In terms of advantages, though, the SDM presents several intriguing behavioral qualities (which to this author's knowledge have not been explored nor even recorded in the literature). First, training of neurons in SDMs is independent: the learning which takes place in any neuron does not affect learning in any other neuron. Importantly, if an SDM with 5000 neurons was constructed and performance was deemed unsatisfactory, 5000 more neurons could be added to the SDM and trained without having to update (or disturb) the original 5000 neurons. Notice, too, the prospect of constructing the SDM in parallel: 5000 neurons might be trained on one computer and another 5000 neurons trained on another computer. The resulting total SDM would be composed by simply appending one set of neurons to the other. Independence of neurons from one another also allows a software implementation to deal with sections of the SDM incrementally, training only as many neurons as will fit in RAM at one time.

Independence of neuron activity also has a significant implication for recall. SDMs may be partially fired to increase speed. Instead of cycling through the entire set of neurons, some fraction of them may be used instead, at the (possible) expense of model accuracy. This is not possible at all with MLPs. Notice, too, that the SDM may begin recall and stopped at any point, allowing for hard real-time cut-off of its activity. Such a process would allow the SDM to provide the best possible answer in whatever time was available, even without knowing the time frame at the beginning of recall. SDM recall, like SDM training, can be parallelized. Different portions of the SDM may be located on different computers and their partial outputs may be combined simply and quickly.

With SDM, learned examples are also independent of one another. If an SDM is trained on 5000 examples and 5000 new examples arrive later, the SDM may be updated with them without concern for the initial 5000 examples. In contrast, an MLP would likely need to be retrained from scratch.

CONCLUSIONS

Non-mainstream neural architectures provide advantages and disadvantages. Their basic advantage is that they offer distinct design trade-offs from those available in more popular neural architectures. Their most fundamental disadvantage is that, being less popular, they are less supported to a lesser degree in readily available commercial software. To the best of this author's knowledge, neither of these architectures is supported in commercial software. This necessitates some level of effort on the modeler's part in the form of writing code, but one redeeming quality of some alternative architectures is that they are conceptually simple and thus easy to code.

REFERENCES
[1] Criteria For Modeling Algorithm Selection: "Modeling Methodology 3:

Algorithm Selection" by W. Dwinnell, published in "PC AI", Mar./Apr., 1998

[2] CMAC: "Brain, Behavior, and Robotics" by J. Albus published by Byte

Books

[3] "Advanced Methods in Neural Computing" by Wasserman, published by VNR,

ISBN 0-442-00451-3

---

For more information, see http://www.gordianknot.com


[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]