May 2002
Vol. 5, No. 5, pp 57–61.
sites and software

Neural networks to the rescue

These parallel-processing algorithms help chemists understand the data.

Computers have tremendous usefulness in science and everyday life because of their unique ability to do exactly what you program them to do. However, scientific research often forges into the unknown, a realm unattainable by conventional, fixed programs. Various methods have been devised to allow computers to handle this kind of novelty. One of these methods takes advantage of structures called neural networks (NNs), which mirror the activity of the human brain.

NNs are used primarily to help analyze data; they can reveal useful patterns in large, complex data sets and extrapolate trends. Given sufficiently large amounts of data and plenty of time, a mathematician might be able to characterize the data by using a set of parallel processors to run complex statistical analysis routines. Some statisticians and mathematicians might consider this activity an interesting research problem in itself. Chemists, however, already have their own interesting problems to deal with. What they want is a method that quickly and reliably helps them understand the data.

Neural networks
NNs are software or hardware implementations of parallel, adaptive matrices that apply a mathematical threshold function to multiple, simultaneous inputs and return an output. Anyone who can read that description in one breath and can understand it may skip to the next section. For everyone else, let’s turn that prolix presentation into presentable English.

Figure 1
Figure 1. A neural network is composed of many nodes organized into multiple layers: an input layer, one or more processing layers, and an output layer. The connections among nodes and between layers grow selectively stronger and weaker, based on training inputs.
NNs are implemented in either software or hardware. The important thing to bear in mind is that an NN is a parallel-processing algorithm. One man can plow a field in nine days, so nine men working in parallel can plow a field in one day. Jokes to the contrary notwithstanding, the analogy holds for the data analysis problems to which NNs are applied.

An NN is not programmed; rather, it is trained by showing it a range of example inputs. From the examples, the NN builds up a configuration that allows it to function as an adaptive filter. This configuration is an internal representation, or generalization, of the examples, based on regression. The NN then matches new input patterns against this generalization, within a range of tolerance. For example, an NN could be trained on a variety of data sets that represent a known sample. With enough training, the NN can recognize and identify data from unknown samples. The level of resolution varies with the threshold and the resolution of the training examples.

Each node in an NN sees a different component of the data set; the components may overlap. The different nodes make decisions based on their components, and a consensus of these decisions yields a final, global decision, recognition, or identification.

The weighting is set up during training. In spectral analysis, for example, the NN is presented with samples of spectral data. The NN nodes record the inputs and sum them. When the sum of the inputs exceeds a given threshold value, each node generates an output that is propagated to other nodes. As the various training examples are presented, the paths between nodes are strengthened and weakened by adjusting their weights according to a training algorithm. The weights represent the NN's memory of learned patterns, much as coefficients represent a curve in a regression equation. The process is progressive, similar to the way a person might practice repetitively to learn to play a piece of music.

Equation (1) represents the mathematics of a single node in an NN. Inputs X1 through Xn are multiplied by the node's weights W1 through Wn and then summed. The output, X0, results when the weighted sum of the inputs exceeds the threshold defined by a nonlinear transfer function. Combining all of the nodes of an NN yields, in effect, a large nonlinear regression equation:

X0 = W1X1 + W2X2 + W3X3 + ... + WnXn    (1)
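
To make Equation (1) concrete, here is a minimal Python sketch of a single node's computation. The step-function threshold and the specific weight values are illustrative assumptions, not values from the article:

```python
import numpy as np

def node_output(inputs, weights, threshold=0.5):
    """One node per Equation (1): a weighted sum of the inputs,
    passed through a threshold function."""
    weighted_sum = np.dot(weights, inputs)           # W1*X1 + W2*X2 + ... + Wn*Xn
    return 1.0 if weighted_sum > threshold else 0.0  # fire only above the threshold

# Example: three inputs and weights learned during training (values invented)
x = np.array([0.2, 0.9, 0.4])
w = np.array([0.5, 0.3, 0.1])
print(node_output(x, w))  # weighted sum is 0.41 < 0.5, so the node stays quiet -> 0.0
```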

NN nodes are connected in layers. For simplification, Figure 1 shows three layers, although more layers and other configurations are possible.

Figure 2
Figure 2. Neural networks are good at characterizing inputs, in this case a letter. The middle layer of the network compares the input layer to learned features and sends the result to the output layer, which identifies the letter.
The input layer is connected to the external world or the raw data. The middle layer consolidates the results from the input layer and propagates them to the output layer. For example, a computer recognizes the letter “R” by comparing the raw pixel input to component patterns in the middle layer, then matching the components to the letters in the output layer, as shown in Figure 2.
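
As a rough sketch of the layered flow that Figure 2 depicts, the following Python fragment pushes a pixel vector through an input-to-middle and a middle-to-output weight matrix. The layer sizes, the random (untrained) weights, and the sigmoid transfer function are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # smooth nonlinear transfer function

# Assumed sizes: 64 raw pixels in, 8 learned "features" in the middle, 26 letters out
W_mid = rng.normal(size=(8, 64))   # input layer -> middle (feature) layer
W_out = rng.normal(size=(26, 8))   # middle layer -> output (letter) layer

pixels = rng.random(64)                    # stand-in for the raw "R" bitmap
features = sigmoid(W_mid @ pixels)         # middle layer: compare input to learned features
letter_scores = sigmoid(W_out @ features)  # output layer: one score per candidate letter
print("Recognized letter index:", letter_scores.argmax())
```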

A rough analogy is the process of voting for the U.S. president. The populace is the data set, which casts its votes at the state level, analogously to the nodes in the input layer. The Electoral College representatives for each state consolidate the information from the voters, similarly to the middle layer. And the collective Electoral College makes the final selection, somewhat like the output layer. In practice, there are usually many more nodes in the input layer than in the middle layer, and the number of nodes in the output layer usually corresponds to the number of possible outcomes.

An NN’s behaviors are not easily obtained with other computer technologies and algorithms. An NN is not programmed, and it does not have a predefined knowledge base. All that is needed is a set of training examples to learn and adapt from. The NN automatically modifies its internal structure to optimize its performance on the basis of these training examples. Because of their flexibility and adaptive training, NNs are used for a variety of tasks.

Training data set
An NN is trained by using training patterns, each consisting of a pair of data: an input and its corresponding output. The training set, used to build the NN model, is a collection of characteristic training patterns. As a rule of thumb, the NN is trained for recognition by using a large set of training data that defines the range of interest. It is trained for extrapolation by using a training set that is about 75% as large as the recognition training set.
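
As a minimal sketch, such a training set can be held as a list of (input, desired output) pairs; the spectra and concentrations below are invented placeholders, not data from the article:

```python
# Each training pattern pairs an input with its desired output,
# e.g., a measured spectrum -> a known concentration (values invented).
training_set = [
    ([0.12, 0.80, 0.33, 0.05], [0.10]),
    ([0.25, 0.61, 0.47, 0.09], [0.25]),
    ([0.40, 0.42, 0.58, 0.15], [0.40]),
]
```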

The NN is trained with the input, and it produces an output at each output node. The differences between the actual and desired output (from the training pattern), taken over the entire training set, are fed back through the network and used to modify the weights. This process iterates until a predefined threshold is reached. It is similar to the way a novice player practices, playing some notes, hitting the wrong key, hearing the difference between the desired tone and the played tone, and then correcting the mistake, or mistakes, iteratively.
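
The feedback loop described above can be sketched as a simple delta-rule update for a single linear node. The learning rate, stopping threshold, and data are assumptions, and a real multilayer NN would propagate the error back through every layer rather than just one:

```python
import numpy as np

def train(patterns, lr=0.1, tol=1e-4, max_epochs=10_000):
    """Iteratively adjust weights until the total error drops below tol."""
    w = np.zeros(len(patterns[0][0]))
    for _ in range(max_epochs):
        total_error = 0.0
        for x, desired in patterns:
            x = np.asarray(x, dtype=float)
            actual = w @ x             # node output (linear, for simplicity)
            error = desired - actual   # difference between desired and actual output
            w += lr * error * x        # feed the error back: strengthen/weaken weights
            total_error += error ** 2
        if total_error < tol:          # predefined stopping threshold
            break
    return w

# Learn y = 2*x1 + 1*x2 from a few (input, output) pairs (invented data)
pairs = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
print(train(pairs))  # approaches [2.0, 1.0]
```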

Training considerations
To train an NN in pattern recognition, a data set is prepared with various examples that cover the range of interest. Three factors improve recognition:

  • The examples should be spaced at roughly equal intervals.
  • The examples should be presented to the NN in random order.
  • Unless the unknowns are perfect samples (such as those found only in textbooks!), the examples should include an equal mix of good samples and clearly identified, known, real-world samples. This type of training-set preparation enhances the NN’s ability to compare incoming data with data from similar, previously analyzed systems.

It is also important that the examples fill the sample space. If an NN is trained with samples in the concentration range of 0 to 50%, it is unlikely to give accurate estimates for samples whose concentrations are greater than 50%; that is, the network cannot extrapolate if it has not been trained to do so. And for the network to provide good interpolation, it needs to be trained with several samples covering the desired concentration range.
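
A quick way to see this is to train a small network on one concentration range and then query it inside and outside that range. This sketch uses scikit-learn's MLPRegressor as an assumed stand-in for the NN tools discussed, with an invented linear calibration; exact predictions will vary, but the out-of-range estimate is the unreliable one:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # assumed stand-in for the tools discussed

rng = np.random.default_rng(1)

# Invented calibration: signal = 0.02 * concentration, samples only in the 0-50% range
conc = rng.uniform(0.0, 50.0, size=200)
signal = 0.02 * conc

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(signal.reshape(-1, 1), conc)   # learn: measured signal -> concentration

# A signal of 0.5 (~25%) falls inside the trained range; 1.6 (~80%) does not
print(net.predict([[0.5], [1.6]]))     # interpolation is reasonable; extrapolation is not
```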

Some of the available tools allow the user to train, save the configuration, test the system, and then retrain to improve results. The researcher will achieve superior NN results by exploring these options.

Prediction
Prediction and trend extrapolation require a slightly different NN configuration than pattern recognition approaches. Although it is possible to train an NN to perform both types of analysis, results are more reliable if training focuses on one or the other.

This situation occurs with any complex statistical model. If a model is optimized for one activity and then used for another activity, it will frequently fail. If a hammer is used on a nail, then it works just fine. But if the same hammer is used on a screw, the results are less than optimal. Some might even say that the hammer is broken. It is not the fault of the hammer; it is simply an issue of using the wrong configuration of tools.

In a compelling demonstration of the power of NNs to extrapolate, Neil Gershenfeld and Andreas Weigend performed a data analysis study, supported by the Santa Fe Institute and the North Atlantic Treaty Organization, to find good methods of discovering and extending the patterns in various sets of erratic data. One of the best methods used a neural network to forecast the series and predict new behaviors not described by the data. The NN is a flexible tool, but it must be optimally trained to perform the required analysis. When fixed algorithms or rules do not exist for large, complex data sets, a well-trained NN performs better than other methods.

Instrument drifts
NNs can be used to correct for instrument drift. Drift results in a lack of long-term reproducibility when analyses are carried out months after the original NN or other multivariate calibration model was trained. If the researcher briefly retrains the NN with recent calibration data, it will adjust its weights to account for the drift. This is an advantage of NNs: they do not have to be rebuilt from the beginning. If a previously trained NN is simply trained with more data, it will adjust to the new system. Successive training can improve the recognition and prediction capabilities of the NN.
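
Continuing the earlier sketch, many NN libraries allow a trained model to be topped up with fresh calibration data instead of rebuilt. With scikit-learn's MLPRegressor, for example, warm_start=True makes each call to fit continue from the existing weights; the drifted calibration data here are invented:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# warm_start=True reuses the learned weights between calls to fit
net = MLPRegressor(hidden_layer_sizes=(16,), warm_start=True,
                   max_iter=2000, random_state=0)

X_old = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
net.fit(X_old, 10.0 * X_old.ravel())    # original calibration model

# Months later: a brief retraining pass on a few recent calibration standards
X_new = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
drifted = 10.0 * X_new.ravel() + 0.5    # instrument now reads slightly high
net.fit(X_new, drifted)                 # weights adjust in place; no rebuild from scratch
```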

Conclusions
Chemists and biologists, who deal with large amounts of data, frequently rely on experience and intuition as a type of pattern recognition. Conventional statistical approaches may work for qualitative analysis, but quantitative results are limited. NNs provide a method of quantitative analysis and an additional tool for extending intuition and experience. They are clearly superior to linear techniques, and they have the advantage of simplicity over other multivariate mathematical methods. NNs can carry out nonlinear mappings from the input to the output nodes, they can be used to correct for instrument drift, and they are robust in the presence of noisy data.

Systems with high data output can be quickly analyzed using NNs, which may improve with use because they can learn from successive data sets. NNs provide a formidable analytical tool for processing large amounts of data from spectroscopic and other chemical analysis techniques. I hope that this brief overview will interest other workers in pursuing the analytical advantages afforded by NN techniques.

For more information
General Applications of Neural Networks in Chemistry and Chemical Engineering; www.emsl.pnl.gov.


Hank Simon has worked with artificial intelligence and knowledge discovery systems for 22 years. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.
