Neural networks to the rescue
These parallel-processing algorithms help chemists understand the data.
Computers have tremendous usefulness in science and everyday life because of their unique ability to do exactly what they are programmed to do. Scientific research, however, often forges into the unknown, a realm unattainable for conventional computer programs. Various methods have been devised to let computers handle this diversity. One of these methods takes advantage of structures called neural networks (NNs), which mirror the activity of the human brain. NNs are used primarily to help analyze data: they can reveal useful patterns in large, complex data sets and extrapolate trends.

A mathematician with sufficiently large amounts of data and plenty of time might be able to characterize the data by using a set of parallel processors to run complex statistical analysis routines. Some statisticians and mathematicians might consider this activity an interesting research problem in itself. Chemists, however, already have interesting problems of their own to deal with. What they want is a method that quickly and reliably helps them understand the data.

Neural networks
An NN is not programmed; rather, it is trained by showing it a range of example inputs. From these examples, the NN builds up a configuration that allows it to function as an adaptive filter. This configuration is an internal representation, or generalization, of the examples, analogous to a regression fit. The NN matches other input patterns against this generalization, within a range of tolerance. For example, an NN could be trained on a variety of data sets that represent a known sample. With enough training, the NN can recognize and identify data from unknown samples. The level of resolution varies with the threshold and with the resolution of the training examples.

Each node in an NN sees a different component of the data set, and the components may overlap. The individual nodes make decisions based on their components, and a consensus of these decisions results in a final, global decision, recognition, or identification. The weighting is set up during training.

In spectral analysis, for example, the NN is presented with samples of spectral data. The NN nodes record the inputs and sum them. When the sum of a node's inputs exceeds a given threshold value, the node generates an output that is propagated to other nodes. As the various training examples are presented, the paths between nodes are strengthened and weakened according to a training algorithm, which assigns weights to the connections. The weights represent the NN's memory of learned patterns, much as coefficients represent a curve in a regression equation. The process is progressive, similar to the way a person might practice repetitively to learn to play a piece of music.

Equation 1 represents the behavior of a single node in an NN. Inputs X_1 through X_n are multiplied by this node's weights W_1 through W_n and then summed:

X_0 = W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots + W_n X_n    (1)

The node produces the output X_0 when the weighted sum of the inputs exceeds the threshold defined by a nonlinear transfer function; the combination of all such nodes gives the NN the character of a nonlinear regression model.

NN nodes are connected in layers. For simplicity, Figure 1 shows three layers, although more layers and other configurations are possible.
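For concreteness, here is a minimal sketch of a single node in Python. The names are hypothetical, and the sigmoid transfer function is an assumption; the article does not specify which nonlinear transfer is used.

```python
import numpy as np

def node_output(x, w, threshold=0.0):
    """One node per Equation 1: form the weighted sum of the inputs,
    then apply a nonlinear transfer function (sigmoid assumed here)."""
    s = np.dot(w, x)  # W1*X1 + W2*X2 + ... + Wn*Xn
    return 1.0 / (1.0 + np.exp(-(s - threshold)))

# Example: three inputs (e.g., spectral intensities) feeding one node
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -0.3, 0.8])
print(node_output(x, w))  # output grows as the weighted sum passes the threshold
```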
A rough analogy is the process of voting for the U.S. president. The populace is the data set, which casts its votes at the state level, analogously to the nodes in the input layer. The Electoral College representatives for each state consolidate the information from the voters, similarly to the middle layer. And the collective Electoral College makes the final selection, somewhat like the output layer. In practice, there are usually many more nodes in the input layer than in the middle layer, and the number of nodes in the output layer usually corresponds to the number of possible outcomes.

An NN's behaviors are not easily obtained with other computer technologies and algorithms. An NN is not programmed, and it does not have a predefined knowledge base. All that is needed is a set of training examples to learn and adapt from. The NN automatically modifies its internal structure to optimize its performance on the basis of these training examples. Because of their flexibility and adaptive training, NNs are used for a variety of tasks.

Training data set

The NN is trained with the input, and it produces an output at each output node. The differences between the actual and desired outputs (from the training pattern), taken over the entire training set, are fed back through the network and used to modify the weights. This process iterates until a predefined error threshold is reached. It is similar to the way a novice musician practices: playing some notes, hitting a wrong key, hearing the difference between the desired tone and the played tone, and then correcting the mistakes iteratively.
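The feedback loop just described can be sketched in a few lines of Python. This is a simplified, single-node version based on the classic delta rule, with an assumed sigmoid transfer function and learning rate; it stands in for, rather than reproduces, the training algorithms in commercial NN tools.

```python
import numpy as np

def train(patterns, targets, lr=0.5, tol=1e-3, max_iter=10_000):
    """Present the training patterns, compare actual vs. desired
    outputs, and feed the differences back as weight adjustments."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=patterns.shape[1])
    for _ in range(max_iter):
        out = 1.0 / (1.0 + np.exp(-(patterns @ w)))     # actual outputs
        err = targets - out                             # desired - actual
        w += lr * patterns.T @ (err * out * (1 - out))  # modify the weights
        if np.mean(err ** 2) < tol:                     # predefined threshold
            break
    return w

# Learn logical OR from four training examples (first column is a bias input)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])
w = train(X, y)
print(np.round(1.0 / (1.0 + np.exp(-(X @ w))), 2))  # approaches [0, 1, 1, 1]
```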
Training considerations

It is also important that the training examples fill the sample space. If an NN is trained with samples in the concentration range of 0 to 50%, it is unlikely to give accurate estimates for samples whose concentrations are greater than 50%; the network cannot extrapolate if it has not been trained to do so. Likewise, for the network to interpolate well, it must be trained with several samples covering the desired concentration range. Some of the available tools allow the user to train, save the configuration, test the system, and then retrain to improve the results. The researcher will achieve superior NN results by exploring these options. The sketch below illustrates the sample-space problem.
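Here is a small, self-contained illustration of that point, with an invented calibration function and network size. A tiny network is trained only on concentrations of 0 to 50% and then queried both inside and outside that range.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration data: instrument signal vs. concentration,
# with training samples drawn only from the 0-50% range
conc = rng.uniform(0.0, 0.5, 200)
signal = np.tanh(3.0 * conc)  # assumed (invented) instrument response

# One-hidden-layer network trained by gradient descent on squared error
W1 = rng.normal(scale=0.5, size=(8, 1)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=8);      b2 = 0.0
for _ in range(5000):
    h = np.tanh(conc[:, None] @ W1.T + b1)   # hidden layer
    pred = h @ W2 + b2                       # output layer
    err = pred - signal                      # actual - desired
    gh = np.outer(err, W2) * (1.0 - h ** 2)  # error fed back to hidden layer
    W2 -= 0.5 * h.T @ err / len(conc);            b2 -= 0.5 * err.mean()
    W1 -= 0.5 * gh.T @ conc[:, None] / len(conc); b1 -= 0.5 * gh.mean(axis=0)

def predict(c):
    return (np.tanh(np.array([[c]]) @ W1.T + b1) @ W2 + b2).item()

print(predict(0.25), np.tanh(3.0 * 0.25))  # inside the training range: close
print(predict(0.80), np.tanh(3.0 * 0.80))  # outside it: typically unreliable
```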
Prediction

Such failures occur with any complex statistical model: a model optimized for one activity and then used for another will frequently fail. If a hammer is used on a nail, it works just fine. If the same hammer is used on a screw, the results are less than optimal, and some might even say that the hammer is broken. It is not the fault of the hammer; it is simply a matter of using the wrong tool for the task.

In a compelling demonstration of the power of NNs to extrapolate, Neil Gershenfeld and Andreas Weigend performed a data analysis study, supported by the Santa Fe Institute and the North Atlantic Treaty Organization, to find a good method of discovering and extending the patterns in various sets of erratic data. One of the best methods used a neural network to forecast the series and predict new behaviors not described by the data. The NN is a flexible tool, but it must be optimally trained to perform the required analysis. When fixed algorithms or rules do not exist for large, complex data sets, a well-trained NN performs better than other methods.

Instrument drifts

Conclusions

Systems with high data output can be analyzed quickly with NNs, which may improve with use because they can learn from successive data sets. NNs provide a formidable analytical tool for processing large amounts of data from spectroscopic and other chemical analysis techniques. I hope that this brief overview will interest other workers in pursuing the analytical advantages afforded by NN techniques.

Hank Simon has worked with artificial intelligence and knowledge discovery systems for 22 years. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.