February 2001
Vol. 4, No. 2, pp. 57–58, 60
sites and software
Mean what you say
Intelligent speech recognition systems facilitate data exploitation.

It has been said that in a few years, scientists will need to be as comfortable using computers as they now are with pipets. Too true. The biotechnology and pharmaceutical industries are only just beginning to experience the new ways in which science will be conducted in the postgenomics era. It is already clear that one of the major challenges will be making good decisions based on rapidly expanding volumes of data. And the new era heralds a deeper change: The sheer volume of data and information means that people will have to rely on computers to make important decisions without human intervention. There will not be a choice—either data will be analyzed intelligently by computer, or it will be left unexamined and unexploited.

Unfortunately, there are significant barriers to realizing this vision. Even now, most software yields results that are far from optimal. What is the best way to deal with these issues? The most important realization is that the problems lie not with people, but rather with computer software. Not only is current software not sufficiently powerful to meet future needs, but for most people it is simply too hard to use.

If we are to succeed in this new era of high-throughput discovery, we will need to provide a more effective means for people to interact with computers, and the computer systems will need to be more “intelligent”. Although it is rather difficult to define precisely what is meant by intelligence, it is relatively easy to define criteria by which the success of intelligent systems can be judged. Useful criteria include

  • letting people do more work in less time,
  • shielding people from overwhelming volumes of data, and
  • showing information that is useful in decision-making processes.

Recently, the Bioinformatics and Advanced Information Systems team at Cambridge Antibody Technology Group plc (CAT; Melbourn, U.K.) has pioneered both the development and implementation of these “next-generation” computer systems. The aim is to put powerful, user-friendly systems in the hands of scientists. One aspect of these new systems is the entry of data into computers using the spoken voice.

Vox populi
For the human voice to be a genuinely useful mechanism of data entry, two interrelated but different technologies are necessary: speech recognition and natural language understanding. Speech recognition software converts the spoken voice into a representation that is more useful inside a computer. For example, if speech is converted into text, then documents can be created easily. If speech recognition alone is used, however, no meaning is attached to what has been said, which places significant limitations on the ways data can be automatically exploited. Natural language understanding software takes over where speech recognition leaves off and attempts to attach meaning to the spoken or written word.
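To make the distinction concrete, consider the following toy illustration in Python. The transcript is the kind of flat output that speech recognition alone produces; the record shows the sort of structured meaning a natural language understanding step might attach. The field names and format are invented, not any product's output.

    # Purely illustrative: recognition yields words; understanding attaches
    # meaning. The field names below are invented for this example.
    transcript = "moderate staining in tumor cells"   # recognition: words only

    interpretation = {                                # understanding: meaning
        "concept": "staining_intensity",
        "value": "moderate",
        "location": "tumor cells",
    }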

Speech recognition technologies have made great strides in the past five years, with the shrink-wrapped applications ViaVoice (1) and Dragon NaturallySpeaking (2) dominating the market. Most speech recognition systems follow a similar mode of action comprising the following operations (a toy sketch of these stages follows the list):

  • digital speech sampling from microphone input;
  • acoustic signal processing;
  • phoneme, phoneme sequence, and word recognition; and
  • word-level analysis and recognition grammar checking.
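The sketch below walks through all four stages on fake data. Every threshold, the acoustic model, and the one-word lexicon are invented for illustration; this is not how ViaVoice or Dragon NaturallySpeaking work internally.

    # A toy, purely illustrative walk through the four stages listed above.
    # The acoustic model and lexicon are invented.
    LEXICON = {("s", "eh", "t"): "set"}  # hypothetical recognition grammar

    def sample_speech(signal, frame_size=3):
        """Stage 1: split digitized microphone input into short frames."""
        return [signal[i:i + frame_size]
                for i in range(0, len(signal), frame_size)]

    def acoustic_features(frame):
        """Stage 2: reduce each frame to a feature (here, mean energy)."""
        return sum(x * x for x in frame) / len(frame)

    def to_phoneme(feature):
        """Stage 3: map a feature value to a phoneme via a toy model."""
        if feature > 0.5:
            return "s"
        return "eh" if feature > 0.1 else "t"

    def to_word(phonemes):
        """Stage 4: word-level analysis against the recognition grammar."""
        return LEXICON.get(tuple(phonemes), "<unrecognized>")

    signal = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.0, 0.1, 0.0]  # fake samples
    phonemes = [to_phoneme(acoustic_features(f))
                for f in sample_speech(signal)]
    print(to_word(phonemes))  # -> set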

Recognition rates of 98–99% are achievable using high-quality microphones. Creating medical and legal documents has been a popular recent application of speech recognition technology.

In scientific discovery research applications, speech recognition is particularly useful when users do not have hands available to operate a keyboard or eyes free to look at a monitor. For example, CAT uses speech recognition in cellular pathology so that scientists can concentrate on the interpretation of arrays of tissue sections mounted on microscope slides and “say what they see”. Additionally, freeing scientists from the restrictive and time-consuming requirement of filling out detailed forms, and letting them instead create unstructured natural language documents, substantially increases productivity and the potential value of the data. It does, however, create problems in searching for information in unstructured text. Enter natural language understanding systems.

Naturally speaking
Natural language understanding is a problem that is far from being solved. Over the past two decades, much effort has been put into an approach called semantic/lexical analysis or parsing. In this approach, rules of grammar are applied to text to understand textual information explicitly. However, it often fails catastrophically because the algorithms tend to be based on decision trees. One incorrect true/false decision made early on can lead to failure of the entire analysis. These shortcomings are sufficiently severe that this approach is rarely used in real-world production applications.

Simpler approaches based on keywords are more popular. Unfortunately, keyword-based methods tend not to be helpful in cases in which a large amount of information needs to be searched. Searches are highly sensitive to small changes in the formulation of a query, the output is not always ranked in order of relevancy, results may be misleading because important results are missed, and large numbers of false positive results may be returned.
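A small example makes these failure modes concrete. The pathology notes and the naive matching rule below are invented for illustration; they do not describe any production search engine.

    # Illustrative only: a naive keyword search over invented pathology
    # notes, showing the failure modes described above.
    NOTES = [
        "Strong nuclear staining observed in tumor cells.",
        "Moderate cytoplasmic staining in hepatocytes.",
        "No staining detected; moderate background artifact.",
    ]

    def keyword_search(query, documents):
        """Return documents containing every query word (no ranking)."""
        words = query.lower().split()
        return [d for d in documents if all(w in d.lower() for w in words)]

    # Small changes in query formulation change the result set entirely...
    print(keyword_search("moderate staining", NOTES))      # notes 2 and 3
    print(keyword_search("intermediate staining", NOTES))  # nothing at all
    # ...and the hits mislead: note 3 is a false positive ("moderate"
    # describes an artifact), while the strongly stained note 1 is missed
    # by any "at least moderate intensity" question.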

More recently, artificial intelligence (AI) approaches, based on statistical word patterns that assign concepts to documents, have started to become popular. The CAT team has developed its own AI approach, inspired by the field of cognitive neuroscience. The key benefit of the approach is that natural language can be coupled with a detailed representation of knowledge that can be stored and rapidly searched by users. The CAT approach is based on a model of attentional processing. Input streams of data in the form of naturally spoken language simultaneously provide stimuli to a variety of so-called language analyzers.

Language analyzers fall into two categories, domain-neutral and domain-specific. Thus, the analysis of data may be independent of predefined knowledge about the specific content of a document, or it may make use of a detailed understanding of a particular field such as cellular pathology. According to the nature of the stimuli (e.g., words, patterns of words), the attention of given language analyzers is raised, which in turn determines the overall response of the system. The response of the system is a description of the knowledge contained in the input data. It is important to understand that the idea of searching is central to dealing with large volumes of information. Thus, the kinds of searches that can be performed are determined by the information extracted from the input data, allowing complex queries to be made and detailed visualizations of information to be displayed.
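The sketch below is one possible reading of this description, written purely for illustration; the analyzers, their trigger patterns, and the attention scoring are invented and do not reproduce CAT's implementation.

    # Invented sketch of attention-raising language analyzers; not CAT's code.
    import re

    class LanguageAnalyzer:
        def __init__(self, concept, patterns):
            self.concept = concept
            self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]

        def attention(self, text):
            """Attention rises with the number of matching stimuli."""
            return sum(len(p.findall(text)) for p in self.patterns)

    # Domain-neutral analyzer: reacts to numeric stimuli in any field.
    quantities = LanguageAnalyzer("quantity", [r"\b\d+(?:\.\d+)?\b"])
    # Domain-specific analyzer: reacts to cellular-pathology vocabulary.
    staining = LanguageAnalyzer("staining_intensity",
                                [r"\b(?:weak|moderate|strong)\s+staining\b"])

    def system_response(text, analyzers):
        """The overall response: concepts whose analyzers paid attention."""
        return {a.concept: score
                for a in analyzers if (score := a.attention(text)) > 0}

    utterance = "Moderate staining seen in 80 percent of tumor cells."
    print(system_response(utterance, [quantities, staining]))
    # -> {'quantity': 1, 'staining_intensity': 1}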

A schematic illustration of the integrated CAT system linking speech recognition, natural language understanding, and knowledge representation is shown in Figure 1. The system has been deployed in production to enable interpretation of the results from automated high-throughput immunohistochemistry experiments. Using this approach, CAT cellular pathologists can fully interpret patterns of protein expression at the cellular level for approximately 250,000 tissue sections per year.

Figure 1. Speech recognition allows rapid data input, while natural language understanding converts the raw data into useful information. Immunohistochemistry data can be queried by end users (3). The Complex Query Builder in the window illustrates how the kinds of queries are determined by the knowledge representation. In this case, the concept of staining intensity was identified in the input data, so the user is permitted to search for profiles in which staining was of at least moderate intensity. Note that the behavior is different from, and more valuable than, a keyword search: the knowledge representation-based search correctly returns profiles with strong intensity as well, whereas a keyword search of the text simply using the keyword “moderate” would ignore profiles in which strong staining was observed.
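In code, the difference might look like the following sketch; the profile records and the ordinal intensity scale are invented for illustration.

    # Sketch of the "at least moderate intensity" query behavior described
    # in Figure 1; records and scale are hypothetical.
    INTENSITY = {"weak": 1, "moderate": 2, "strong": 3}  # invented ordinal scale

    profiles = [
        {"id": "P1", "text": "weak staining in stroma", "intensity": "weak"},
        {"id": "P2", "text": "moderate staining in tumor cells", "intensity": "moderate"},
        {"id": "P3", "text": "strong nuclear staining", "intensity": "strong"},
    ]

    # Knowledge-representation search: compare extracted concepts, not words.
    at_least_moderate = [p["id"] for p in profiles
                         if INTENSITY[p["intensity"]] >= INTENSITY["moderate"]]
    print(at_least_moderate)  # ['P2', 'P3'] -- strong staining is returned too

    # Keyword search on the raw text misses the strongly stained profile.
    keyword_hits = [p["id"] for p in profiles if "moderate" in p["text"]]
    print(keyword_hits)       # ['P2'] only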

Speech recognition and natural language understanding systems are set to play increasing roles in the molecular sciences. Perhaps 70% of the data used by the biotech and pharmaceutical industries to make key decisions is unstructured (e.g., internal reports and the scientific literature). Developing natural language understanding systems, in particular, is therefore expected to be especially important.

References

  1. ViaVoice, IBM. www.ibm.com/software/speech.
  2. Dragon NaturallySpeaking, Lernout & Hauspie. www.dragonsys.com.
  3. Brocklehurst, S. M.; Hardman, C. H.; Johnston, S. J. Pharmainformatics: A Trends Guide. Mol. Med. Today, Trends Supplement 1999, 5, S12–S15.

(Web sites accessed February 2001.)


Simon M. Brocklehurst is the head of Bioinformatics and Advanced Information Systems at Cambridge Antibody Technology Group plc. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.

