Chemical & Engineering News,
March 27, 1995

Copyright © 1995 by the American Chemical Society.

Chemical Research Faces Opportunities, Challenges From Information Tools

James H. Krieger,

C&EN Washington

If the American Chemical Society were being founded today, its logo more than likely would not partner the phoenix with a Liebig tube as an apparatus emblematic of chemistry. What would be used to symbolically convey the image is anyone's guess, but a computer would not be out of the question.

The application of computers is fast becoming central to the practice of chemistry. As chemistry has become "wired" over the years, its methods have changed, sometimes slowly, sometimes rapidly. Today, those changes have accumulated to the point where they are on the verge not only of altering how chemistry is done but of transforming the conduct of research and the way chemists and other researchers interact with each other and with the world outside their labs.

Influenced by the current state of affairs in information generation, acquisition, movement, and consumption, chemistry is indeed entering an era of revolutionary change, an information revolution. Although the revolution is broadly encompassing, its focus is the researcher's desktop.

At one level of activity, the chemist's apparatus is increasingly dominated by electronic tools and by the application and dissemination of the electronic information those tools generate. At another level, the recent explosive growth in use of the Internet, and the even greater promise of its potential, are adding to the growing cocoon of electronic intelligence surrounding the science.

"Now you've got tools that are quite phenomenal," says Robert J. Massie, who as director of ACS's Chemical Abstracts Service (CAS) Division, Columbus, Ohio, is at the supply hub of chemical information activities. Many of the tools becoming available, he observes, are increasingly making it possible for individuals to do research, perform tasks, and run computations at speeds that in the past would not have been possible.

"It's really an exciting time to be in this business," says Steven Goldby, president and chief executive officer of MDL Information Systems Inc., San Leandro, Calif., and another participant and observer of the information technology scene.

For practitioners of chemical research and development, the information revolution represents a turning point. It translates into specifics, such as new approaches to the chemistry done at the desktop. An era is approaching when computational chemistry and molecular modeling can become common research tools available to almost every chemist.

At the same time, that desktop offers new, simpler, and quicker ways of accessing and managing information, both in-house through expanding company databases and from external commercial resources. With the emergence of on-line electronic conferences, not to mention electronic publishing and electronic bulletin boards, chemists have new ways to share their results.

Moreover, for many research chemists, the changes being wrought by the information revolution are embodied not only in new electronic practices, such as keeping laboratory notebooks electronically, but also in new ways of working. More and more, research is being carried out by interdisciplinary teams of research scientists, often requiring the ability to work on-line simultaneously. They may be at one geographic location within an organization or at the organization's sites around the world. Indeed, they may be at many different organizations. But they are all now, or soon will be, at their desktops.

The single common thread running through today's information activity, says Massie, is the increasing personalization of information. Individuals will have far more power to conduct research and to investigate prior knowledge, he adds. "All of this suggests a potentially profound increase in personal productivity, as individuals leverage their own skills and efforts."

Massie: Increase in personal productivity

However, whether for lack of interest or for lack of money, not every sphere of chemistry appears to be rushing into the electronic future at the same rate or with the same enthusiasm. Indeed, disorderly might be an apt description of the migration.

For example, as one onlooker, Stephen R. Heller, a research scientist in the Agricultural Research Service's plant genome program, observes of chemistry on the Internet, "The Internet is like bingo. You never know what's next and if it's going to be useful."

Heller is consulting editor for the ACS Professional Reference Books focused on computer applications in chemistry. From his overall perspective, he sees chemistry lagging behind its electronic potential. Some areas, he says, are quite clearly light-years ahead of other areas.

"There's just a lot of unevenness to it," Heller adds. "The pharmaceutical industry is gung ho. And they're doing all sorts of two-dimensional and three-dimensional modeling. They're doing all kinds of property estimations." But then, with organic chemists, with synthetic chemistry, usage is at a very low level.

In short, says Heller, "There are areas of chemistry where [computer usage is] pushing forward, and there are areas that seem to have no known use of computers."

But Heller suspects that chemistry might follow in biology's wake. "The biology people," he notes, "because of the Human Genome Project and the volume of information that's processed through it, are in fact being pushed much more into computer literacy than any other science." To some degree, that's what's going to happen in chemistry, he conjectures. "You're going to come up with something. And the only way to handle it is with a computer system. Because there's just going to be too much information, too much data."

Getting to the literature

Some chemists not yet engaged by the new information technology may well regard its arrival with indifference. But many others could be forgiven a degree of giddiness as they prepare to ride the crest of the information technology wave rolling toward their desktops.

Typical among the items arriving there to empower chemists with broad new capabilities are two new products that are bidding to revolutionize the way those chemists and other scientists access and use the literature. Moreover, those products are revolutionary in another way: They are overturning the traditional approach to pricing and paying for information.

SciFinder and KR ScienceBase are these products. SciFinder - whose development was announced last fall by CAS (C&EN, Oct. 31, 1994, page 19) - will be shipping by the end of this month. KR ScienceBase - a joint project of Helix Systems and Knight-Ridder Information Inc., both in Mountain View, Calif. - will make its debut next week at ACS's national meeting in Anaheim, Calif., with release planned for the third quarter of this year.

Both products are designed to give researchers immediate access to information at the desktop. They put information searching at scientists' fingertips, simply and without any need for training.

"And that's a pretty fundamental change in the way information is delivered in this arena," says Katherine A. Kinsman, vice president of Helix Systems. "Most folks, except for some fairly sophisticated end-user scientists, depend on information professionals or agencies to do that for them. And the time lag typically prevents them from even asking questions in the beginning or certainly slows down their research, even for simple questions."

Said Massie in announcing SciFinder: "We believe SciFinder will change the way chemists conduct research. It's as simple as that."

CAS developed and maintains the CAS Registry system that identifies and records chemical substances reported in the literature. It is a major partner in the operation of STN International, the international on-line network of more than 180 scientific and technical databases. It publishes Chemical Abstracts (CA) and other print products. And it provides eight on-line databases, among them CA, a bibliographic file with access to the abstracting and indexing information that has been published in the printed CA since 1967.

SciFinder is a point-and-click conduit to CAS databases. It enables a user to retrieve information from the CAS Registry of Chemical Substances, which now holds more than 13 million substances and is growing rapidly, and from CA databases, with references to more than 11 million articles and patents.

The new CAS product guides users through three types of functions: "Explore," "Browse," and "Keep Me Posted." Explore lets a user examine the databases by author, research topic, or chemical substance. Browse provides the tables of contents of more than 1,300 scientific periodicals. And Keep Me Posted lets users define the keywords, authors, or CAS Registry Numbers that matter to them and then alerts them to newly arrived items that match.
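The idea behind an alert profile like Keep Me Posted can be illustrated with a short sketch. This is not CAS's implementation, which the article does not describe; the record fields, class names, and sample records are hypothetical, and matching is assumed to be a simple "any keyword, author, or Registry Number hits" rule. The Registry Number 58-08-2 (caffeine) is used only as an example value.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Record:
    """A newly arrived bibliographic record (hypothetical fields)."""
    title: str
    authors: List[str] = field(default_factory=list)
    registry_numbers: List[str] = field(default_factory=list)

@dataclass
class AlertProfile:
    """What the user asked to be kept posted about."""
    keywords: Set[str] = field(default_factory=set)
    authors: Set[str] = field(default_factory=set)
    registry_numbers: Set[str] = field(default_factory=set)

    def matches(self, record: Record) -> bool:
        # A record triggers an alert if ANY profile criterion is met.
        title = record.title.lower()
        return (
            any(k.lower() in title for k in self.keywords)
            or any(a in self.authors for a in record.authors)
            or any(r in self.registry_numbers for r in record.registry_numbers)
        )

def new_alerts(profile: AlertProfile, arrivals: List[Record]) -> List[Record]:
    """Filter recent arrivals down to those matching the profile."""
    return [r for r in arrivals if profile.matches(r)]

profile = AlertProfile(keywords={"metabolism"}, registry_numbers={"58-08-2"})
arrivals = [
    Record("Caffeine clearance in pregnancy", ["Doe"], ["58-08-2"]),
    Record("Drug metabolism in the liver", ["Roe"]),
    Record("Polymer rheology", ["Poe"]),
]
for rec in new_alerts(profile, arrivals):
    print(rec.title)
```

An "any criterion" rule keeps the profile easy to define; a real service would likely add date windows and ranking.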

Moreover, SciFinder is designed so that it is immediately usable and doesn't require special training. For example, if a searcher doesn't know the exact spelling of a name intended for searching, or gets it wrong, SciFinder suggests several alternative spellings and provides a list of possible authors. It also has a document analyzer feature, which automatically evaluates a document for information specified by the user.
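The spelling-suggestion behavior can be approximated with Python's standard `difflib` module, which ranks candidates by string similarity. This is a swapped-in technique chosen for illustration; the article does not say how SciFinder generates its suggestions, and the author index below is hypothetical.

```python
import difflib

# Hypothetical author index; SciFinder's real index and matching method
# are not described in the article.
AUTHOR_INDEX = ["Krieger", "Kreiger", "Krueger", "Massie", "Heller", "Gund"]

def suggest_spellings(name, index=AUTHOR_INDEX, n=3):
    """Return up to n index entries that closely resemble the typed name,
    best matches first (similarity ratio of at least 0.6)."""
    return difflib.get_close_matches(name, index, n=n, cutoff=0.6)

# A misspelled query still surfaces the likely intended authors.
print(suggest_spellings("Kriger"))
```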

In a typical application, a searcher might begin with a concept - the effects of caffeine on the metabolism of pregnant women, for example. After an initial exploration through the database, SciFinder will return bibliographic information and abstracts.

"What we've built into SciFinder," says Brian P. Cannan, manager of client-server product marketing at CAS, "is a tie to the full images of the articles." Microscope and open book icons appear. "If searchers click on the microscope," Cannan explains, "they can get substance information or detailed indexing information about that particular citation. But if they click on the book, then the image of the full document will appear."

Initially, SciFinder is making the ACS journals available in this way, in full-image format, so charts, tables, and graphs are included. Laser-print-quality printouts can be made.

With SciFinder, a user can explore background information by research topic simply by keying in a phrase describing the topic; if exploring by author, SciFinder can list last names with different spellings or with different first names.

With KR ScienceBase (its name during development was KR LabSearch, as seen above), a user can search the drug toxicity and adverse effects literature simply by keying in the drug name; KR ScienceBase provides a list of hits that can be scrolled through for previewing.

KR ScienceBase brings together the on-line information resources of Knight-Ridder Information and the desktop Windows environment provided by Helix Systems. Founded in 1991, Helix Systems has developed ResearchStation, its flagship product that equips scientists with an electronic laboratory notebook, providing them a work space for managing their information while collaborating with colleagues on their research projects. Since Jan. 1, Knight-Ridder Information has been the corporate name of Dialog Information Services, although Dialog is being retained as the name of the company's on-line service. The Dialog service contains more than 450 databases across many business, news, scientific, and technical areas.

Running as a module within ResearchStation, KR ScienceBase will give users desktop access to published literature from a number of scientific databases - about 20, at present - included in the Dialog collection. Among them are the Agrochemicals Handbook, Analytical Abstracts, Biosis Previews, CA Search, Chemical Engineering & Biotechnology Abstracts, Chemical Safety NewsBase, Chemsearch, Claims/U.S. Patent Abstracts, Medline, the Merck Index Online, TSCA Chemical Substances Inventory, Toxline, and U.S. Patents Fulltext.

Like SciFinder, KR ScienceBase takes a point-and-click approach. It provides users with what are called query templates, screens that present a set of questions to be answered to help define the search. For example, there is a query template for adverse effects of drugs. A user would specify a drug - Tylenol, say - and click on any other parameters of interest. The user might want to know only about children or might want to search literature only for the past year.

Built-in search intelligence then goes to work in the background, explains Lisa Riland, marketing communications manager for Knight-Ridder Information. Search categories have been defined that essentially include the search strategy and specify the databases most appropriate for the search. "It's hard to figure out which database you need to go into if you're not familiar with them," Riland explains. "This way, we've picked the databases that are most appropriate for your question, so you don't have to mess with that."
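A query template of the kind Riland describes can be sketched as a small form object that turns the user's answers into a search string and picks the databases itself. Everything here is hypothetical: the class, the query syntax, and especially the category-to-database table, which merely reuses three database names from the Dialog list above rather than the product's actual selections.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical category-to-database table; the article does not list which
# Dialog databases KR ScienceBase actually selects for each category.
CATEGORY_DATABASES = {
    "adverse effects": ["Medline", "Toxline", "Biosis Previews"],
}

@dataclass
class QueryTemplate:
    """A fill-in-the-blanks search form (hypothetical structure)."""
    category: str
    drug: str
    population: Optional[str] = None   # e.g. "children"
    years_back: Optional[int] = None   # e.g. 1 for the past year only

    def build(self) -> Tuple[str, List[str]]:
        # Combine the answered questions into one Boolean query string.
        terms = [self.drug, self.category]
        if self.population:
            terms.append(self.population)
        query = " AND ".join(terms)
        if self.years_back is not None:
            query += f" (last {self.years_back} year(s))"
        # The user never chooses databases; the template does it for them.
        return query, list(CATEGORY_DATABASES[self.category])

query, databases = QueryTemplate("adverse effects", "Tylenol",
                                 population="children", years_back=1).build()
print(query)
print(databases)
```

Keeping the database choice inside the template is the point Riland makes: the searcher answers questions about the problem, not about which files to open.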

For distribution, KR ScienceBase will most likely come separately packaged with its own documentation in every ResearchStation box. Users would then be able to install the software and decide whether they want to subscribe to the KR ScienceBase product.

When it comes to pricing, both SciFinder and KR ScienceBase are charting new territory. Both are offering subscription approaches.

"We're trying to establish a new paradigm," Cannan notes in regard to SciFinder. "Typically, search pricing has been on a connect-hour basis. There's a little clock going off in the background, or a meter. So, you don't often ask the question that you really want. You tend to go in and come out in short bursts," he explains.

Consequently, SciFinder pricing is designed to remove the connect-hour and search-term meters. Customers will be offered two options - task pricing or subscription pricing. "We recommend the subscription plan in most cases," Cannan says, "as it truly removes the meters and enables this tool to be blended into the normal work environment."

SciFinder is an individual scientist product, Cannan explains, and users are not allowed to share log-in identifications. The subscription price for an installation of 20 scientists will be about $60,000 annually and provides access to the database and software. As the number of scientists within an organization using SciFinder increases, the price per scientist will get lower. Journal images from publishers will be priced separately. At present, only ACS journal images are available, at $15 per document.

Task prices for SciFinder will range from $10 to $40 per task, depending on the job requested. For example, structure search tasks cost more than browsing journals' tables of contents or looking for papers published by a specific author.
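Using the round numbers quoted above, the two SciFinder pricing options can be compared roughly as follows. This is an illustration only: it assumes one flat price per task, ignores the lower per-scientist rates for larger installations and the separately priced journal images, and the function names are hypothetical.

```python
def cost_per_scientist(annual_fee: float, scientists: int) -> float:
    """Average annual subscription cost per scientist."""
    return annual_fee / scientists

def break_even_tasks(annual_fee: float, scientists: int, price_per_task: float) -> float:
    """Tasks per scientist per year at which task pricing would cost the same
    as the subscription (assumes a single flat price per task)."""
    return cost_per_scientist(annual_fee, scientists) / price_per_task

fee, seats = 60_000, 20
print(cost_per_scientist(fee, seats))                # 3000.0
for price in (10, 40):
    print(price, break_even_tasks(fee, seats, price))  # 300.0, then 75.0
```

At about $3,000 per scientist per year, the subscription would break even somewhere between roughly 75 tasks (at $40 each) and 300 tasks (at $10 each) per scientist per year.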

Similar thinking is going into the pricing of KR ScienceBase. The specifics are still under consideration, Riland says, but pricing will involve a subscription fee plus a cost for output. This approach also departs from Dialog's current structure, in which charges are determined by connect time. With KR ScienceBase, "you can browse, you can search as much as you want," Riland says, "and then when you find what you need, that's what you pay for."

Kinsman adds that an important aspect of providing users with more desktop tools is to give both individuals and the organizations they work for some level of predictability about the cost. Helix and Knight-Ridder are looking at a monthly subscription service, perhaps also an annual one. But various other considerations likely will enter into the pricing, Kinsman explains. If, for example, a user were making a patent search, as opposed to just an adverse-effects search, the cost of bringing down a patent would be significantly different from that of bringing down a journal article.

Technology spurs development

Revolutionary as they are on their own, SciFinder and KR ScienceBase are just two of the latest additions to an expanding array of new offerings that empower chemists. These developments are being propelled by both technological and business influences. But the rapid growth in computer power is certainly one of the chief technological forces that has shaped, and will continue to shape, the future of the desktop.

Computer hardware developments seem to occur so fast that it is difficult for the outsider to keep up. The significant result is that price/performance ratios have dropped to a point where it is conceivable that every chemist's desktop will have affordable levels of supercomputer power.

Until recently, supercomputing was essentially the province of specialists. But increasingly, the whole idea of supercomputers as something apart has been diminishing. Today, supercomputers are viewed more as just the high-performance end of the conventional computing spectrum. And that end has been getting less expensive.

A case in point is the Power Indigo2 from Silicon Graphics Inc. (SGI), Mountain View, Calif. Introduced last October, the desktop workstation combines the central processing unit performance of the company's R8000 microprocessor (300 million double-precision floating-point operations per second - MFLOPS) with graphics performance (256 MFLOPS) at levels that enable users to work interactively and visually with the large data sets used in scientific research and engineering. The price tag for such capability starts as low as $46,000.

Another case in point comes from IBM, White Plains, N.Y. Last month, the company announced hardware and software enhancements to its POWER Parallel Systems SP2 that it says can improve price/performance by more than 15% for numeric-intensive scientific and technical applications such as drug design. The SP2, initially introduced almost a year ago, is IBM's general-purpose, high-performance parallel processing computer. Parallel processing on the SP2 links together from two to 512 IBM RISC System/6000 processors that work on different parts of a problem at the same time.

The new Thin Node 2 processor announced in February now becomes the entry-level processor for the SP2. It fits into an SP2 cabinet and, according to the company, provides the number-crunching performance of currently used, larger Wide Node processors at a lower price. The reduction brings the lowest price for an SP2 to less than $138,000.

A third case in point comes from Digital Equipment Corp., Maynard, Mass., which last July brought out new DEC 3000 workstation Models 700 and 900 AXP. Both are powered by Digital's 64-bit Alpha microprocessors, and they enable users to run complex technical, business, and scientific applications from their workstations with quick response and fast turnaround, the company says. Starting prices for the workstations are $27,698 for Model 700 and $43,373 for Model 900.

As chemistry market manager for SGI, Mark E. Berger looks at chemistry applications from the position of both computer maker and chemistry user. Speed and scalability, he says, are the two major considerations.

The more common lower levels of power, Berger points out, just do not provide the time savings now often demanded by molecular modeling research or database searching. In the academic world, he explains, longer searching time may be more acceptable because the cost of labor is relatively low. But on the commercial side, the cost of labor is high, and the race to be the first to market with a new drug or chemical is significant to a company's success.

The amounts of time that can be saved are striking. Berger notes that in a comparison of recent benchmarking tests with Gaussian 92/DFT, an ab initio computational chemistry program, a 90-MHz Pentium chip took 600 minutes - 10 hours - and the SGI R8000 chip took only 16 minutes.
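Expressed as a ratio, the benchmark times quoted by Berger correspond to roughly a 37-fold speedup, assuming both runs performed the same Gaussian 92/DFT job, as the comparison implies:

```latex
\text{speedup} = \frac{t_{\text{Pentium, 90 MHz}}}{t_{\text{R8000}}}
               = \frac{600\ \text{min}}{16\ \text{min}} = 37.5
```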

Scalability is the other factor of importance. In SGI's approach, Berger says, the company's technology allows users to add servers and workstations as the need for computing and for searching and accessing information grows. Consequently, he says, SGI has worked with the developers of major software systems for molecular modeling and database searching to tune, or "parallelize," the software to take advantage of the parallel processing possible.

Dennis J. Gerson, consulting scientist for POWER Parallel Systems at IBM, says that in the scientific area, the company views the SP2 as the high end of the RISC technology family. So from that standpoint, it must follow the same price/performance curve downward.

The SP2, Gerson says, can be considered a local area network (LAN) of RISC systems. But it has a high-performance communication backbone that lets it operate in a parallel format. The designers, he adds, settled on a backbone-and-switch design because it enabled any processor to talk with any other processor as though it were an immediate neighbor.

Gerson points out that most computational chemistry codes - the software - run on the RISC System/6000. That means that they will run on the SP2. Relational databases run well on it, he adds, making it a good database machine.

An example of what can be done with the power levels available in high-performance computing - although still an activity of the specialist - comes from work by SGI molecular graphics experts in the company's Basel, Switzerland, office. They make use of texture mapping, a tool for color coding a molecule's surface properties to combine multiple channels of information into one display. Properties of a polypeptide, for example, can be selectively filtered or clipped to display electrostatic potential or molecular lipophilic potential. In another application, slicing volumetric properties in real time by means of a slice plane indicates water distribution around glucose. Real-time volume rendering using multiple slice planes shows the water density around glucose.

The evolution of high-performance computing to its present state is only one of the technological developments that are putting ever greater computer power into the hands of the individual chemist. Another is the now widespread use of client-server systems. In this case, a larger computer acts as a server, holding databases or sizable computational programs that are accessed or run from desktops.

Networking is another enabling development. Users are more and more able to transparently access different computer platforms networked together, with the illusion that the network is one large computational resource and not a collection of PCs, Macintoshes, and UNIX workstations.

Still another powerful technological influence, despite its bingoish nature, is the flowering of the Internet as a communications and information conduit. Although the Internet has been around since the late 1960s as a way for scientists to transfer files between computers, it is only recently that it has evolved into the beginnings of a global hypermedia information system.

The Internet makes use of the Transmission Control Protocol/Internet Protocol (TCP/IP), which enables the system to transmit and share data stored throughout the system. As the Internet came to be seen less as a network of computers and more as an information space, users piled on. Electronic-mail and file-transfer operations grew. Then along came Gopher, a client-server system developed at the University of Minnesota Computer Center that presented users with a hierarchy of servers maintained by parties in the system and containing files of information that could be retrieved by users.

But it was with World Wide Web (WWW) that the Internet blossomed. WWW is a client-server system developed by the European Laboratory for Particle Physics (CERN) in Geneva, and released for use a few years ago. WWW made it possible for computers on the network to share a single copy of a file rather than having to depend on multiple copies stored in multiple locations. A browser, such as the Windows application Mosaic, developed by the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign, enables a user to maneuver around the web.

In a web document, a word, phrase, or icon can be highlighted in a special way. Clicking a mouse on the item activates hypertext coding embedded in the document, directing the browser to move to some other point in the document or to call up another document entirely. In this way, the Internet becomes an information space.
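The link mechanism just described can be illustrated with Python's standard-library HTML parser. The sketch assumes present-day HTML anchor markup; the page content and the example.org address are hypothetical. It collects the link targets in a page and labels each as either a jump to another point in the same document or a request for a different document.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def classify(href):
    # "#name" points to another spot in the same document; anything else
    # asks the browser to call up a different document.
    return "same document" if href.startswith("#") else "other document"

page = ('<p>See <a href="#methods">Methods</a> or '
        '<a href="http://example.org/data.html">the data</a>.</p>')
collector = LinkCollector()
collector.feed(page)
for href in collector.links:
    print(href, "->", classify(href))
```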

With high-performance levels of computing power, slicing volumetric properties in real time by means of a slice plane can be used to indicate water distribution around glucose (above). Using multiple slice planes for real-time volume rendering indicates the water density around glucose (below).

A new research climate

The fast-moving hardware and networking developments on the technological front have combined with recent business imperatives to create the new research climate. One result is that hardware and networking developments, coupled with those in software, make it possible to constitute virtual laboratories as researchers collaborate interactively from their desktops.

"Large multinational chemical and pharmaceutical companies doing basic research are addressing organizational issues and reorganizing, from being vertically oriented to taking the team approach to research," SGI's Berger observes. This team of specialists is able to get a product to market quicker, he notes.

The project-team focus, in turn, generates a need for tools that enable team members to work together. The computer, says Peter Gund, vice president of research at Molecular Simulations Inc., Burlington, Mass., a molecular modeling software firm, is a way for the different disciplines to share their results and further the work of the project team. "The fact that we have the same software, the same user interface, servicing the different disciplines," he says, "really makes it easier for these disciplines to communicate their results."

Drug discovery provides what is perhaps a typical model of an interdisciplinary research team. Such a team, Gund explains, might include an X-ray crystallographer for structure determination; a protein NMR spectroscopist; a biochemist doing homology modeling of proteins or protein design; a computational chemist trying to deal with structural data, hypotheses, and small-molecule structures; and a medicinal chemist trying to use all of this information to design new compounds.

A wide range of software products is emerging to help team members work together at different levels of interaction or with different styles of collaboration. For example, software developers are providing various packages for collaborative work as well as electronic laboratory notebooks and project management systems.

The AT&T Vistium Personal Video 1200, introduced last year by AT&T Global Information Solutions, Dayton, Ohio, is one example of a videoconferencing system that lets users in different locations see, hear, and work with each other. The system provides Windows-based PC users with face-to-face communication through TV cameras mounted on the computers and collaborative applications such as the ability to share images on an area of the screen accessible simultaneously to all, called a whiteboard.

A slate of new SGI products provides a good example of what is now possible in collaborative research and points to what may be coming in this fast-moving arena. The company has had a desktop videoconferencing system for its UNIX workstations called InPerson. Just two weeks ago it introduced version 2.0. And last November, it announced that it had licensed InPerson to NetManage, a Detroit-based company that develops and markets an integrated set of TCP/IP tools for the Windows environment. NetManage will develop an interoperability solution that will allow InPerson users to communicate with PCs running Windows.

InPerson enables people to interact with live video and audio while working together in real time on a selected file, a captured image, or a text document. Numerous people can be included in a conference call. The software provides a window - what SGI calls a shared shelf - where files can be quickly distributed to participants by a user dragging and dropping the file icons onto the shared shelf. There is also a whiteboard, where participants can work together - for example, marking up a file or image collectively in real time - with a variety of markup tools, including automatically assigned personal cursors that uniquely identify the participants.

The new version 2.0 of InPerson has numerous improvements. Among them are a multipage whiteboard and a single application that supports both integrated video-in and the whiteboard controls, with no need for add-ons. The whiteboard also supports three-dimensional models - which for chemistry may be one of its more compelling features - enabling all participants to view, scale, and spin models that have been cut and pasted onto the whiteboard.

A related product introduced by SGI last November is Iris Annotator, a new multimedia application for SGI workstations expressly designed for sharing ideas and information about 3-D models. Annotator allows a user to attach digital media annotations directly to the model so that users can explore the model while reviewing the annotations. Annotations can be created from audio, images, text, video, or even other 3-D models. Recipients of annotated models assembled in the system can easily view the model and its annotations.

Chemistry stands to benefit from these tools, says SGI's Berger. For example, a target molecule can be shared on a whiteboard and changes can be made simultaneously. With Annotator, he points out, a user can call up a 3-D molecule and use arrows to point to areas of discussion. The information can then be included in an e-mail message or brought on-line to a whiteboard.

In addition, a program called MovieMaker lets users view simulations. For example, chemists modeling 3-D molecules can study the effects of a heating and cooling cycle in molecular dynamics simulations, starting and stopping the display as with a VCR.

And WWW adds an easy information dissemination tool. In large companies, Berger explains, the web can be used internally to publish real-time results. Managers can surf the web internally accessing status reports, research graphics, and other information.

These technologies are already benefiting major automotive manufacturers, Berger says. The next step is to make these tools chemistry smart. "The technology is moving incredibly fast," he adds, "and many of these tools will be chemistry smart within the year."

At another level of collaboration are electronic laboratory notebooks. These, too, provide benefits for chemical research. But for some chemical laboratories, the arrival of this technology is also posing problems of assimilation.

Helix's ResearchStation is one of a number of initiatives aimed in this direction. The product is marketed by both Helix and Megalon, a Swiss-based company with a U.S. arm in Novato, Calif. The arrangement results from an investment Megalon made in Helix during ResearchStation's development. Helix now is marketing ResearchStation to larger corporate accounts, while Megalon is distributing the product to the individual user market. Megalon views ResearchStation as a core product that can run the vertical programs that it has been and will be publishing.

Designed for the scientific market in general, ResearchStation employs the OLE 2.0 object linking and embedding technology of Windows to provide a dynamic electronic work space for consolidating, manipulating, and examining all available information about a research project or topic. It enables the researcher to gather, combine, and use information in any form, whether text, images, numerical data, graphics, or video and sound.

ResearchStation operates much like a paper laboratory log or notebook. It provides a framework for collecting and managing information from various sources with features such as Minder, which provides tools for automating routine and repeated tasks. And it creates an audit trail that allows information to be organized on the basis of content, history, or relationships to other information. These features can be supplemented with specialized procedures for security and standards requirements.

A network version of ResearchStation was released last month. But even before that, Helix's Kinsman says, several large companies had initiated pilots. The most rapid acceptance, she says, has been among chemists, as Helix had expected. But, she adds, a good deal of interest has been coming from analytical laboratories and medical laboratories.

Some companies are further along than others in their efforts to create an electronic record-keeping system in research and development, according to Kinsman. Those that are, she says, have made significant commitments to embrace new technologies. But the greatest impediment continues to be cultural issues associated with electronic records such as privacy, ownership, and authentication.

Kinsman says that, in her experience, companies are pursuing two paths in implementing electronic record keeping. One is a strategy with top-level support, employing work groups organized to evaluate processes and technologies across the company. The second is a grassroots path adopted by individual groups trying to solve specific information or work flow problems. "We think both pathways are ultimately required to be successful," Kinsman says, "but the impetus for change is coming from different places, depending upon the organization."

A new initiative in electronic laboratory notebooks is coming from Tripos Inc., St. Louis, a molecular modeling software firm. In February, the company announced a strategic alliance with the ForeFront Group Inc., Houston. The object of the alliance, the companies state, is to create a new generation of electronic laboratory notebooks.

The new system is intended to seamlessly integrate Tripos' chemical modeling and information management tools and ForeFront's existing electronic Virtual Notebook System to provide an on-line information repository for researchers to secure and share intellectual property across PC, Macintosh, and UNIX platforms. One aim is to enable researchers to automate manual tasks through features like electronic signature, date/time stamping, auditing/journaling, and page locking, all of which have been developed for regulated environments needing to secure intellectual property.
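The combination of date/time stamping, journaling, signatures, and page locking can be illustrated with a small sketch. This is not how ForeFront's Virtual Notebook System or Helix's ResearchStation works; it assumes a hash-chained, append-only log in which each entry records the hash of the entry before it, so that any later alteration breaks the chain. The HMAC used here is only a stand-in for a true electronic signature, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


class NotebookPage:
    """Hypothetical notebook page: an append-only, hash-chained list of signed entries."""

    FIELDS = ("author", "text", "timestamp", "prev_hash")

    def __init__(self, author_key: bytes):
        self.entries = []
        self.locked = False
        self._key = author_key

    def _payload(self, body: dict) -> bytes:
        # Canonical serialization so the same entry always hashes the same way.
        return json.dumps(body, sort_keys=True).encode()

    def append(self, author: str, text: str) -> dict:
        if self.locked:
            raise PermissionError("page is locked; no further entries allowed")
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "author": author,
            "text": text,
            "timestamp": datetime.now(timezone.utc).isoformat(),  # date/time stamp
            "prev_hash": prev_hash,  # links this entry to the previous one
        }
        payload = self._payload(body)
        entry = dict(body)
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        # Assumption: HMAC stands in for an electronic signature; a real system
        # would use public-key signatures tied to the author's identity.
        entry["signature"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        self.entries.append(entry)
        return entry

    def lock(self) -> None:
        """Page locking: refuse any further entries."""
        self.locked = True

    def verify(self) -> bool:
        """Audit the page: recompute every hash, link, and signature."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in self.FIELDS}
            if body["prev_hash"] != prev_hash:
                return False
            payload = self._payload(body)
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, entry["signature"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Editing the text of any stored entry after the fact makes `verify()` return `False`, which is the property an audit trail for intellectual property depends on.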

Data explosion

While many of the new developments in information handling result from the push of system technology and the pull of business needs, the science of chemistry itself currently is providing a powerful spark of its own. Perhaps the most monumental information management challenge yet presented by the science is that of managing the information streaming from experiments in the relatively new area of combinatorial chemistry.

MDL's Goldby sums up the predicament for information management professionals: "After all, you can't really go to a scientist and say, 'Well, we're unable to represent this, so you had better not think about making these kinds of chemicals.'"

MDL is one of a number of companies that have been working to remedy the situation. In January, it announced its first product for this area and expects to begin delivery in the second quarter of this year. Tripos is another company with a product in this area. Late last year, it introduced its Molecular Diversity Manager designed to deal with problems of mass screening.

Combinatorial chemistry, currently the hottest craze in drug discovery, greatly increases the range of molecular diversity available to the medicinal chemist. The novel synthesis methods employed create libraries of as many as millions of structural variations of compounds that then can be screened for biological activity. The result has been aptly termed a data explosion (C&EN, Feb. 7, 1994, page 20).

Throughout the history of chemistry, Goldby explains, chemists have generally synthesized compounds one at a time. So there has been an evolutionary, incremental growth in chemical databases. Even with recent reengineering of its searching technology for its 2-D and 3-D search systems, Goldby says, MDL was addressing the kind of performance people have been wanting: to search in-house databases, which, at hundreds of thousands to a few million compounds, were considered to be large.

"Those kinds of incremental developments were fine, until combinatorial chemistry came along," Goldby says. With some of the experiments that have been run, for example, 100 million compounds can be generated in a single experiment. "It's clear that with combinatorial chemistry, an evolutionary approach is not going to be able to keep up," he adds.

Project Library is MDL's first product for combinatorial chemistry. It is the fruition, Goldby says, of work aimed at developing new ways to represent and search large collections of chemical libraries, making it possible to search a library of, say, 10⁷ or 10⁸ chemical members very quickly.

Project Library is desktop software designed for use by project teams. It can build, store, and archive combinatorial libraries using both generic structures and specific features. The idea is that structures can be economically and efficiently stored as libraries but retrieved as specific structures when needed. A product designed for a central repository or companywide solution will be introduced toward the end of this year, Goldby says.
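The idea of storing a library generically while retrieving specific structures can be sketched as a scaffold with a list of substituents at each variable site. This is a hypothetical illustration, not MDL's Project Library design: the class name, the SMILES-style scaffold template, and the indexing scheme are all assumptions, and the generated strings are not checked for chemical validity. Because library size is the product of the substituent counts, three sites with 100 substituents each describe 10⁶ structures from only 300 stored fragments, and any single member can be decoded from its index without enumerating the rest.

```python
from itertools import product
from math import prod


class GenericLibrary:
    """Hypothetical compact store: one scaffold template plus substituent lists per site."""

    def __init__(self, scaffold: str, sites: dict):
        self.scaffold = scaffold          # placeholders like {R1} name the variable sites
        self.site_names = list(sites)
        self.sites = sites

    def size(self) -> int:
        # Library size is the product of the substituent counts at each site.
        return prod(len(self.sites[name]) for name in self.site_names)

    def member(self, index: int) -> str:
        # Decode the index in mixed radix (last site varies fastest), so a
        # specific structure is built on demand without enumerating the library.
        if not 0 <= index < self.size():
            raise IndexError(index)
        choice = {}
        for name in reversed(self.site_names):
            options = self.sites[name]
            index, digit = divmod(index, len(options))
            choice[name] = options[digit]
        return self.scaffold.format(**choice)

    def enumerate(self):
        # Lazily yield every specific structure, in the same order as member().
        for combo in product(*(self.sites[name] for name in self.site_names)):
            yield self.scaffold.format(**dict(zip(self.site_names, combo)))


# Usage: a small amide library, 3 x 2 = 6 members.
lib = GenericLibrary("{R1}C(=O)N{R2}", {"R1": ["C", "CC", "c1ccccc1"], "R2": ["C", "CC(C)C"]})
print(lib.size(), lib.member(5))
```

Storing only the scaffold and fragment lists keeps the archive small; `member()` and `enumerate()` are the two ways of turning it back into specific structures.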

Tripos' Molecular Diversity Manager is the first of a family of products from that company that also will deal with combinatorial libraries. The technology helps to create the diverse structures necessary for structure libraries and to manipulate the data for screening, according to Michael J. Sullivan, marketing communications and desktop product manager for Tripos.

Another product from Tripos, designed to deal with a related phase of the problem, is also now available, Sullivan says. Called BioSTAR, it is aimed at biological sample tracking and reporting. The system links into the mass screening tools, he explains, coupling what's happening in the laboratory with the assays that must be done.

"There's just so much data out there," Sullivan says. "And it's mining that data that becomes the real issue."

Combinatorial chemistry is also casting a shadow on commercial databases, a shadow that is being noticed by MDL. In addition to supplying chemical information management software, MDL is in the database business, with a focus on reference databases that fill unmet needs and are enhanced with special access systems.

One of MDL's databases, for example, is its Available Chemicals Directory, a compendium of chemical products offered by more than 200 chemical suppliers. In January, the company announced that 18,000 new compounds from Sigma-Aldrich Corp.'s Library of Rare Chemicals had been added to the directory, which totals some 415,000 compounds.

With the advent of combinatorial chemistry, Goldby says, the Available Chemicals Directory went from being simply a logistics support database to one that is strategically important and time-sensitive for users, now that high-volume screening is enabling people to run through not 30,000 samples a month but 30,000 samples a day. "The knowledge of what chemicals are available, either for screening or for building blocks, has really made that database a very, very important product to our customers," he says.

Another of MDL's databases is a metabolite database. Next quarter, the company will release a specialized browsing application for the database that Goldby says will make that information much more readily available. That availability becomes important, he says, because as combinatorial chemistry enables people to generate and optimize leads faster, the next bottleneck is going to be the decision of which candidates to take forward into development. Metabolism data can be important for that.

It all adds up to what Goldby sees as a change in certain aspects of research, from its being an artistic, scientific endeavor to being an industrialized process focused on throughput. "We have customers," he says, "that are ordering 50,000, 60,000, 70,000 chemicals at a time now. And fine chemical suppliers are willing to supply them in 96-well plates, so they're delivered ready to go into high-throughput screening robots."

Meanwhile, databases in general are proliferating and their deployment to the desktop is widening. Just last month, for example, the Committee on Institutional Cooperation (CIC) announced that it is deploying the CrossFire client-server system from Frankfurt-based Beilstein Information Systems on its Virtual Electronic Library (VEL). CIC, which inaugurated VEL early this month, is a consortium of major research and teaching universities in the Midwest - essentially the Big 10 and the University of Chicago.

The CrossFire deployment is making available to members Beilstein's electronic database with more than 6.5 million organic chemical structures and their physical and chemical properties and literature references. Students, faculty, and staff at CIC universities, along with those at three partner libraries at Wayne State University, Detroit; Iowa State University, Ames; and the University of Cincinnati will be able to search the database server, which will be located at the University of Wisconsin, Madison.

Students using the library at the University of Wisconsin, Madison, one of the schools to benefit from the Virtual Electronic Library linking major midwestern university libraries.

The Beilstein database is only one of many resources that will ultimately be available as part of the VEL project. CIC, through the electronic library, has undertaken the job of linking CIC university on-line public access catalogs into one virtual catalog that will eventually total more than 57 million items.

The CIC deployment is the latest feather in the cap of Beilstein, which only announced availability of the structures component of the database at the ACS meeting in San Diego just a year ago. At the ACS meeting in Washington, D.C., last fall, the company released what it calls Release 2 of CrossFire with its 12 gigabytes of structures plus data.

At that time, Beilstein had some large installations in Europe and some interest in the U.S. Release 2 lit the fuse. "That truly is what customers were looking for," says Jorge Manrique, group manager of marketing for Beilstein Information Systems in the U.S., in Englewood, Colo. "In the last quarter or so we have seen an almost exponential increase in the number of people calling us for evaluations and wanting to see this."

CrossFire stems from a cooperative project between Beilstein and researchers at IBM's Almaden Research Center in San Jose, Calif. The technology uses an IBM RISC System/6000 server to house the server software and store the database files, which users can search from PCs.

There are three gateways for accessing CrossFire, Manrique explains. The first is a serial connection via telephone. The second is the Internet, with the client computer executing an automatic connection to the server; this is the approach CIC is using for its VEL. The third, which Manrique says most companies use, is a direct network connection via TCP/IP.

"It has been gratifying to see what people are doing with this," Manrique says. There are companies, he points out, that are looking at the wealth of structures in the Beilstein database to evaluate quantitative structure-activity relationship (QSAR) experiments that they had under way. "There is so much information in the database that it really allows them to expand the boundaries of the kinds of experiments they were designing," Manrique says.

And Beilstein isn't resting. Next week at the ACS meeting in Anaheim, the organization will announce the coming availability of the reactions component of CrossFire, although it doesn't plan the official introduction of the product until the fall ACS meeting in Chicago.

CrossFire today, Manrique explains, is a molecule information search-and-display system. A molecule, or a piece of it, is drawn, and the product searches its 6.5 million records. It displays whatever information there is concerning the particular molecule - properties, literature references, tables of data.

But, Manrique says, there are also preparations - how to make the product from commercially available ingredients. There are chemical reactivity data. "We wanted to develop a system that would allow us to do these chemical reaction searches graphically," he adds. The product will be introduced with 5 million reactions. And each of the components of these reactions is a molecule that has all of the physical and chemical properties and literature references included with it.
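The molecule search Manrique describes, in which a drawn molecule or fragment is matched against stored records, is at heart a substructure search. The sketch below is a toy backtracking matcher over atom-labeled graphs; it ignores bond orders, hydrogens, and stereochemistry, and it is not the algorithm CrossFire uses. Production systems commonly screen candidates with precomputed fingerprints before attempting the costly atom-by-atom match. The function names and the graph encoding are assumptions.

```python
def mol(atoms, bonds):
    """Hypothetical encoding: (atom index -> element symbol, set of undirected bonds)."""
    return dict(enumerate(atoms)), {frozenset(b) for b in bonds}


def has_substructure(query, target) -> bool:
    """True if every query atom maps to a distinct, same-element target atom
    and every query bond maps to a target bond (bond orders ignored)."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    order = list(q_atoms)

    def extend(mapping: dict) -> bool:
        if len(mapping) == len(order):
            return True                       # all query atoms placed
        q = order[len(mapping)]
        used = set(mapping.values())
        for t, element in t_atoms.items():
            if t in used or element != q_atoms[q]:
                continue
            # Bonds from q to already-mapped query atoms must exist in the target.
            if all(frozenset((mapping[p], t)) in t_bonds
                   for p in mapping if frozenset((p, q)) in q_bonds):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]                # backtrack and try the next atom
        return False

    return extend({})


def search(query, records: dict) -> list:
    """Return the names of all records containing the query as a substructure."""
    return [name for name, m in records.items() if has_substructure(query, m)]


# Heavy-atom graphs only.
records = {
    "ethanol": mol(["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetic acid": mol(["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "propane": mol(["C", "C", "C"], [(0, 1), (1, 2)]),
}
co = mol(["C", "O"], [(0, 1)])
oco = mol(["O", "C", "O"], [(0, 1), (1, 2)])
```

With these records, a C-O query returns ethanol and acetic acid, while an O-C-O query returns only acetic acid.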

Elsewhere, IBM's Almaden Research Center is heavily involved in developing other database technology. Researchers there are working on two exploratory pilot projects in collaboration with other companies to investigate new ways of providing information electronically.

From the IBM point of view, explains Stephen K. Boyer, a researcher at the center, the idea for the projects came as an extrapolation of what firms have been doing with compact disc read-only memory (CD-ROMs). Storage is becoming so inexpensive, he says, that it's now possible to put 20 or 40 gigabytes on a server. So the idea is to exploit that capacity by developing servers as focus libraries.

One of the projects is being carried out in a joint study with the Institute for Scientific Information (ISI), Philadelphia, to develop a prototype electronic document storage management and distribution system for ISI's electronic library project (see page 42). The other is a collaboration with Derwent Inc., McLean, Va., a leading supplier of patent information, to build a library server for patent information.

The ISI project is an order of magnitude more complex, according to Boyer, because it requires handling the intellectual property rights of more than 300 different publishers. Patent data, on the other hand, is public domain information.

The projects make use of IBM's generic DataFactory approach to the overall design and implementation of digital libraries and electronic information content distribution. The scheme is to have customized local libraries on servers that reflect the particular interests of a local user community. These servers are linked to a main library with the full collection of data. In this way, the system provides a coarse prefilter to the full content and places the customized collection near the local user.
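The local-library arrangement can be sketched as a community server that copies only the documents matching its users' interests and forwards everything else to the main library. The class names, the topic-tag matching rule, and the counter of remote fetches are hypothetical simplifications, not details of IBM's DataFactory.

```python
class MainLibrary:
    """Hypothetical central library holding the full collection."""

    def __init__(self, documents: dict):
        self.documents = documents            # doc id -> (set of topics, content)

    def fetch(self, doc_id):
        return self.documents[doc_id]


class LocalLibrary:
    """Hypothetical local server: a customized subset kept near its users."""

    def __init__(self, main: MainLibrary, interests):
        self.main = main
        self.interests = set(interests)
        # Coarse prefilter: keep only documents whose topics overlap local interests.
        self.cache = {doc_id: rec for doc_id, rec in main.documents.items()
                      if self.interests & rec[0]}
        self.remote_fetches = 0

    def search(self, topic: str) -> list:
        # Searches run against the local collection only.
        return sorted(doc_id for doc_id, (topics, _) in self.cache.items() if topic in topics)

    def fetch(self, doc_id):
        if doc_id in self.cache:
            return self.cache[doc_id]
        self.remote_fetches += 1              # fall back to the main library
        return self.main.fetch(doc_id)
```

Most requests from a community with focused interests are then served locally, and only the occasional out-of-scope request travels to the main collection.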

For the patent server, the method involved building an index of all the patents, and then providing electronic links from the index to a delivery service. At its current stage of development, the database has 36 different fields of data, such as abstracts, titles, and authors, and the total size is pushing 20 gigabytes. It includes the data on patents for the past 20 years. A patent has a lifetime of 17 years, Boyer explains, but there is a spike of interest in technology that comes off patent, so the idea is to include a few extra years.

The IBM researchers expect to have the patent server up and operational sometime this year, according to Boyer. Their goal for IBM, he says, is to enable people anywhere within the IBM organization to search on a local patent index database, find what they want, push a button, and have a patent print at their local server.

Meanwhile, Derwent itself has been active with new databases. In January, the company introduced an on-line source of international patent citation information. Called Derwent Patents Citation Index (PCI), it is available on the Dialog on-line system. PCI contains every citation appearing in all patents from 16 patent-issuing authorities. It totals 15 million citations, including data going back to the 1970s. An additional 150,000 patent families and 3 million citations will be added each year.

Another new database, more specifically chemical, comes from Synopsys Scientific Systems Ltd., Leeds, U.K. Introduced in January, it is a chemical reaction database, called BioCatalysis, which provides up-to-date information on the use of biomolecules as catalysts in organic synthesis.

Synopsys notes that chemical companies and academic researchers currently are focusing attention on use of enzymes and microorganisms in synthesis, both as catalysts for novel processes and as alternatives to traditional methods. Biocatalysts offer the advantages of chemo-, regio-, and enantioselectivity, coupled with important environmental benefits.

BioCatalysis is a database of more than 20,000 selected reactions covering the literature to date. It is searchable by various popular reaction searching systems, the company says.

Modeling looks to the future

Probably no other area of information technology is as subject to all the winds of change as is molecular modeling. In a sense, it is the arena in which the forces involved in the information revolution play out in all their variety. Is the amount of information that must be managed today overwhelming? Is hardware technology bringing more computing power to the desktop? Is the Internet increasingly a factor in communication? Is the multidisciplinary research team coming more and more into its own? These trends and others all affect molecular modeling.

Molecular modeling software systems from Biosym Technologies (left) and Molecular Simulations (right) are among those now bringing powerful levels of creativity to bear on chemical problems.

The routines involved go collectively under various labels - molecular modeling, computational chemistry, computer-aided chemistry, or computer-aided molecular design. By whatever name, the techniques have become potent tools for research in the hands of computational chemistry specialists. Spiraling software sophistication has been bringing powerful levels of creativity to bear on problems that now run the gamut from life sciences and drug design to materials science, including solid-state chemistry, polymers, and electronic, optical, and magnetic materials.

Now those tools are arriving with increasing frequency at the desktops of the nonspecialists - synthetic chemists and medicinal chemists, for example, or even molecular biologists. It is one of the factors that molecular modeling software developers are considering from their different perspectives as they look at the future.

In the past, says Evelyn Brosnan, product manager for the CAChe product line of Oxford Molecular Group, based in Oxford, England, software companies have tended to split between the desktop and specialist markets. The challenge for the future, she says, is to address all of those needs, not just one particular niche. With so much data available to companies, and primarily to the bench chemist, there is a growing need to access all of that data and leverage it to the highest degree. "You won't do that by being in little niches throughout the company," she says.

Chris Herd, vice president of the life sciences business unit at Biosym Technologies Inc., San Diego, sees the roles of the specialist and nonspecialist changing. Today, a synthetic chemist might have a problem and might go to a computational chemist and have that specialist run some very detailed calculations on one or two compounds. In the future, Herd says, synthetic chemists will still go to the computational chemist for questions like that. But they will be able to make more routine use of perhaps slightly less sophisticated techniques. Those techniques will be of direct relevance to them, so the techniques will help them gain quick insights into the problems they're looking at.

That's not to downplay the role of computational chemists in the future, Herd says. Indeed, he thinks their role will actually be more important than it is today. They will be involved in assessing which technologies are relevant for the bench chemist, they will help the bench chemist understand fundamentals of the technologies they are using, and the computational chemists will have key roles in helping corporate entities make decisions as to which technologies should be brought in-house.

Another factor in the equation is that the nonspecialist - the medicinal chemist, for example - tends to be more visually oriented and less mathematically inclined, says Peter W. Sprague, an MSI fellow at Molecular Simulations. Typically, he points out, a medicinal chemist deals with fairly soft subjects, things that are more subjective. So there's a lot of emphasis on seeing, believing what is seen, and that sort of thing - being able to grasp ideas visually.

"One of the real strengths of automating computer programs," Sprague points out, "is that properly written they can provide beautiful visual interfaces to very complex notions." It becomes possible, he adds, for a chemist to manipulate complicated ideas by simply moving structures around a screen and doing things with them. And that makes it much easier to get work done.

Another concern of molecular modeling software developers is the diverse backgrounds of people making up the new multidisciplinary teams. Various groups have tended to work independently in the past, Brosnan says, probably in part because they don't speak the same language.

A prime example, Brosnan points out, is a computational chemist speaking to a bench chemist. Or a new-product development chemist working with a process chemist. Or a process chemist having to interact with a chemical engineer.

For software developers, according to Brosnan, one of the challenges is to have a set of tools that will allow communication between what used to be different functional areas within a company. "Working on a problem and handing it over the fence, so to speak, to the next group is not the paradigm of the future for effective R&D management," she says.

The Internet is one of the things being considered by molecular modeling software developers. The information revolution, notes John Newsam, vice president of the materials sciences business unit at Biosym, is really defining new paradigms for accessing information. Increasingly, he says, databases will reside on some other machine, accessible from the desktop platform through, for example, the Internet. "The paradigms that are developing for this access are going to be increasingly important," he adds.

The methodologies and the mechanics that are evolving for accessing data and for communications are increasingly important considerations for the software developers, Newsam says. "We need to ensure that the technology that we're continuing to pioneer is compatible with technology being developed for accessing information over, for example, the Internet."

At least a couple of new products aimed at addressing some of these concerns are being formally introduced at the ACS meeting in Anaheim. One of these is a system from Tripos called Unison. "Its main mission," says Tripos' Sullivan, "is to give data access to the wider range in the research team. And that includes chemists, biologists, just anyone working in the research process."

Tripos' Unison allows scientists to do 2-D and 3-D structure searching, integrate with word processing programs, access desktop molecular modeling, and develop what Tripos terms a molecular spreadsheet using Excel.

The product is designed to work in a client-server environment, to run on a PC on the desktop and link into databases throughout an organization so searching can be done for 2-D structures. Scientists can search corporate databases for molecules with desired properties. And then Unison combines the structures and data in what Tripos calls its molecular spreadsheet on the PC.
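The molecular spreadsheet step, in which structure-search hits are combined with property data on the PC, amounts to a join keyed on compound identifier. In this hypothetical sketch, the compound IDs, the column names, and the CSV export (one simple way to hand rows to a spreadsheet such as Excel) are assumptions rather than details of Unison.

```python
import csv
import io


def molecular_spreadsheet(hits, properties: dict, columns: list) -> list:
    """Join (compound id, structure) search hits with property records into rows."""
    rows = []
    for compound_id, structure in hits:
        record = properties.get(compound_id, {})
        # Missing properties become blank cells rather than dropping the compound.
        rows.append([compound_id, structure] + [record.get(col, "") for col in columns])
    return rows


def to_csv(rows: list, columns: list) -> str:
    """Serialize the rows with a header line, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "structure"] + columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with made-up compound IDs; only two of the three hits have a property record.
hits = [("CMP-1", "CCO"), ("CMP-2", "CC(=O)O"), ("CMP-3", "CCC")]
properties = {"CMP-1": {"mw": 46.07}, "CMP-2": {"mw": 60.05}}
rows = molecular_spreadsheet(hits, properties, ["mw"])
print(to_csv(rows, ["mw"]))
```

Compounds without a property record are kept as rows with blank cells, so a hit is never silently dropped from the table.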

The other new products are new releases of Hyperchem from Hypercube Inc., Waterloo, Ontario. Hyperchem was perhaps the first PC-based product to bring molecular modeling to the desktop environment a number of years ago. The new releases incorporate ab initio quantum chemical capability into Hyperchem, explains Neil S. Ostlund, chief executive officer of Hypercube.

Ostlund says that Hypercube's mandate is to take not research-level stuff that only a few people want, but computational tools that have stabilized to the point that they can be made accessible to large numbers of people.

"We have avoided until now having ab initio technology," Ostlund says. But, he adds, it's now a technology that's accessible to the desktop, and it's useful and there is something to be gained.

The new products are Hyperchem 4.5 and a version for Silicon Graphics UNIX workstations, called SGI 4.5. Hyperchem at one time had a version for SGI but discontinued it; the new release brings an SGI version back into the lineup.

After all of the computational chemistry, all of the modeling, all of the experiment planning, all of the lab work and theorizing, all of the data and information gathering, all of the literature reviewing, and all of the writing up has been accomplished, chemists still must communicate their results.

Traditionally, chemists have mostly communicated their results to the scientific community at large through print publication and scientific meetings. More recently, informal communication between scientists has been carried out via bulletin boards on the Internet. And slowly, electronic publishing of reviewed literature is coming along.

Late last year, the chemical community witnessed the arrival of a fitting medium for the information revolution: on-line electronic conferences on the Internet via WWW (C&EN, Dec. 12, 1994). They were electronic conferences in computational chemistry and chemometrics in which papers or posters were placed on the Web and registrants could carry out discussions via e-mail.

Electronic conferences likely will remain a complement to, rather than a replacement for, traditional approaches. But the two conferences point up their potential value.

So change is happening in all areas of information technology. Richard Giaimo, MSI's chief technical officer, catches the tone of what's happening. "We look at this as a sea change," he says in regard to modeling. The tools are now becoming essential to use rather than nice to use. "And a lot of people are going to have to learn a lot of things about computers."
