Chemical & Engineering News
October 6, 1997
Copyright © 1997 by the American Chemical Society

INTEGRATION HITS SOFTWARE TOOLS

Melding of programs and their use in many areas is moving computational chemistry further into the mainstream

James H. Krieger
C&EN Washington



From the ACS meeting

Integration is a word that comes up more and more often in discussions of computational chemistry software. As evolution of the technology continues, lines are blurring between software tools and their application in ways that promise to greatly increase and broaden computational chemistry's utility.

In one context, integration might refer to the use of World Wide Web technology as a means of providing transparency for users of computational chemistry applications. In another, it might refer to the removal of seams between software applications along the drug-discovery spectrum from genomics and bioinformatics to chemoinformatics and drug development. It might refer to a new modularity designed into a computational chemistry software application that enables speedy inclusion of new features. Or it might refer to the coupling of computational chemistry and process engineering.

Whatever the reference, integration was indeed a leading motif at the exposition held in conjunction with the American Chemical Society national meeting last month in Las Vegas. ACS expositions over the years have become the predominant showcase for computational chemistry software, providing the chemical community with a chance to assay current development activity in the field.

As the Las Vegas exposition showed, various influences, sometimes alone and sometimes in combination, are at work on computational chemistry software development. The web-based technology available for deployment of software tools is expanding and gathering strength. Activity relating to drug discovery is continuing to sweep the software development effort along. And new attention is being paid to the science and technology underlying the software. One result of all these efforts is a growing degree of integration, in several respects, that is leading computational chemistry and molecular modeling inexorably to the desktops of more and more chemists.

In one example of molecular modeling, a determination of the electron-density difference in an organic molecule, violet indicates a negative difference and green a positive one.

At the same time, added "seasoning" in the software development mix comes from the business side. Growth in both the number and kind of cooperative relationships that computational chemistry software development companies are entering into with their customers and others is an increasingly important element in development of the technology.

And that technology has, as a business, grown to cover an array of activities ranging from computational chemistry, molecular modeling, and simulation to chemical information management and delivery. Such activities are often interrelated.

One factor common to development of software for both computational chemistry and information management today is the blossoming use of web technology for disseminating applications and information. Web technology has been the driving force behind new offerings from computational chemistry software development firms such as Molecular Simulations Inc. (MSI), San Diego; Tripos, St. Louis; and CambridgeSoft, Cambridge, Mass. It is also influencing activities at firms such as information management software developer Daylight Chemical Information Systems, Mission Viejo, Calif.; software tools developer Advanced Chemistry Development Inc. (ACD), Toronto; and computer manufacturer Silicon Graphics, Mountain View, Calif.

For example, web technology figures strongly in ambitious, if not audacious, plans at MSI to expand its user base. The company recently set a corporate objective of having 100,000 users of its products by 2000, an order-of-magnitude increase from today's 10,000.

The company sees 100,000 users as a grand goal but one that is quite achievable, says Scott Kahn, director of life sciences marketing at MSI. "The technologies are known to work. It's just a matter of getting out and getting them into an environment where chemists can use them."

As MSI sees it, that environment is the web. "As a company, we have embraced the web as a workable working environment," Kahn says. "It's not just a toy. It's something that really does work."

Silicon Graphics, too, views web technology as an increasingly important element in the chemical arena. Indeed, a project it was recently involved in, called CrunchServer, well illustrates the strong integrating role that web technology can play. Collaborating with chemical manufacturer BASF, Ludwigshafen, Germany, the company has applied web technology to provide thermodynamics data to process engineers through the BASF intranet.

"This is an example of something that will appear in other methodologies," says John Carpenter, manager of chemistry applications at Cray Research, Silicon Graphics' supercomputing subsidiary. "Whenever you have a computational task that is well enough defined that you can do it within a web page, that will be the most efficient way to do it."

In addition to web technology, Carpenter sees parallelization and scalability as a matter of increasing concern for chemical applications. "If we're going to be able to continue to make progress in the size of calculations and address many of the problems we can't address now, we have to be able to run faster calculations," says Carpenter. "The only way we're going to be able to do that is through parallelization."

Another project, announced by Silicon Graphics the week of the Las Vegas exposition, illustrates the role that parallelization can play. A Silicon Graphics Origin2000 128-processor supercomputing system has been installed at Vertex Pharmaceuticals in Cambridge, Mass., a pioneer in the application of structure-based drug design. "Vertex is looking at the drug design process and seeing it not only as a chemistry experiment but also as a computational experiment," says Roberto Gomperts, manager of chemistry and oil and gas applications in Silicon Graphics' scalable systems group in Hudson, Mass.

The Vertex Origin2000 is aimed at speeding the design of small-molecule oral therapeutics, and the company has already seen a sharp drop in computation times, according to Mark Murcko, vice president and senior scientist at the company. Previously, the computations for docking a molecule inside an enzyme could be spread over only four processors, so the problem required three days to work out. With the Origin2000, Vertex scientists can split the task into 24 separate computation streams and solve the docking problem in three hours.
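Taking the quoted figures at face value, the change in wall-clock time and in the number of parallel computation streams works out as follows (a simple check of the numbers above, nothing more):

```latex
\text{speedup} = \frac{3\ \text{days} \times 24\ \text{h/day}}{3\ \text{h}} = \frac{72\ \text{h}}{3\ \text{h}} = 24,
\qquad
\text{increase in parallel streams} = \frac{24}{4} = 6
```

The quoted times imply a speedup about four times larger than the growth in parallelism alone (24/6 = 4); what accounts for the difference is not stated.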

Exploiting web technology
But it was web technology that was on the minds of many at the Las Vegas exposition. MSI, for example, had a covey of new programs for medicinal and synthetic chemists designed for inclusion in its WebLab concept. WebLab designates a network environment for working with computational chemistry applications. The software products designed for WebLab employ computational methods targeted at specific research problems, such as determining protein structure. A user with a desktop PC selects or provides required data and information on the browser page. The application itself runs on a network server, and the results are fed back to the desktop.

Some Java applets (small programs provided by a network server that download within a web page for short-lived local use) are used in WebLab pages for display tools and interactive features. The basic molecular visualization package for WebLab applications is WebLab Viewer, which operates as a browser helper application and is available for downloading free from MSI's web site. Application modules are bundled in groupings focused on different areas. For example, last spring MSI introduced its first such products: WebLab Gene Explorer, including a 3-D Structure Prediction module, and Polymer Explorer, with a Property Predictor module.

Now branching out, the company has launched WebLab MedChem Explorer for medicinal chemists and WebLab Diversity Explorer for synthetic chemists. Currently, five application modules are available for the MedChem group and one is available for Diversity.

With five modules, MedChem Explorer provides a good example of how the modules are integrated to form a system that marries desktop productivity tools, the web environment, and UNIX servers. Using one of the modules, for example, a chemist might input some molecules knowing nothing more than that they exhibit biological activity and then ask for, say, 10 alignment models. Basically, the system goes to the web site, gets information from the viewer, and behind the scenes goes on to perform conformational analysis and to call for detection of pharmacophores. Pharmacophores are those structural features of a molecule required for particular biological activity. "An enormous amount of work is happening on the server," Kahn says, "unbeknownst to the user."

While the server is working on the alignment, it dynamically updates itself and pushes status information back to the user. But the user can also just make a web browser bookmark to return to the page for the alignment module later, then go off to do a database search using another of the modules. "We've tried to put together an interface that really blurs the distinction of where the data are located," Kahn explains. Whether the databases are in different geographic locations or of different types, the software takes care of talking to the various packages, so the chemist needn't be concerned. And the search may involve locations that are different from the one where the alignment is crunching away.

A molecule in a list resulting from the search can be clicked into the viewer for closer examination. But the chemist might also want property data for the molecules. Another module will retrieve all of the property information that's available and present it in Microsoft Excel spreadsheet format. Then the chemist can bring a property calculator module into play. The idea, Kahn notes, is to take the acquired property information and embellish it with information that can be calculated. Some 200 properties, such as dipole moment, log P (the logarithm of the octanol-water partition coefficient), and molar refractivity, are available. While the properties are being calculated, the chemist might use the bookmark made earlier to return to see the alignments.
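For reference, two of the properties named above have standard textbook definitions. The partition coefficient P is the ratio of a solute's equilibrium concentrations in octanol and in water, and molar refractivity follows from the Lorentz-Lorenz relation, with n the refractive index, M the molar mass, and ρ the density. How MSI's module actually estimates these values from structure is not described here, so these are definitions, not its method:

```latex
P = \frac{[\text{solute}]_{\text{octanol}}}{[\text{solute}]_{\text{water}}}, \qquad \log P = \log_{10} P

MR = \frac{n^{2} - 1}{n^{2} + 2} \cdot \frac{M}{\rho}
```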

Development of the other WebLab group, Diversity Explorer, has been carried out in close collaboration with Daylight Chemical Information Systems, Kahn says. Daylight's information management focus is on supplying software tool kits that users can employ in devising their own applications. These tool kits were used for the MSI application, particularly the Reaction ToolKit introduced by Daylight last year, and the SMILES ToolKit for dealing with the SMILES form of chemical line notation.

Diversity Explorer is used to create combinatorial chemistry libraries, with the software presenting the contents of those libraries on the web. The user can register a library, carry out diversity calculations, and perform focusing calculations. And, Kahn adds, because it's on the web and part of WebLab, it is integrated completely with the conformational analysis, database searching, and pharmacophore alignment of MedChem Explorer.
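How Diversity Explorer performs its diversity calculations is not described. One widely used approach is MaxMin picking on fingerprint dissimilarity, sketched below under two assumptions: each compound is represented as the set of its "on" fingerprint bits, and dissimilarity is 1 minus the Tanimoto coefficient. The function names and the tiny library are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical fingerprint representation: the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: bits set in both divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fps: Sequence[Fingerprint], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin selection: repeatedly add the compound whose nearest
    already-picked neighbour is farthest away (distance = 1 - Tanimoto)."""
    if not fps or n_pick <= 0:
        return []
    picked = [first]
    # min_dist[i] = distance from compound i to its nearest picked compound
    min_dist = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[best]))
    return picked


if __name__ == "__main__":
    library = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 5}),   # near-duplicate of compound 0
        frozenset({10, 11, 12}),
        frozenset({10, 11, 13}),   # near-duplicate of compound 2
        frozenset({20, 21}),
    ]
    print(maxmin_pick(library, 3))  # [0, 2, 4]: near-duplicates are skipped
```

The greedy rule favors compounds unlike anything already chosen, which is why the two near-duplicates are passed over in the three-compound pick.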

Also in the area of web technology, Tripos, a computational chemistry software firm that focuses essentially on pharmaceuticals and biotechnology, is particularly excited about the second version of its GASP (Genetic Algorithm Similarity Program). Tripos previewed the new version in Las Vegas and is about to release it commercially. Among the new features in GASP 2.0 is a Java-based chemistry viewer that's dynamically downloadable to any machine running Netscape browser software.

GASP is designed to analyze small groups of molecules for a pharmacophore using genetic algorithms. Tripos developed a web-based interface for the original program, introduced around the middle of last year. It was one of Tripos' earlier moves into web technology. As such, it helped to focus the company's strategy for delivering products based on web technology that it calls Discovery.Net. In implementing that strategy, Tripos has created a kit of tools, including Java applets and other applications that are used in developing interfaces to its computational software.
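GASP's internal encoding and scoring are not described here, so the following is only a generic sketch of the genetic-algorithm idea. It assumes each candidate solution is a bit string (hypothetically, one bit per candidate pharmacophore feature) and uses a toy fitness function in place of any real pharmacophore score. Tournament selection, one-point crossover, bit-flip mutation, and elitism are standard GA ingredients; `genetic_search` and its parameters are invented for illustration.

```python
import random
from typing import Callable, List

Genome = List[int]


def genetic_search(fitness: Callable[[Genome], float], length: int,
                   pop_size: int = 30, generations: int = 60,
                   mutation_rate: float = 0.02, seed: int = 1) -> Genome:
    """Evolve bit strings toward higher fitness.
    Assumes length >= 2 (needed for crossover) and pop_size >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tournament() -> Genome:
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)     # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy, hypothetical objective: fitness counts agreement with a hidden pattern.
    target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
    score = lambda g: sum(x == t for x, t in zip(g, target))
    best = genetic_search(score, len(target))
    print(score(best), "of", len(target))
```

Because the best individual is always carried forward, the best fitness can never decrease from one generation to the next.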

The front end of GASP, explains Scott Hutton, vice president and general manager of Tripos' discovery software business unit, is completely built in Java. In the new version, it now has the ability to pull up a view of a 3-D molecule built in Java code. No plug-ins or helpers are necessary.

"We've now gone so far as to have a chemical editor that allows you to pick points and measure distances and interact with a 3-D structure directly through the web browser," Hutton says. "That's a really big deal: a Java-based chemistry viewer that's dynamically downloadable to any machine where you're running current Netscape software."

As another new feature of GASP, Tripos has worked out a way to manage security within the web. DataManager, a framework for developing web applications, is one of the tools Tripos has developed under its Discovery.Net approach; it is available through Discovery.Net Club. The latter is a club for developers of web tools set up by Tripos to offer its web tools to members at low cost. DataManager has built-in capabilities for security, and these can be applied to GASP.

"The web by its nature is so open that it's hard to keep information secure," Hutton explains. Now, with DataManager, it's possible to have password protection just to enter the web page where GASP is available, he says. Moreover, password protection can be applied to individual data sets within the application.

CambridgeSoft, too, has been busy with a new web development; its widely used ChemDraw structure-drawing program is now available as a Netscape Navigator and Microsoft Internet Explorer plug-in. As such, it enables a user to do chemical sketching and querying from within the browser window. Creators of web pages can embed ChemDraw structures and drawings in a document, and they can be viewed, saved, and printed from within the web page.

"All the functionality is there but in the form of a plug-in," says Michael J. McManus, vice president of marketing at CambridgeSoft. He notes that three versions of the plug-in will be available. A NET version is downloadable free from CambridgeSoft's web site. STD and PRO versions, with additional features, will carry a charge.

Hypercube, Gainesville, Fla., is also getting its feet wet in web technology. It is working on a demonstration project using ActiveX, Microsoft's technology for dynamically downloadable software components. A user will be able to draw a molecule and get the log P value, says Neil S. Ostlund, president and chief executive officer of Hypercube, developer of the HyperChem molecular modeling package for desktop PCs. He expects the capability to be available soon.

Daylight Chemical Information Systems is yet another firm at work on web tools. In keeping with the company's tool kit approach, it is developing a set of Java-based interface building tools focusing on three operational tiers: query formulation, analysis, and visualization. The company now has an alpha version of the software, and a beta version for testing by customers is imminent.

Although Java based, the tools won't be browser linked, says Yosef Taitz, CEO of Daylight. Java, he points out, is an independent tool, so a browser may be used as a background for the Daylight developments, but the Java window is independent and can run stand-alone if the user has Java software. The tools will employ so-called JavaBeans, self-contained Java components that can be connected to one another; a structure-drawing bean is one example.

One web-related development introduced at the Las Vegas exposition falls outside the boundaries of computational chemistry but is nevertheless pertinent to laboratory operation. ACD has developed a Spectral Laboratory Information Management System (SLIMS).

ACD's specialty is software programs for predicting properties. The company's earliest software, for example, includes a sketcher called ChemSketch and programs that calculate boiling point, log P, ¹³C NMR spectra, and ¹H NMR spectra. The company now has more than 40 software products, most of them introduced within the past six months.

SLIMS is similar in concept to other types of LIMS but is based on standard web technology, with web pages, Java applets, and the like. Its focus is on samples, structures, and spectra. For example, structures can be moved and modified on the web, and when work is complete, a spectrum can be attached as a floatable Java applet.

Integrating drug discovery
The drug-discovery spectrum is an area rich in opportunities for integration of computational chemistry activities. And integration along that span has particularly characterized recent software development at Oxford Molecular Group, headquartered in Oxford, England, with U.S. offices in Campbell, Calif.

One result is a new focus on combinatorial chemistry library design. It involves a new 3.1 version of Oxford's Tsar program, which has been traditionally aimed at doing quantitative structure/activity relationship (QSAR) analysis.

"We're trying to put together quite a complex, wide range of applications under one interface, into the spreadsheet interface," says John Holland, business development manager at Oxford. That would allow modelers to identify suitable reagents for library design by having direct access to commercial catalogs and company databases through an interface to RS3 Discovery, Oxford's corporate chemical information management system that makes use of generic relational databases. It would then provide them with direct access to many of Oxford's computational and toxicity prediction tools.

One of those tools is Topkat, a program that predicts toxic effects of chemicals from their chemical structures and provides for quantitative structure/toxicity relationship (QSTR) analysis. Oxford has linked Topkat to the spreadsheet interface in Tsar 3.1 so that numbers from the toxicity predictions (the rat LD50, for example) can be incorporated into the spreadsheet along with other data for use in library design.

"We think that's going to provide a new way of doing QSAR and QSTR in the future," Holland says. "It takes a lot of the real grunt work out of doing that."

A new statistical tool has also been incorporated into Tsar 3.1. One of the problems in library design, Holland explains, arises from trying to condense often massive data sets into a manageable description of the molecules. Most conventional data reduction methods have used linear combinations of the original variables. The new method in Tsar 3.1 is nonlinear mapping.

The nonlinear mapping method employed by Oxford uses the distances between all pairs of data points to build a distance matrix that captures the variance in the data. A minimization process then finds a new description of all the original data whose inter-point distances reproduce those in the matrix as closely as possible, reducing it from, say, 50 columns' worth to two.
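Oxford's exact algorithm is not given. Sammon's mapping is the classic nonlinear mapping of this kind, so the sketch below assumes a Sammon-style stress function, minimized here by plain gradient descent rather than the second-order step of Sammon's original method. The data, learning rate, and step count are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def distance_matrix(X: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances between the rows (molecules) of X."""
    return [[math.dist(a, b) for b in X] for a in X]


def sammon_stress(D: Matrix, Y: Matrix) -> float:
    """Sammon's stress: mismatch between original distances D and distances
    in the low-dimensional map Y, weighted toward preserving short distances."""
    n = len(D)
    total = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    if total == 0:
        return 0.0
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] > 0:
                err += (D[i][j] - math.dist(Y[i], Y[j])) ** 2 / D[i][j]
    return err / total


def nonlinear_map(X: Sequence[Sequence[float]], dims: int = 2, steps: int = 500,
                  lr: float = 0.05, seed: int = 0) -> Matrix:
    """Map rows of X to `dims` coordinates by gradient descent on Sammon's stress."""
    D = distance_matrix(X)
    n = len(X)
    rng = random.Random(seed)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or D[i][j] == 0:
                    continue
                d = max(math.dist(Y[i], Y[j]), 1e-9)  # avoid division by zero
                # Gradient of (D_ij - d_ij)^2 / D_ij w.r.t. Y_i; constant factors
                # (including the 1/total normalization) are folded into lr.
                coeff = (d - D[i][j]) / (D[i][j] * d)
                for k in range(dims):
                    grad[i][k] += coeff * (Y[i][k] - Y[j][k])
        for i in range(n):
            for k in range(dims):
                Y[i][k] -= lr * grad[i][k]
    return Y


if __name__ == "__main__":
    # Six hypothetical "molecules", each described by five descriptor columns.
    X = [[0, 0, 0, 0, 0], [3, 0, 0, 0, 0], [0, 3, 0, 0, 0],
         [3, 3, 0, 0, 0], [1, 1, 2, 0, 0], [2, 1, 0, 2, 1]]
    D = distance_matrix(X)
    print("initial stress:", sammon_stress(D, nonlinear_map(X, steps=0)))
    print("final stress:  ", sammon_stress(D, nonlinear_map(X)))
```

Unlike a linear projection, the map is free to bend the data so that short distances (near neighbors) are preserved preferentially; the fixed learning rate may need tuning for data on a very different scale.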

Holland explains that a traditional QSAR set might have 30 or 35 compounds. With combinatorial library design, there are probably thousands. If the number of columns of data goes up by two orders of magnitude as well, he says, very few of the traditional statistical techniques can cope with it.

With the shift in the order of magnitude wrought by combinatorial chemistry, Holland points out, it's necessary to think of reagent lists with tens of thousands of molecules and virtual libraries with perhaps a million compounds. Being able to scale down the lists enables the researchers to make selections for carrying out actual syntheses. For comparison, 10,000 compounds is about what a pharmaceutical company can screen in a week using current high-throughput screening techniques.
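As a rough, illustrative comparison using only the figures quoted above, and assuming every member of a million-compound virtual library could actually be made, screening it at the quoted rate would take:

```latex
\frac{10^{6}\ \text{compounds}}{10^{4}\ \text{compounds/week}} = 100\ \text{weeks} \approx 2\ \text{years}
```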

Elsewhere along the drug-discovery spectrum, Oxford has also provided a linkage of its Diverse Information Visualization & Analysis (DIVA) program with RS3 Discovery. DIVA is a desktop tool designed for manipulating, analyzing, and visualizing large sets of chemical and biological data.

Efforts at Chemical Design, a British computational chemistry software firm with U.S. offices in Mahwah, N.J., have been directed at integrating biological screening data management into drug discovery using the company's Chem-X software system. Chem-X is designed to provide access to an entire range of software tools for handling chemical and biological information. The molecular modeling and data management capabilities of Chem-X are integrated with combinatorial chemistry tools, automatic programming of synthesis robots, reagent and plate inventory management, and links to biological testing results in Oracle relational databases. Adding biological data management capabilities makes it possible to organize and exploit all discovery information, says Keith Davies, technical director of Chemical Design.

The biological data management is handled by two new Chem-X modules. One of them, ChemHTS-1, manages the allocation of chemical samples to the well plates used in high-throughput screening for biological activity.
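ChemHTS-1's actual allocation rules are not described. As a hedged illustration of what allocating samples to well plates involves, the sketch below assumes a 96-well plate (rows A to H, columns 1 to 12), row-by-row filling, and a hypothetical set of control wells left free; all names are invented.

```python
from typing import Dict, List, Sequence, Tuple

ROWS = "ABCDEFGH"  # assumed 96-well plate: 8 rows (A to H) x 12 columns
COLS = 12


def well_labels(rows: str = ROWS, cols: int = COLS) -> List[str]:
    """Well names in row-major order: A01, A02, ..., H12."""
    return [f"{r}{c:02d}" for r in rows for c in range(1, cols + 1)]


def allocate(sample_ids: Sequence[str], reserved: Sequence[str] = (),
             rows: str = ROWS, cols: int = COLS) -> Dict[str, Tuple[int, str]]:
    """Assign each sample a (plate number, well), filling plates row by row
    and skipping wells reserved for controls."""
    reserved_set = set(reserved)
    usable = [w for w in well_labels(rows, cols) if w not in reserved_set]
    if not usable:
        raise ValueError("every well is reserved")
    layout = {}
    for i, sample in enumerate(sample_ids):
        plate, slot = divmod(i, len(usable))  # start a new plate when one is full
        layout[sample] = (plate + 1, usable[slot])
    return layout


if __name__ == "__main__":
    controls = [f"{r}12" for r in ROWS]          # hypothetical: column 12 holds controls
    samples = [f"S{i:03d}" for i in range(100)]  # hypothetical sample IDs
    layout = allocate(samples, reserved=controls)
    print(layout["S000"], layout["S087"], layout["S088"])
    # (1, 'A01') (1, 'H11') (2, 'A01')
```

With column 12 reserved, each plate holds 88 samples, so the 89th sample starts plate 2.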

The second module, ChemHTS-2, processes the test results. Among its features, the results from primary screening may be previewed before scaling and loading into an Oracle database. Another feature is that assay and quality control data can be viewed in plate layout. An option for displaying the corresponding structure can be selected by clicking on a particular plate well.

Core software draws attention
A burst of activity aimed at the underlying science and technology in computational chemistry software became evident earlier this year. New versions of existing core computational chemistry packages, as well as entirely new approaches, were introduced at the exposition held with the spring ACS national meeting in San Francisco. Development activity has continued to refine existing software and is looking ahead to the future.

At Schrödinger Inc., Portland, Ore., for example, the company has now shipped the newest 3.0 version of its software. That software was renamed Jaguar (after Austrian physicist Erwin Schrödinger's cat) earlier this year to celebrate the culmination of Schrödinger's development plans established in 1990, when chemistry professors William A. Goddard III of California Institute of Technology and Richard A. Friesner of Columbia University founded the company. Jaguar is an ab initio electronic structure theory software package designed in 1990 to be new-generation quantum chemistry software.

Schrödinger has also launched a collaborative research program, known as SDP, that was announced in the spring, and the company is pushing ahead toward the next generation of biomolecular modeling methodology. SDP brings partners from academia and industry together with Schrödinger. In addition to Goddard and Friesner and their groups, academic partners include chemistry professors Bruce J. Berne and Barry Honig of Columbia, William L. Jorgensen of Yale University, and Ronald M. Levy of Rutgers University, Piscataway, N.J. Industrial partners include pharmaceutical manufacturers Novartis and newly added Rhône-Poulenc Rorer. Discussions are under way with a potential third partner.

Over the past six or seven years, Friesner explains, he has been researching how well current molecular modeling (that is, classical mechanics) works. For some purposes, he says, molecular modeling force fields are quite reasonable. But for giving highly accurate binding energies, or predicting structures for proteins or peptides, there are a number of systematic errors. "We put a lot of effort into trying to quantify that," he says.

Use of the new Jaguar software was a key element in that effort, Friesner notes. "Being able to make sure you could really converge the quantum mechanics and get numbers accurate to a few tenths of a kilocalorie was critical for that," he says. Noting that a lot of time was spent looking at solvation and other effects, Friesner adds that "I think we now have a pretty good picture of the situation."

Addressing the systematic errors is the goal SDP is charged with. Schrödinger has seen SDP from the start as being a multiyear project. Other people are looking at the problem, too, Friesner notes. "Everybody knows that this is a problem," he adds, "but it's a very hard problem."

In essence, the group is trying to build quantum mechanics into the force field. A user would have the benefit of inexpensive computational costs for a classical mechanics calculation, but one that implicitly incorporates quantum mechanical effects.

Goddard adds that the payoff occurs when efficiency and speed reach a point that the software can be made essentially fail-safe and automatic. "The idea," he says, "is to put quantum mechanics eventually in a black box, so you don't have to know beans about it."

Meanwhile, MSI, Tripos, CambridgeSoft, and Oxford Molecular are among software developers embellishing a number of their core software programs. MSI, for example, has brought out enhancements in several products. All were released last month.

The current MSI, the result of a 1995 merger between the then-separate Molecular Simulations and Biosym Technologies, has products that came from both premerger arms of the company. It has continued to develop those products and has also integrated aspects of some of the programs.

A case in point is the new Felix 97.0, targeted to NMR spectroscopists studying either small molecules or macromolecules. Two product lines have now been merged into one. It's a major revision, Kahn says, "more than a facelift." Felix, he explains, had always had a good breadth of coverage, whereas NMR Compass and related products had the benefit of automation. So MSI took a lot of the ease of use and streamlining capabilities that were in Compass and brought them into the Felix environment. "It's an example of where the merger has really paid back some benefits to the community," Kahn adds.

Another of the new products is a 3.5 version of Cerius2, MSI's core software for doing small-molecule modeling. A key element of the new release is the incorporation of a number of new quantum mechanics capabilities. DMol3, for example, a new release of a widely used MSI density-functional theory (DFT) program, will be available within Cerius2. DMol3 combines the capabilities of DMol and DSolid programs, providing DFT calculations for molecular species and solid materials.

Tripos, too, has been busy on its core high-level computational chemistry software, Sybyl. A new 6.4 release is coming out this month.

"Sybyl 6.4 is a pretty big release," according to Hutton. It includes enhancements across the board. But a key aspect is that it also includes a new 3.0 release of Unity, Tripos' query and analysis system for searching multiple, distributed databases of compounds. A new capability of Unity 3.0 is called partial-match 3-D searching.

Traditional 3-D searching, Hutton explains, requires that all pharmacophore features on a molecule fit against a defined set of pharmacophore features in the receptor site. "But God doesn't work that way," he says. "Many times atoms bond to only a subset of pharmacophore features."

Hence, the new searching technique in Unity allows a candidate structure to match only partially into the receptor site. The user can specify the minimum number of binding points that the candidate molecule must have. In this way, Hutton says, many more possible binding compounds are returned from a 3-D search.
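Unity's search algorithm is not described here, so the following is only a brute-force sketch of the partial-match idea. It assumes a pharmacophore is a list of typed 3-D points, and that a match means identical feature types plus pairwise distances that agree within a tolerance. The `min_points` parameter plays the role of the user-specified minimum number of binding points; the names, tolerance, and coordinates are hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Tuple[str, Point]  # (feature type, 3-D coordinates); hypothetical format


def best_partial_match(query: Sequence[Feature], candidate: Sequence[Feature],
                       tol: float = 0.5) -> int:
    """Largest number of query features the candidate reproduces: matching
    feature types and all pairwise distances within `tol` (same units)."""
    for k in range(len(query), 0, -1):  # try the largest subsets first
        for q_sub in itertools.combinations(range(len(query)), k):
            for c_sub in itertools.permutations(range(len(candidate)), k):
                if any(query[q][0] != candidate[c][0] for q, c in zip(q_sub, c_sub)):
                    continue  # feature types must correspond one to one
                if all(abs(math.dist(query[q_sub[a]][1], query[q_sub[b]][1])
                           - math.dist(candidate[c_sub[a]][1], candidate[c_sub[b]][1])) <= tol
                       for a, b in itertools.combinations(range(k), 2)):
                    return k
    return 0


def is_hit(query: Sequence[Feature], candidate: Sequence[Feature],
           min_points: int, tol: float = 0.5) -> bool:
    """Accept if at least `min_points` features line up.
    Setting min_points = len(query) gives the traditional all-features search."""
    return best_partial_match(query, candidate, tol) >= min_points


if __name__ == "__main__":
    query = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
             ("aromatic", (0.0, 4.0, 0.0))]
    candidate = [("donor", (1.0, 1.0, 1.0)), ("acceptor", (4.0, 1.0, 1.0))]
    print(best_partial_match(query, candidate))    # 2 of 3 features
    print(is_hit(query, candidate, min_points=3))  # False: full match required
    print(is_hit(query, candidate, min_points=2))  # True: partial match allowed
```

Exhaustive enumeration grows combinatorially with the number of features, so this form is only practical for a handful of them.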

Tripos has also been working to implement force fields other than its own within the Sybyl environment, according to Hutton. By the end of the year, the company expects to have incorporated Amber and the relatively new state-of-the-art Merck force field.

CambridgeSoft has enhanced the functionality of its Chem3D Pro 4.0 package, announced last spring. Designed for use on a desktop PC, the package has included the semiempirical modeling program Mopac 93, a version of the original public-domain Mopac that was developed by Japanese computer manufacturer Fujitsu Ltd. The new Chem3D package will contain CS Mopac Pro, based on Fujitsu's latest version, Mopac 97.

Newly implemented add-ons from development partners of CambridgeSoft are also enhancing Chem3D. The quantum chemistry program Gaussian 94W from Gaussian Inc., Pittsburgh, has now been interfaced to Chem3D Ultra, providing Chem3D with ab initio in addition to molecular mechanics and semiempirical capabilities. Gaussian 94W is a complete implementation of the Gaussian 94 modeling program for the Windows environment. The improved interface allows the Chem3D user to view molecular orbitals and surface graphics computed by Gaussian 94W.

Conformer, another add-on, developed by Princeton Simulations, New Brunswick, N.J., is now being shipped in a Macintosh version. A Windows version is due out soon. The program, which installs in Chem3D, allows conformational searching on the desktop.

A new 3.1 version of Oxford Molecular's CAChe is another example of integration. CAChe is a desktop project-focused computational chemistry system that was designed for the research chemist. The new 3.1 version provides an interface to DGauss, a high-powered DFT ab initio program that formerly ran only on supercomputers. The capability comes from UniChem, a computational chemistry software package that Oxford Molecular acquired from Cray Research early last year when that supercomputer manufacturer was itself acquired by Silicon Graphics. "UniChem, which was designed for Cray supercomputers, has been [adapted for] the desktop and integrated with an application which was designed for ease of use from the desktop," explains George Purvis, senior vice president and director of computer-aided chemistry products at Oxford Molecular.

Development on core computational chemistry programs also continues to bubble along at Wavefunction, Irvine, Calif.; Q-Chem, Pittsburgh; and Chemical Computing Group, Montreal.

Wavefunction has begun shipping the new 5.0 version of its Spartan fully integrated modeling package announced last spring. The new version now has a nucleotide builder, a combinatorial library function, and a spreadsheet format. It also has an implementation of the Merck force field.

Q-Chem has added a tool to its Q-Chem software for exploring quantum chemistry in aqueous solution. According to Eugene D. Fleischmann, director of sales at Q-Chem, the software's implementation of the Langevin dipoles solvation model is unique in that the quantum mechanical calculation takes into account the polarization of the solute wave function due to solute-solvent interactions. Fleischmann says the company plans to add a new functionality to the Q-Chem software about every quarter.

Chemical Computing Group has been working on applications for its Molecular Operating Environment (MOE) software. MOE was introduced by the company, itself relatively new, last spring. MOE was conceptually new, in that it was designed as both a general-purpose chemical computing environment and a methodology development platform for use by chemists to build and assemble their own tools quickly. The latter could be accomplished with the new built-in high-performance programming language, called Scientific Vector Language.

MOE has now been ported to Macintosh, adding that to the existing wide range of computer platforms for which MOE is available. On the application side, Paul Labute, president of Chemical Computing Group, says the company is moving into bioinformatics. It is also going to make a major push in materials, Labute says, with crystal builders, for example, and he expects that the next MOE release will have a comprehensive suite of materials applications.

In addition, Chemical Computing Group is looking carefully at web technology, according to Labute, and expects to be moving in that direction. Also, he adds, the company is about to come out with a release that has all of the MOE applications embedded in Microsoft Excel spreadsheet format.

And so the development activity in computational chemistry software continues. Molecular modeling on every desktop. Quantum chemistry in a black box. As each ACS exposition comes along, the possibilities seem less and less outlandish.
