A new generation of research tool
A chemist in need is a chemist without SciFinder. For 5 years and running, SciFinder has been meeting the needs of the online chemical information browser.
In the early 1990s, the World Wide Web was still in its infancy and finding scientific information by searching online databases was a specialized task that was primarily the domain of librarians or information specialists. Before executing a search, librarians had to select the right databases and phrase the query in the appropriate command language using carefully chosen search terms and Boolean logic operators. Research scientists who needed the information might have waited a day or so to receive the results, only to find, in all too many cases, that the references were not entirely relevant to their interests. But all that has changed, through the efforts of some forward-thinking chemists and the management teams at Chemical Abstracts Service (CAS) in Columbus, OH.
This year marks the fifth anniversary of the desktop research tool SciFinder. Unveiled at the San Francisco ONLINE/CD-ROM 94 trade show just before its release by CAS, SciFinder was an innovative, new-generation research tool. SciFinder was praised by R&D Magazine editor Tim Studt as "a simple, elegant solution to the scientist's challenge of keeping up with the growing volume of scientific information" (1).
New approaches to an age-old problem
Science has always been largely a matter of gathering and evaluating information. Information is the one tool that every scientist uses, and more frequently than any other. Although the challenge of gathering information may be partially resolved through direct observation in the laboratory or the outside world, a significant portion of information can be obtained through literature and patent surveys. This step is especially crucial if the aim is to patent a novel process or a newly synthesized compound. Moreover, a retrospective search of the scientific literature can offer valuable insights and ideas for new avenues to explore. Literature searches are essential for keeping up with one's field.
CAS developed Chemical Abstracts (CA) in 1907 to handle its voluminous collection of chemistry-related abstracts. The total volume for that year was just 11,847 abstracts, but even that quantity of literature would have been difficult to deal with without the kind of indexing and summaries that CA could provide. Reflecting the rapid growth of research activity during the 20th century, the annual production of CA doubled in one year but was more than 17 times larger after 10 years, and more than 100 times larger after 30 years. By 1995, CAS was producing more than 600,000 abstracts per year, and the chemical research community was relying more on the power of computers to keep up with the demand for information.
By the early 1990s, corporations were hiring information specialists to manage the increasing volume of available scientific information. Chemists who once reached for a printed copy of CA to find their own answers were now expected to refer their questions to corporate information specialists. CAS wondered to what extent they were really doing so.
In an article published during SciFinder's first year, Randy Cain and Kirk Schwall, who were involved in developing the product, described the impetus that led CAS to develop a new research tool for scientists (2). An information audit at one customer's site found that only 32% of inquiries from the company's chemists were sent to information specialists. The remaining 68% split into 41% that were handled through printed publications and 27% that simply were not answered. If a research tool could be produced that would allow scientists to answer 80% of their own questions and refer the remaining 20% (presumably the more complex search questions, e.g., for patentability) to the organization's information specialists, the gain in productivity would be enormous. CAS set out to do just that.
Laying the groundwork
CAS began to explore the idea of a new desktop research tool in August 1991 and formed a new product development group in September 1992. The team included 40 staff members from the organization's research, information systems, marketing, and new product development units.
Customers also needed to be involved early in the development process. By March 1993, CAS began meeting with key individuals at large chemical companies, including information directors, research directors, and more than 300 scientists in the research groups.
After assessing user needs, CAS decided that a key objective of the new product would be to give scientists more control over the direction they wanted to follow in their research. In developing the new information tool, the team would seek to avoid the zero-answer syndrome, common for new users of online information retrieval systems. The new program would be able to find multiple answer sets, providing possible answers in spite of any syntax errors, spelling mistakes, or other problems that novices encounter. CAS also wanted to give chemists faster access to scientific journals and an easier way to become aware of the new studies regularly recorded in the ever-growing CAS database. Because most scientists were unfamiliar with information retrieval techniques and command language, the process of asking a question had to be conversational and intuitive. This required a graphical user interface (GUI) that busy scientists could start using with almost no training.
Software development and testing progressed throughout 1993 and 1994. A cross-functional group of external consultants developed a list of key user interface requirements. Agreeing that the system had to be intuitive and easy to use, the group also wanted a clean interface with routine tasks on the screen, not hidden under drop-down menus. It wanted a product that could run on Windows and Macintosh platforms. Finally, the product had to be aesthetically pleasing, incorporating color, words, and icons that clearly represented appropriate user functions.
SciFinder underwent extensive testing by major chemical and pharmaceutical companies worldwide, beginning with the first prototype in July 1993. The alpha test comprised 100 users from five sites, including one in Europe, and used a variety of hardware and operating systems. A year later, beta testing began with eight additional companies and concluded in March 1995. During this time, a significant internal testing effort at CAS complemented the customers' tests.
Content: A key feature
By the mid-1990s, electronic information services were already proliferating. CAS realized that no matter how innovative SciFinder's search interface would be, another dimension was needed to differentiate the product: content, the collection of information that scientists would access through SciFinder. Providing access to CAS databases became a key factor in distinguishing SciFinder from competitive search tools.
CAplus. SciFinder provides access to several CAS databases, including the comprehensive file of bibliographic records, CAplus. CAplus contains records for all the documents selected for coverage and indexing by CAS. Like the familiar publication Chemical Abstracts, the database covers worldwide literature from all areas of chemistry, biochemistry, chemical engineering, and related sciences from 1967 to the present. Documents include journal articles from more than 8000 journals, conference proceedings, technical reports, books, dissertations, and reviews. Unlike any other database, CAplus contains scientific literature and patent documents from more than 30 national and international patent offices.
CAplus also includes references to documents not indexed by CAS for coverage in the printed CA. These additional references are derived from cover-to-cover analysis of more than 1300 scientific journals. Some of the additional references are to journal articles; others are for items not covered in CA: biographical items, book reviews, editorials, errata, letters to the editor, news announcements, and product reviews.
To make CAplus current, CAS includes fully indexed records, as well as records in progress. The latter include bibliographic information, even if an abstract is not yet available. The abstracts are added as soon as they become available.
CAS Registry. Complementing the bibliographic information in CAplus is the CAS Registry database, the world's largest substance identification system. The substances in this file are derived from the chemical literature and patents indexed in CAplus and other sources, such as regulatory lists. All kinds of substances are recorded in the Registry: inorganic and organic compounds, alloys, biosequences, coordination compounds, minerals, mixtures, polymers, and salts. Registry records contain chemical structures for more than 24 million substances, along with the systematic CA index names, CAS Registry number, synonyms, molecular formulas, alloy composition tables, nucleic acid or protein sequences, and ring analysis data.
CASREACT. SciFinder access includes a specialized chemical reaction database of substance information called CASREACT. This database offers reaction information derived from documents covered in the organic sections of Chemical Abstracts: journals from 1985 to the present and patents from January 1991. Single- and multistep reactions are included. The records contain reaction information consisting of structure diagrams for reactants and products; CAS Registry numbers for all reactants, products, reagents, solvents, and catalysts; yields for many products; and textual reaction information. The reactants, reagents, and products can be searched by structure with a single reaction query. Roles, reaction sites, and mapping of atoms between reactants and products are also structure-searchable.
CHEMCATS. Recognizing that scientific interests and business-related tasks often go hand-in-hand, CAS added a database to the SciFinder array to help clients identify commercial sources of chemicals. CHEMCATS (Chemical Catalogs Online) is a catalog file containing listings of commercially available chemicals and their suppliers worldwide. Each record includes the supplier's information (e.g., name, pricing terms, products and services, and packaging), the catalog name, chemical and trade names, grade information, CAS Registry number, structure diagram, properties, regulatory information, and prices.
Product improvement has been continuous. In 1996, CAS released SciFinder 2.0. In a review of that release, coauthors Carmen Nitsche and Robert Buntrock confirmed that the product fulfilled its essential purpose: "With SciFinder, chemists can help themselves, exploring the literature as they wish, rediscovering the serendipity that comes with browsing. Information professionals take on the role of trainer and coach, guiding end-users to finding the best answers and advising them on when to seek more complete information through other means" (3).
SSM. In 1997, CAS upgraded SciFinder's capabilities with the SciFinder Substructure Module (SSM). The user defines rings, chains, substituents, and R-groups (Figure 1, first screen). A Substructure Explore presents a set of candidate substances that match the substructure (Figure 1, second screen). The user can then retrieve all the references or just those relating to specific topics, such as adverse effects, biological activity, and preparation. SSM allows researchers to generate new ideas for research as well as to identify derivatives of existing chemicals that exhibit more desirable properties.
ChemPort. Whenever possible, SciFinder provides a set of references in response to the user's questions and a link to the full text of the identified journal article or patent. The ChemPort Connection feature works automatically when the user clicks on the PC icon that appears to signal the availability of electronic full text. A ChemPort options page may show a number of choices for accessing the full-text document. For example, "Subscribers view e-article" is available for subscription holders. Other options appear for accessing the full text through a subscription agent or the user's in-house library, or even for purchasing the article for a one-time additional fee.
MEDLINE. Skipping ahead a few versions, SciFinder 5.0 was released in mid-1999, and a new biomedical component became available: the U.S. National Library of Medicine's MEDLINE database. MEDLINE can be searched with CAplus during the Research Topic and Author exploration. Users also can execute a substance identification search and choose to get references from CAplus and MEDLINE.
WebLab ViewerLite. In another enhancement drawing upon resources outside of CAS, SciFinder 5.0 incorporates links from substances in the CAS Registry to WebLab ViewerLite (Molecular Simulations Inc.). WebLab ViewerLite is a high-end molecular visualization application that uses OpenGL graphics for visualizing molecular models. These models can be rotated, scaled, edited, labeled, and analyzed.
Of course, the more traditional method of ordering a document copy is not neglected: SciFinder also gives customers a direct link to the CAS Document Detective Service. Together with the ChemPort feature, this link gives the researcher a means of acquiring the full-text document, if desired.
SciFinder Scholar. In 1997, CAS launched SciFinder Scholar to address the needs of chemistry students and faculty. Although Scholar has the same look, ease of use, and the main search tools as SciFinder, many of the features that allow SciFinder to be personalized are not available on SciFinder Scholar. Because different individuals may use SciFinder Scholar at different times, pricing is based on the number of simultaneous users, whom the university does not need to identify personally.
The pathways to knowledge
SciFinder is a new and simple way for researchers to obtain information. Instead of a blinking cursor waiting for the user to enter a command, the opening screen of SciFinder's GUI presents several pathways to knowledge, that is, obtaining information: Explore, Browse, and Keep Me Posted (Figure 2). The user explores information easily by chemical substance (e.g., exact structure, molecular formula, or substance ID), reaction, substructure, research topic, author, or document identifier (Figure 3). Searches are conducted in a user-friendly, question-and-answer format, using internal dictionaries and a thesaurus to look up key terms in the request phrase and increase the search power. Users can also browse through the tables of contents of 1300 journals, and a Keep Me Posted function monitors new literature on current subjects and alerts users to recent arrivals.
SciFinder is designed to make it look easy, in spite of its sophisticated algorithms and platforms. Unlike command line search systems that expect the user to anticipate the variety of terms a database may contain, SciFinder automatically considers synonyms.
For example, suppose a user wants to find literature on the effects of ACE inhibitors on treating heart disease. Included in the search would be synonymous terms such as "cardiovascular disease". The search results display all the literature references that include "heart disease" and "ACE inhibitors", those that include "heart disease" or "ACE inhibitors", and entries that include just "heart disease" and just "ACE inhibitors". Thus, the user can choose a narrowly defined set of documents or a more general set about heart disease or ACE inhibitors.
After examining a few of the references, the user wants to narrow the search and identify literature that pertains to the treatment of one kind of heart disease: heart failure. A Refine tool makes it possible to select a subset of the answers, limited by any of several criteria.
The user may choose to refine by research topic and enter the phrase "treatment of heart failure". SciFinder runs this search against the set of records found originally and identifies a subset that deals specifically with ACE inhibitors used to treat heart failure.
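The explore-then-refine behavior in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how SciFinder is implemented: the synonym table, the Record type, the sample titles, and the functions mentions, explore, and refine are all hypothetical, and simple substring matching stands in for the internal dictionaries and thesaurus. The sketch also reads the "just heart disease" and "just ACE inhibitors" sets as all references containing that one concept.

```python
from dataclasses import dataclass

# Hypothetical synonym table standing in for SciFinder's dictionaries and thesaurus.
SYNONYMS = {
    "heart disease": {"heart disease", "cardiovascular disease"},
    "ace inhibitors": {"ace inhibitors"},
    "heart failure": {"heart failure"},
}


@dataclass(frozen=True)
class Record:
    """A hypothetical bibliographic record reduced to an id and a title."""
    ref_id: int
    title: str


def mentions(record, concept):
    """True if the title contains the concept or one of its listed synonyms."""
    key = concept.lower()
    title = record.title.lower()
    return any(term in title for term in SYNONYMS.get(key, {key}))


def explore(records, concept_a, concept_b):
    """Build several candidate answer sets instead of a single result."""
    has_a = {r for r in records if mentions(r, concept_a)}
    has_b = {r for r in records if mentions(r, concept_b)}
    return {
        "both": has_a & has_b,      # narrowest set: both concepts present
        "either": has_a | has_b,    # broadest set: at least one concept present
        "concept_a": has_a,         # references containing the first concept
        "concept_b": has_b,         # references containing the second concept
    }


def refine(answers, concept):
    """Keep only the records in an existing answer set that mention the concept."""
    return {r for r in answers if mentions(r, concept)}


# Invented sample records, for illustration only.
RECORDS = [
    Record(1, "ACE inhibitors for heart disease: outcomes in heart failure patients"),
    Record(2, "ACE inhibitors and cardiovascular disease risk"),
    Record(3, "Dietary factors in heart disease"),
    Record(4, "Synthesis of novel ACE inhibitors"),
    Record(5, "Polymer coatings for steel"),
]

candidates = explore(RECORDS, "heart disease", "ACE inhibitors")
refined = refine(candidates["both"], "heart failure")
print(sorted(r.ref_id for r in refined))
```

Offering several candidate sets at once is what lets a novice avoid an empty result: if the narrow "both" set is too small, the broader sets are already available, and refining works on an existing answer set rather than starting a new search.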
The cost of knowledge
Pricing is always a crucial factor in the acceptance of an electronic information service. CAS speculated that novices would be intimidated by the ticking meter effect of the connect-time pricing model. It was important that researchers be given the time to do needed, thorough searches without financial limitations. So SciFinder was priced to encourage companies and their research scientists to make the product a daily research tool.
SciFinder is available principally through flat-rate annual subscriptions that allow unlimited use for a given number of seats; however, CAS also offers an option to purchase a certain number of tasks (a task means one search question and the resulting answers). Pricing arrangements are prepared for each organization and determined according to the requirements of its research groups, the number of SciFinder users, and the installation options.
Assessing SciFinder's impact
Rapid growth of the customer base and excellent reviews in the information industry press were two tangible indicators of SciFinder's success. The Information Industry Association shared much of this enthusiasm for the new research tool and granted SciFinder its 1995 HotShots award as Best Science/Technology Service of the year.
SciFinder has twice been named one of the 100 most technologically significant new products and processes of the year by R&D Magazine. In 1996, Tim Studt, editor and program co-chair, said, "We selected SciFinder because it significantly changes the way researchers access scientific information. The product's easy-to-use graphical interface, coupled with CAS's extensive databases, makes SciFinder a simple, elegant solution to the scientist's challenge of keeping up with the growing volume of scientific information" (1).
Honoring SciFinder SSM in 1998, Studt noted, "We consider products, such as the SciFinder Substructure Module, that receive the R&D Award to be the Nobel prizes of applied research. SciFinder has upheld its position as a major contributor toward the advancement of scientific research since the original model won the R&D 100 Award in 1996. Now, with the added capability of substructure access, scientists have even more pathways to explore CAS databases and generate new ideas for research" (4).
In 1998, an independent research firm reported the results of an extensive survey of scientists who had used SciFinder for at least one year. More than 250 scientists in North America and Europe were interviewed regarding SciFinder's impact on their work. More than 96% said that SciFinder saved them time in collecting research information and made it easier to access information. Overall, 95% of those surveyed confirmed that SciFinder had improved scientists' research process (5).
The academic world has been no less receptive. SciFinder Scholar, using SciFinder technology especially adapted for campus-wide access, rapidly became the leading desktop research tool for campuses during 1999. Among the universities using SciFinder Scholar in the United States and internationally were almost all of the top 26 Ph.D. chemistry programs, as ranked by U.S. News & World Report (6).
Searching for the future
CAS plans to release SciFinder 2000 in the fall of 2000. The exact release date has not been finalized; see www.cas.org for updated information. SciFinder 2000 will introduce a range of new capabilities for desktop research. The new capabilities include the following: company or organization name exploration, expanded access to full-text documents in the customer's corporate library, citation linking, and reaction exploration enhancements. These state-of-the-art exploration tools for data mining and visualization will provide multidimensional graphing and navigation capabilities.
SciFinder 2000's new visualization tools incorporate a set of capabilities. Rather than simply searching for references on a specific topic, a scientist can use these tools to see at a glance how various research interests are represented in the databases and how they relate to one another in the scientific literature.
Today's electronic research environment is a whole new ballgame. To remain a winner in this exciting competition, CAS is committed to making SciFinder increasingly versatile and responsive to the needs of scientists. And like the fielder who makes even the most difficult catches in stride, SciFinder will always try to make it look easy.
Kirk Schwall is manager, Authority Database Operations and Database Quality Engineering, in the Editorial Division of CAS (Chemical Abstracts Service, PO Box 3012, 2540 Olentangy River Rd., Columbus, OH 43210; 614-447-3684; firstname.lastname@example.org). He has been with CAS since 1982. Previously, he served as the manager of New Product Development. He was primarily responsible for leading the SciFinder user requirements and interface design efforts. Schwall graduated from Capital University, Columbus, with a B.S. degree in chemistry.
Kurt Zielenbach is the SciFinder product manager in the Marketing Division of CAS (614-447-3683; email@example.com). He has been with CAS since 1984 and served as a product development specialist in New Product Development where he was responsible for business development for various new product initiatives related to SciFinder. From 1993 through 1998, he served on the SciFinder software development team. Zielenbach graduated from Wittenberg University (Springfield, OH) with a B.A. degree in music.