A league of IT's own?
Scientists face a growing problem in today's drug discovery world: how to extract knowledge from the ever-increasing information produced by the genomic revolution. With high-throughput screening technologies and the growing reliance on in silico techniques, the amount of data that scientists must manage, store, mine, and use has grown exponentially. In addition to data management issues, scientists now face information technology (IT) problems. They must decide which tools (computers, databases, software, servers, and networks) to use, and in what manner to use them. They also must find ways to simplify and accelerate drug discovery research.

The first step

In May 2001, Merck signaled a new strategy by acquiring Rosetta Inpharmatics, a genomics company, for about $600 million. Merck hopes to leverage Rosetta's expertise in genomics, gain valuable bioinformaticists, and, most importantly, acquire new drug targets. Although this acquisition looks like a great move strategically, questions remain: Will a new blockbuster drug come out of the $600 million deal? Will this become the new paradigm in the pharmaceutical industry, with one large pharma following in another's footsteps? We must wait and see.

Ultimately, many different strategies will evolve to deal with the rising problem of information. Paramount in importance is the way scientists think about and deal with IT, including the tools, software applications, genomic and proteomic databases, data integration, and infrastructure (hardware and middleware).

Research scientists initially used computers for experimental analysis. Data generated by a scientist would be analyzed and stored in one place, and most scientific discovery was made in vitro. With the advent of networks and the Internet, however, drug discovery now also occurs in silico. In silico techniques confront scientists with a new problem: how to access and integrate all the data. Whereas researchers used to work with data in relatively few formats and a limited number of locations, they now deal with heterogeneous data in repositories throughout the world. For example, a scientist might need to access text files on the Internet, assay data in Sybase format at multiple locations, 2-D and 3-D chemical structures in Oracle databases, sequence data in SQL databases, and toxicological data, not to mention antiquated legacy data in older databases. For the scientist, this is turning out to be a nightmare. Current IT infrastructure does not always provide simple access to data, and different data types are not easily integrated.

According to Adel Mikhail, vice president of strategic development at LabBook, several companies are aware of these issues and are working to solve them. "The days of the large back-end database are over," he says. "Products like eLabBook and DiscoveryLink understand this. They allow different types of data to be retrieved from multiple locations and can integrate it all on the fly, right from the researcher's desktop."
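To make the "integrate it all on the fly" idea concrete, here is a minimal sketch in Python of how a desktop tool might pull matching records from heterogeneous sources and merge them into one view. It does not reflect eLabBook's or DiscoveryLink's actual design or APIs; the source names, record fields, and adapter functions are all hypothetical, with a local SQLite table and a CSV string standing in for remote Sybase, Oracle, and legacy flat-file sources.

```python
# Hypothetical sketch: on-the-fly integration of heterogeneous sources.
# SQLite and an in-memory CSV stand in for remote relational and legacy data.
import csv
import io
import sqlite3

def records_from_sql(conn, compound_id):
    """Adapter: pull assay rows from a relational source and normalize
    them to plain dicts with a common shape."""
    cur = conn.execute(
        "SELECT compound_id, assay, result FROM assays WHERE compound_id = ?",
        (compound_id,),
    )
    return [
        {"source": "assay_db", "compound_id": cid, "field": assay, "value": result}
        for cid, assay, result in cur.fetchall()
    ]

def records_from_csv(text, compound_id):
    """Adapter: pull matching rows from a legacy flat-file export."""
    return [
        {"source": "legacy_csv", "compound_id": row["compound_id"],
         "field": "toxicity", "value": row["toxicity"]}
        for row in csv.DictReader(io.StringIO(text))
        if row["compound_id"] == compound_id
    ]

def federated_lookup(compound_id, adapters):
    """Run every adapter and merge the results into one view."""
    merged = []
    for adapter in adapters:
        merged.extend(adapter(compound_id))
    return merged

if __name__ == "__main__":
    # Build toy versions of two heterogeneous sources.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assays (compound_id TEXT, assay TEXT, result REAL)")
    conn.execute("INSERT INTO assays VALUES ('CMP-1', 'IC50_uM', 0.42)")
    legacy = "compound_id,toxicity\nCMP-1,low\nCMP-2,high\n"

    adapters = [
        lambda cid: records_from_sql(conn, cid),
        lambda cid: records_from_csv(legacy, cid),
    ]
    for record in federated_lookup("CMP-1", adapters):
        print(record)
```

The essential design point is that each adapter hides a source's native format behind a common record shape, so the researcher's desktop can merge results without anyone first copying everything into one large back-end database.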
The new paradigm

Front Line's finding is echoed by Brian Guza of First Consulting Group's Discovery Practice (Long Beach, CA), who says, "Informatics applications often gather momentum among a small, influential group of end users." Comprehensive informatics products are not currently available; scientific organizations are forced to buy multiple tools to meet their overall needs. By the time that IT resources become involved, it's too late to evaluate software products against established standards in technology and infrastructure. As a result, many scientists use tools that are poorly integrated or ill-suited for the hardware and infrastructure in their organization.

So what does this mean for a pharmaceutical or biotechnology company? Unless an organization plans for its informatics future with its researchers, an inefficient patchwork of applications and content will surround a stagnant infrastructure. Furthermore, although this patchwork may serve individual researchers' needs, it will not efficiently meet the organization's goals.

The end of the tunnel

In 2000, attention shifted from Compaq to IBM when IBM announced a $100 million life-sciences initiative. Under the initiative, IBM agreed to provide solutions in high-performance computing, infrastructure, data management, and integration. After launching the initiative, IBM stated that it would build the most powerful supercomputer in the world. Blue Gene (the appropriately named supercomputer) will tackle predicting the 3-D structure of a protein, one of the most difficult problems in the life sciences. IBM also announced an agreement with NuTec Sciences and the Winship Cancer Institute at Emory University, both in Atlanta, to develop an integrated information system that will allow physicians to tailor cancer treatment to a patient's genetic makeup. Under the agreement, IBM will supply the world's 10th largest supercomputer along with software for Web application serving, data management and integration, and information portals.

In 2001, IBM announced the upcoming launch of DiscoveryLink, the second prong of its life-sciences initiative, focused on helping the typical discovery scientist. DiscoveryLink is a middleware solution that can mine heterogeneous data sources with a single query, using software components called wrappers to present the sources as what IBM calls a federated database. Specifically, a wrapper translates researchers' queries into Structured Query Language (SQL) queries that can then be used to search for information in DB2 or Oracle database software. (A minimal sketch of the wrapper idea appears at the end of this section.) According to Sharon Nunes, the person responsible for DiscoveryLink and the director of IBM's Life Science Solutions, "We are continually hearing about the problems heterogeneous data are causing life scientists. With DiscoveryLink (and a single query), scientists will have access to all their data streams (structural information, 3-D files, 2-D files, tables, flat files, etc.) in one virtual database." Reflecting Nunes's comments, Aventis Pharma has implemented DiscoveryLink to facilitate its drug discovery efforts.

More recently, IBM announced collaborations with several life-science specialists. In partnering with LION Bioscience (Heidelberg, Germany), IBM will combine its DiscoveryLink middleware with LION's SRS integration platform to provide standard-setting information management capabilities. IBM also will provide the IT backbone for Proteome Systems' (Woburn, MA) commercial offerings.

What does all this mean for life scientists and pharmaceutical companies? According to IBM, it means knowledge discovery. For a pharmaceutical company such as Aventis or Schering-Plough, it means the ability to efficiently mine the data contained in its numerous and often incompatible databases. For a biotechnology company, IBM's approach means a one-stop shop for IT infrastructure.
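As promised above, here is a minimal, hypothetical sketch of the wrapper-and-federation pattern in Python. This is not IBM's DiscoveryLink API; the class names, query format, schemas, and column mappings are invented for illustration, with two in-memory SQLite databases standing in for DB2 and Oracle sources with different schemas.

```python
# Hypothetical sketch of a "wrapper" over each source plus a federation
# layer that fans one query out to all of them. Not DiscoveryLink's API.
import sqlite3

class SqlWrapper:
    """Wraps one relational source and translates a simple field-based
    query into SQL appropriate for that source's local schema."""
    def __init__(self, conn, table, column_map):
        self.conn = conn
        self.table = table
        self.column_map = column_map  # canonical field -> local column name

    def search(self, field, value):
        column = self.column_map[field]  # translate the federated name
        sql = f"SELECT * FROM {self.table} WHERE {column} = ?"
        return self.conn.execute(sql, (value,)).fetchall()

class FederatedDatabase:
    """Fans a single query out to every registered wrapper and merges
    the results, so the caller sees one virtual database."""
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, field, value):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.search(field, value))
        return results

if __name__ == "__main__":
    # Two sources with different schemas for the same concept.
    structures = sqlite3.connect(":memory:")
    structures.execute("CREATE TABLE structures (cmpd TEXT, smiles TEXT)")
    structures.execute("INSERT INTO structures VALUES ('CMP-1', 'CCO')")

    assays = sqlite3.connect(":memory:")
    assays.execute("CREATE TABLE results (compound_id TEXT, ic50 REAL)")
    assays.execute("INSERT INTO results VALUES ('CMP-1', 0.42)")

    fed = FederatedDatabase([
        SqlWrapper(structures, "structures", {"compound": "cmpd"}),
        SqlWrapper(assays, "results", {"compound": "compound_id"}),
    ])
    print(fed.query("compound", "CMP-1"))  # one query, both sources answer
```

The column map is where the "translation" happens: each source keeps its native schema, and the federation layer speaks one canonical vocabulary, which is the essence of the single-query, virtual-database approach the article describes.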
In support of the life sciences' growing dependence on IT, other key suppliers, including Oracle, Hitachi, Sun, Motorola, and Agilent, have entered the field. In a highly publicized $185 million collaboration, Oracle, Hitachi, and Myriad Genetics have teamed up to map the human proteome (the repertoire of proteins in the human body) in less than three years. The union joins Myriad's proteomic capabilities, Oracle's software, and Hitachi's expertise in electronics technology to improve understanding of the molecular basis of disease.

Sun Microsystems (Palo Alto, CA) has committed to the life sciences by forming an Informatics Advisory Council (IAC) and hosting an annual summit meeting. The IAC, a group of IT specialists from academia, industry, and public agencies, was formed to address the data analysis needs of the life sciences community and to discuss the future of standards, visualization, analysis solutions, and hardware and software platform requirements. Commenting on the underlying need for the IAC, Sun's Sia Zadeh states, "Data integration is the number one challenge in the postgenomics era." Sun has also formed alliances with software developers along the drug discovery value chain under its Discovery Informatics Program, in which it is working to promote the adoption of Java technology, a powerful, cross-platform programming tool that enables data sharing over the Web.

The Interoperable Informatics Infrastructure Consortium (I3C) is another group that has recently formed to address the growing pains felt in the life sciences. An international consortium led by major IT players, including IBM and Sun, the I3C now boasts more than 60 participants. Similar to the IAC, the I3C was formed to promote common standards for data exchange and interoperability across the life sciences.
However, not all IT companies think alike. Unlike Compaq, IBM, and Sun, the heavyweight Motorola has taken a different approach. Instead of producing the IT infrastructure and software to support informatics, Motorola is making the biochips, instrumentation, assays, and reagents, such as those used in its CodeLink Bioarray System, that researchers will use to conduct genomic and proteomic research. Motorola has also formed strategic alliances, such as that with the SNP Consortium, through which it will provide genotyping services. Key to Motorola's success are its large size, expertise in manufacturing, and deep pockets. Along the same lines, Agilent Technologies (Palo Alto, CA) has launched several genomics-based products in support of discovery research.

The final analysis
Jon Meyer is a consultant and Jim Thompson is a partner at Front Line Strategic Consulting in Foster City, CA. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.