May 19, 2003
Volume 81, Number 20
CENEAR 81 20 pp. 45-55
ISSN 0009-2347

Integrative approach in which scientists study pathways and networks will touch all areas of biology, including drug discovery



WIRED Systems biology looks at the connections between components in cells. DEPARTMENT OF ENERGY GENOMES TO LIFE PROGRAM

The past decade has seen the ascendance of high-throughput methods for measuring the global expression of different components of the biological landscape--genomics, proteomics, metabolomics. These "-omics" often stand in isolation. But the time has come to pull them together to gain an understanding of biology at a higher level, with its complex collection of networks and pathways.

The approach known as systems biology is where this convergence will happen, although there's not yet consensus on what systems biology actually is. As so often happens when a new term is coming into vogue, it means different things to different people.

Herbert M. Sauro, an assistant professor at Keck Graduate Institute in Claremont, Calif., argues that systems biology is actually not new. "The analysis of networks, regulation, and how the thing works from a whole system point of view has been around for many years. Traditionally, the topic has never been popular because it requires math and computational skills that have never been strong in the biology community," he says. "I think the penny dropped after the human genome was sequenced and people realized that simply knowing the sequence wasn't going to answer many of our questions."

Christophe H. Schilling, vice president and chief technology officer at Genomatica in San Diego, agrees that the idea of systems biology is not new. As he worked on his doctoral thesis, he found that "people had talked about systems biology for quite some time," he says. "I think the difference is that it hasn't been until recently that we've had the experimental tools available to begin to look at biological systems."

Those experimental tools, Schilling believes, have forced people to take a systems approach. "If you're faced with looking at a microarray with thousands of genes on it, it's a pretty harsh reality to look at that and understand that those are a thousand components that are all working together inside a cell."

Tongue in cheek, Adam P. Arkin, an assistant professor of chemistry and bioengineering at the University of California, Berkeley, and a scientist at Lawrence Berkeley National Laboratory, claims that there's no such thing as systems biology. "Being one of the major proponents of it, I can safely say it doesn't exist," he jokes. "I think I can say that because what people mean by this is what people have always done in biology, which is physiology of cells."

Arkin says people got excited by instrumentation that allowed them to study biology at previously impossible scales. "There's a sense you're looking at the entire system, but this, of course, is false. We're not looking at the entire system. We're looking at large pieces of the system."

On a more serious note, Arkin explains that he sees systems biologists as "biologists who are trying in a concerted way to figure out mechanisms theoretically, computationally, and experimentally; to work with systems as large as full genomes; and to ask questions that allow us to get to a level of understanding where prediction, control, and design is feasible."


SYMBOLIC Entelos created a graphical language, shown here in a screen shot of a diabetes model, that both its biologists and computational scientists could understand. COURTESY OF ENTELOS

"BEING A NEW field, a definition is going to evolve and get settled on," says Douglas A. Lauffenburger, professor of biological engineering and a member of the Computational & Systems Biology Initiative at Massachusetts Institute of Technology. "It's going to be defined by what people actually do that's productive and effective."

James W. Fickett, global director of bioinformatics at AstraZeneca, agrees that it's not clear at this point what systems biology will ultimately become. "In some sense, everything that we do from now on will be systems biology. In some sense, it's a trivial concept, but working out the particulars is not trivial."

Hans V. Westerhoff, a professor of microbial physiology at Free University in Amsterdam, takes a slightly contrarian view. "Systems biology is not the biology of systems," he emphasizes. Instead, he says, it is the region between the individual components and the system, which is why it's new. "It's those new properties that arise when you go from the molecule to the system," he says. "It's different from physiology or holism, which study the entire system. It's different from reductionist things like molecular biology, which only studies the molecules. It's the in-between."

Systems biology has both experimental and computational aspects, but some people choose to focus on only one, preferring to define the empirical side as "-omics." For example, Luke V. Schneider, the chief scientific officer of Target Discovery in Palo Alto, Calif., describes systems biology narrowly as "mathematical modeling of biological systems."

Similarly, Colin Hill, president of Gene Network Sciences, Ithaca, N.Y., defines systems biology as "the creation of data-driven computer simulations that explain and predict how the basic components of the cell interact to give rise to the physiology of the cell and eventually the physiology of organs and tissues." Hill stops short, however, of restricting his definition solely to computational models. "It's a hybrid experimental-computational approach," he says.

The one thing that everyone agrees on, however, is the multivariate nature of systems biology. "You have to be looking at multiple variables simultaneously and how they interact with one another, rather than any specific single variable in isolation," Lauffenburger says.

One of the things that has made systems biology possible is the ability to make high-throughput measurements of DNA, RNA, and proteins.

"To really understand systems, you have to capture global data sets from each of those levels and then integrate them together if you're to get a coherent understanding of the system," says Leroy E. Hood, cofounder and president of the Seattle-based Institute for Systems Biology. "We're learning from systems biology that the more different types of data you can integrate together, the deeper the insights are into the biology of the system you're studying. The role of generating global data sets is absolutely essential."

Lauffenburger believes that measurement technology--and the ability to get quantitative data from many different conditions--is the biggest bottleneck in systems biology. He believes that moving past this bottleneck will require further automation of current methods and the invention of new devices, especially microfluidic devices.

Systems biology has been called a return to hypothesis-based research and a repudiation of discovery- or data-driven research, in which it was thought that computational tools would yield answers simply by mining large quantities of data. "You can't afford to generate enough data to do a statistical data mining kind of approach, which is hypothesis-free experimentation," Schneider says.

However, Lauffenburger thinks that systems biology is actually a combination of hypothesis-driven and discovery-driven research. "If you start from complex, multivariate data, it's more difficult to formulate hypotheses than it used to be. There is an aspect of discovery science in it, because you're trying to elucidate some hypotheses. This is where informatics and data mining come in," he says. "One of the nice things about systems biology is that it allows you to transition seamlessly between discovery and hypothesis science."

Sauro considers systems biology to be a three-legged stool, consisting of experimentation, computation, and theory. "The three of them together combine to give you a powerful set of tools for understanding systems. If you take any one of them out, I think the thing is much less than all three together," he says. "It's really the theory that binds the computation and the experimentation together. I worry that we're going to end up with just experimentation and modeling, and that won't be enough. The models will get too complicated, and you will have no way of interrogating the model in a sensible way to understand the system."

Lauffenburger refers to the "four M's" of systems biology: measurement, mining, modeling, and manipulation. "Manipulation and measurement are on the experimental side. Mining and modeling are on the computational side," he says. These four M's are part of an iterative process, beginning with manipulating the system. Once a system is perturbed, it is measured using a high-throughput, multivariate technology. The data are then mined to elucidate hypotheses that, when cast in terms of formal computational models, form the basis for a new manipulation of the system.

ARKIN CONSIDERS computation to be the practical application of theory, but he believes that experimental data are necessary to keep theory realistic. "The thing about theory and computation is that they are very compelling until an experiment is done," he says. "A theorist who is constantly generating theory without experimental feedback is in danger of migrating away from reality, especially in biology."

Schneider also believes that a combination of computation and experiment is vital. "If you generate a model in a vacuum and don't make a prediction that you can then test, you don't know if a model is right or wrong," he says. "If you generate data without a hypothesis, in some respects you can prove anything you want. Mathematical models, if you do them right, become hypotheses that are testable."

Schneider advocates borrowing methods from engineers, particularly from electrical and chemical engineering. "The key is the unit-operation approach. You have to define the physiological unit operations," he says. "On a fundamental basis, there's really nothing different between unit operations for resistors, capacitors, and induction coils than for Michaelis-Menten kinetics, membrane transport, and chemical equilibria. They're just equations that apply."
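Schneider's point that a physiological unit operation is "just equations" can be made concrete with a minimal sketch. It assumes the standard Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]), and a simple forward-Euler time step. The function names and parameter values are hypothetical and chosen only for illustration; they are not taken from Target Discovery's models.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Reaction rate from the Michaelis-Menten law: v = Vmax*[S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def simulate_substrate(s0, vmax, km, dt, steps):
    """Track substrate depletion, d[S]/dt = -v, with forward-Euler steps.

    Illustrative only: a real model would use a proper ODE solver and
    measured kinetic constants.
    """
    s = s0
    trajectory = [s]
    for _ in range(steps):
        s = max(s - michaelis_menten_rate(s, vmax, km) * dt, 0.0)
        trajectory.append(s)
    return trajectory


# Hypothetical parameters in arbitrary units.
trajectory = simulate_substrate(s0=10.0, vmax=1.0, km=2.0, dt=0.1, steps=50)
```

When the substrate concentration equals Km, the rate is half of Vmax, which is a convenient sanity check on the rate function. Other unit operations, such as membrane transport, could be written as further small functions and chained in the same way.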

One challenge of modeling biological systems is that most biological data are "differential display" data, dealing with changes between states. "It's not measuring an absolute level," Schneider says. "This is where we went back into chemical engineering and process control theory and figured out you can do some really nifty linear algebra. Then you don't have to worry about absolute numbers anymore. This linear algebra trick took us out of having to do numerical integration into doing algebra problems."

BECAUSE MODELING is an integral part of systems biology, many computational tools are required. Sauro complains that people keep rewriting software. "They keep writing new tools that basically do the same thing. There's a whole plethora of tools available now, and none of them talk to each other," he says.

The systems biology markup language (SBML) is meant to address that problem. SBML is an XML-based text format that facilitates the transfer of computational models between software tools. People from around the world participated in the development of SBML. The project was conceived by Hamid Bolouri, now at the Institute for Systems Biology, and Hiroaki Kitano, head of the ERATO Kitano Symbiotic Systems Project. Sauro participated in the project as a visiting associate at California Institute of Technology. Another part of the project is the Systems Biology Workbench, which is intended to foster the reuse of software and to allow people to build upon what has already been created. The project is funded by a grant from the Japan Science & Technology Corporation. The software products are freely distributed via the project's website.
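To give a feel for what exchanging a model as text looks like, here is a rough sketch of a one-reaction model written in the general shape of an SBML Level 2 document and read back with Python's standard XML parser. The toy model, its identifiers, and the summarize_model helper are all hypothetical, and the fragment has not been validated against the SBML specification, so treat it as an illustration of the idea, not a reference example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for SBML Level 2; check the specification before relying on it.
SBML_NS = "http://www.sbml.org/sbml/level2"

# Hypothetical toy model: one compartment, two species, one reaction S -> P.
TOY_MODEL = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="10"/>
      <species id="P" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="S_to_P" reversible="false">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize_model(sbml_text):
    """Return (species ids, {reaction id: (reactants, products)}) from the text."""
    root = ET.fromstring(sbml_text.strip())
    ns = {"sbml": SBML_NS}
    species = [s.get("id") for s in root.findall(".//sbml:species", ns)]
    reactions = {}
    for rxn in root.findall(".//sbml:reaction", ns):
        reactants = [r.get("species") for r in
                     rxn.findall("sbml:listOfReactants/sbml:speciesReference", ns)]
        products = [p.get("species") for p in
                    rxn.findall("sbml:listOfProducts/sbml:speciesReference", ns)]
        reactions[rxn.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarize_model(TOY_MODEL)
```

Because the model is plain structured text, any tool that can parse the format can recover the same species and reactions, which is the kind of interoperability Sauro says is missing when tools do not talk to each other.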

Because of its many facets, systems biology attracts scientists from a variety of disciplines, including the basic sciences, engineering, and computer science. Hood sees the field as needing "the integration of a cross-disciplinary group of scientists working together." In fact, he believes that future scientists will have to become familiar with multiple subjects.

"All biologists should really think in terms of two subjects," Hood says. "If you're a biologist, you should also think about computer science or applied mathematics or engineering. I think everybody ought to learn biology, plus either a quantitative computational skill or a physical skill. I'm very much attracted to a dual mentorship idea."

Arkin's group, which includes 33 people from 12 departments and five institutions, exemplifies the multidisciplinary nature of systems biology. "They're very diverse, and they're forced to hang out together," he says. "For four to six years, they're in a laboratory in which there are people who are mechanical engineers, bioengineers, chemical engineers, physicists, biologists, mathematicians, statisticians. They learn the language, but each one is trained in his or her own department."

Bringing together individuals from different backgrounds requires finding ways to breach language barriers. For example, the research departments at both Entelos, Foster City, Calif., and Gene Network Sciences are divided approximately equally between biologists and engineers.

"Biologists' eyes tend to roll back when they see an equation, and engineers get bored to tears reading through the detailed protocol of a scientific experiment," says Thomas Paterson, chief scientific officer at Entelos.

Entelos and Gene Network Sciences have each developed a graphical language that can convey the computational models to both their biologists and their engineers. Having a graphical language "allows our biologists to see these complex biological pathways at different levels of aggregation in a graphical way that is more intuitive to them--to make sure that the biology is being represented the way they think is appropriate," Paterson says. "For our engineers, each bubble and arrow has a particular mathematical translation."


CONNECTIONS This screen shot shows a portion of the pathways in which the p53 protein is involved, displayed in Gene Network Sciences' Diagrammatic Cell Language. COURTESY OF GENE NETWORK SCIENCES

A MAJOR CHALLENGE in systems biology is gathering different kinds of information in one place where they can then be used for computation. Ingenuity Systems, based in Mountain View, Calif., has spent five years assembling its "knowledge base" of how various genes and proteins in cells and diseases are related to one another. Most of the content of its knowledge base comes from the public literature, but its structure also accommodates proprietary information. Ingenuity uses an analogy of a map to describe what it has done.

"If you think about a map, the Human Genome Project just identified the things on the map but didn't place them in relation to each other," says Frank Mara, senior vice president of marketing at Ingenuity. "We lay out the street map and do the connections between your office and the grocery store and the park and your home and other people's homes. This allows biologists to now drive through the human genome to discover novel insights and connections in a systematic way."

The content was developed by delving into the scientific literature and pulling out information about pathways. "We look at lots of different sources where proteins have been characterized with respect to what they do with each other and with processes, cells, and tissues," says Raymond Cho, vice president for genomics at Ingenuity. "We've created a single language for representing all those interrelationships."

AstraZeneca's Fickett points out that most of what is known about protein interactions is contained not in databases or mathematical equations but in the text of the scientific literature. "We want to figure out how to get that information out of the literature and into the hands of our scientists when they're trying to make decisions about which direction to take with a drug discovery program," he says. "We're doing quite a bit of work on categorizing scientific articles according to what molecular processes might be discussed in the text."

MANY COMPANIES that specialize in systems biology are building computational models. For example, Gene Network Sciences develops simulations of cells. "Our mandate has been to exploit genomics, proteomics, and molecular biology data to create the world's most complete, sophisticated, and accurate computer simulations of human cells and bacterial cells," Hill says. The company, which has specialized in oncology and infectious diseases, currently has models of a colon cancer cell and Escherichia coli bacteria.

E. coli models can have applications in improving fermentation processes or identifying drug targets. "Because there are now more than 50 bacterial genomes sequenced, much can be done with comparative genomics using the E. coli model as a basis for creating simulations of other microbes," Hill says. Gene Network Sciences is modeling a "minimal cell," he continues, which "represents the basic core functioning of any microbe, of any bacteria species, and is used as a way to understand what is the basic functioning of a cell and therefore how we can design better antibiotics.

"These computer simulations are used to test out potential drugs on the computer before they're tested in animals and before they're tested in clinical trials. In addition, we use these simulations to identify high-value drug targets and combinations of these targets that we can determine are nontoxic and efficacious," Hill says.

Although much of Gene Network Sciences' approach involves computation, Hill doesn't classify the company as a software company. In fact, the company also performs experiments to refine its models. "It's the tight coupling between experimental biology and very sophisticated computational modeling methodologies that ultimately drives systems biology," he says.

Entelos also focuses on modeling, taking what Paterson calls a "top-down" approach. "We start with the high-level system phenomenon and work down," he says. "The end point that we care about is not a protein-protein interaction. It's not even how a cell behaves in culture. It's how the integrated human system is going to behave." Entelos wants to understand the clinical end points of disease.

The challenges associated with the top-down approach are the same as those in reverse engineering, according to Paterson. In particular, there are often going to be gaps in the knowledge of the system.

"We're very much driven by mapping out what we don't know," Paterson says. "In the areas that we don't know things, where we have knowledge gaps, we have a systematic procedure where we formulate multiple competing hypotheses and then test those hypotheses mathematically to see if they are consistent with the overall data. In many cases, it allows us to triangulate on the right answer."

Paterson believes the top-down approach is particularly suited to modeling complicated diseases. "When you have phenomena at the clinical level that are very complex, that gives us a large number of constraints that are valuable for helping us to reverse engineer those knowledge gaps where we don't have a lot of understanding," he says. Entelos is constructing models for diseases such as asthma and diabetes.

Academic researchers also are building models. For example, Westerhoff and his colleagues are building computer models that they call "silicon cells." These models include the interaction and kinetic properties of the cell systems. The metabolic models can be viewed and used to run simulations online.

In one success story, Westerhoff and his colleague Barbara M. Bakker used computer models of glycolytic pathways in Trypanosoma brucei, a parasite that causes African sleeping sickness, to discover an unexpected drug target. "We found that the glucose transporter is the best predicted drug target, rather than glyceraldehyde-3-phosphate dehydrogenase, which is the drug target that is worked on most," Westerhoff says.

They also used the models to understand why T. brucei needs its unusual organelle known as a glycosome. "They are one of the few organisms that have their glycolytic enzymes in an organelle. We didn't even know if this organelle was essential," Westerhoff says. To find out, they removed the organelle's membrane in the computer model.

"We saw that the whole thing would explode," Westerhoff says. "Rather than making pyruvate from glucose as it should and excreting the substance, these trypanosomes would now make a large amount of a substance called fructose-1,6-bisphosphate, which is not excreted. Basically, the cell would accumulate the substance inside, and it would swell and ultimately burst." Experiments are currently being conducted to see if the predictions are correct.


SIMULATE The "silicon cell," found online at Free University, displays the results of simulations of different metabolic pathways. COURTESY OF HANS WESTERHOFF

SYSTEMS BIOLOGY has the potential to impact a wide range of biological research. For example, the Department of Energy has started a program called Genomes to Life, which emphasizes systems biology. The program will focus on applications in environmental cleanup and new energy sources.

Arkin is a major recipient of funding from the Genomes to Life initiative. He directs a program called the Virtual Institute of Microbial Stress & Survival at Lawrence Berkeley National Laboratory. The project will predict the responses of microbes to environmental conditions at contaminated waste sites, with the goal of accelerating cleanup.

In another approach, Genomatica focuses on modeling metabolism, which is the process by which cells gain energy for all other functions. "We feel that metabolism offers a logical starting point for the development of complete cellular models and also complete holistic models of multicellular organisms or whole-body models," Schilling says.

The company takes a "constraints-based" approach to metabolism. "The emphasis is on trying to place constraints on metabolism based on physical and chemical laws that govern all systems," such as conservation of mass or energy, Schilling says. These are considered "hard" constraints because they apply equally to all systems. Additional system-specific constraints are provided by the repertoire of reactions that an organism's genome makes available to it. "Based on the limitations of what reactions are available, the stoichiometry of those reactions, and the thermodynamics associated with the reactions, we can further limit what's possible by the cell and by metabolism," he says.
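The idea of hemming in metabolism with constraints can be sketched on a toy network. The example below is an assumption-laden illustration, not Genomatica's method: the five-reaction network, the flux bounds, and the function names are invented, thermodynamic constraints are left out, and a brute-force scan over integer fluxes stands in for the linear programming that problems of this kind are commonly solved with. The "hard" constraint is mass balance at steady state (stoichiometric matrix times flux vector equals zero); the bounds play the role of system-specific limits.

```python
# Rows: metabolites A, B, C.
# Columns: uptake (-> A), A -> B, A -> C, B export, C export.
STOICHIOMETRY = [
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1,  0],  # B
    [0,  0,  1,  0, -1],  # C
]

# Hypothetical (lower, upper) flux bounds; the A -> B step is capped at 6.
BOUNDS = [(0, 10), (0, 6), (0, 10), (0, 10), (0, 10)]


def net_production(stoich, fluxes):
    """Net production rate of each metabolite: one entry of S*v per row."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in stoich]


def is_feasible(stoich, fluxes, bounds, tol=1e-9):
    """True if the fluxes conserve mass at steady state and respect the bounds."""
    balanced = all(abs(r) <= tol for r in net_production(stoich, fluxes))
    within = all(lo - tol <= v <= hi + tol for v, (lo, hi) in zip(fluxes, bounds))
    return balanced and within


def best_flux_by_scan():
    """Scan integer flux splits and keep the feasible one with the most B export."""
    best = None
    for uptake in range(BOUNDS[0][1] + 1):
        for to_b in range(uptake + 1):
            to_c = uptake - to_b
            fluxes = [uptake, to_b, to_c, to_b, to_c]
            if is_feasible(STOICHIOMETRY, fluxes, BOUNDS):
                if best is None or fluxes[3] > best[3]:
                    best = fluxes
    return best


best = best_flux_by_scan()
```

In this toy case the cap on the A -> B step, not the uptake limit, determines the best achievable B export. That mirrors the passage's point: the set of available reactions and their limits narrows what the cell can do, before any kinetic detail is known.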

Genomatica has concentrated on microbes such as E. coli and yeast. The models can be used to predict the performance of an optimally designed microbial strain. Genomatica researchers have found they can use selective pressure to force the organisms to evolve to reach the optimal state, which could have uses in designing microbial strains for bioprocessing and in finding ways to overcome antibiotic resistance.

Unlike many other systems biology companies, Beyond Genomics in Waltham, Mass., is emphasizing the experimental aspects. "I view Beyond Genomics' systems biology as an outgrowth of measurement technologies with a sophisticated overlay of bioinformatics," says Robert N. McBurney, senior vice president for research and development and chief scientific officer. "The in silico stuff is the back end. If you don't have a good biological or clinical experimental design, the back end is completely useless. If you don't have high-quality samples, the back end is useless. If you don't really know what you're doing with your instruments, the back end is useless. I think Beyond Genomics' strength is that we're not an in silico shop."

SYSTEMS BIOLOGY has the potential to impact the entire drug discovery and development timeline.

"I think systems biology will affect everything we do," Fickett says. "In the context of drug discovery, it's about the connection between the molecular and the physiological. When people look back at these decades from a later viewpoint, they'll see that this was the time when molecular physiology really took off.

"Pathway and systems information is used at every stage of R&D," he says. "At very early stages when you're trying to pick out the next target you may work on, gene expression results are very commonly part of that decision-making. Overlaying expression information on pathways or bringing in protein-protein interaction information from the literature and connecting it to expression results is important when we're picking targets at the beginning of the pipeline."

"Unlike other single-platform technologies developed in the past, a systems biology approach can impact every part of the value chain," McBurney says. "If you go to senior R&D executives in the pharma industry and ask them where they need help, it's not actually in target finding, which is something that systems biology can do very well. It's more in that phase of the transition from preclinical development through Phase II clinical trials."

For example, biomarkers for efficacy and toxicology could lead to more efficient preclinical development. Beyond Genomics is integrating proteomics and metabolomics to find surrogate markers for efficacy and toxicology. "I would say there are more diseases for which we have no surrogate measures for drug efficacy than there are diseases like atherosclerosis, for which we have cholesterol," McBurney says.

"When a lot of the first '-omic' technologies came out, one felt that by understanding the gene or the genes that were involved in a disease, you essentially had the way paved for identifying better, safer, more effective targets for drug discovery," says Thomas Colatsky, vice president for health care research at Paradigm Genetics in Research Triangle Park, N.C. "There's been a growing awareness that some of the same risks that always existed in drug discovery and development still exist. The systems biology approach puts all that information in the context of what happens to the entire organism."

"Not only will you identify a disease target to start your drug discovery efforts, but that target will be put in biological context," Schneider says. "You'll understand the pathway it's involved in. You'll understand the metabolic fluxes through that pathway and how they're altered in the disease state. And you'll have a mathematical model you can use as a predictive tool."

Hill believes that systems biology will be useful in screening for combination drug targets. "We know that many complex diseases are not caused by a single gene going wrong. They're not treated effectively with a single drug," he says. "When we start looking for multitarget approaches to disease, the screening of genes to find combinations mandates the use of a data-driven computer simulation."

In addition to finding targets, systems biology will help stratify patient populations, leading to more personalized medicine. So far, this type of stratification has focused on genetic differences, as in pharmacogenomics, but other types of information will also be incorporated.

"What causes a patient to respond differently to a therapy is a functional difference in the pathophysiology of the disease or a functional difference in certain aspects of metabolism that may affect the characteristics of a particular compound," Paterson says. "A genotype is one piece of data that you can collect to let you know that one patient has this functional difference. If it's the functional difference that matters, there may be many genes involved in regulating that function."


EXPERIMENT The ideal setup in a systems biology laboratory allows a combination of experimentation and computation. BEYOND GENOMICS PHOTO

PATERSON POINTS out that if just one polymorphism leads to that functional change, then genetic testing makes sense. However, for more complicated pathways, genetic screens won't be particularly helpful. "What you really want to do is find a physiologic biomarker or pattern of biomarkers that helps you understand that this particular patient has this functional difference," he says.

Systems biology can also be used to elucidate side effects in drugs, Lauffenburger suggests. "Right now, you think you're designing an inhibitor to block some particular enzyme in a pathway. Viewing how that might work is okay if these pathways are all linear, but they're not," he says. "They're highly connected into networks with lots of nonlinearities and feedbacks. Sometimes the effect of blocking a particular enzyme can really surprise you."

Paterson hopes that systems biology will help the pharmaceutical industry become more like other R&D-intensive industries, such as the aerospace, automotive, and electronics industries. "They make good use of simulation technologies before they actually build costly prototypes, the equivalent of going to a clinical trial," he says. In contrast, he says, pharmaceutical companies often have only their hypotheses about human efficacy to guide them before Phase II clinical trials, making a large portion of the process trial and error.

"The aerospace and automotive industries abandoned trial and error a long time ago. By the time they actually get to driving a prototype, it's all confirmatory," Paterson says. "In the pharmaceutical industry, the cumulative failure rate past Phase I is almost 75%. Even after toxicology in Phase I, 75% of the time the hypothesis about how the drug is going to affect clinical end points is wrong. I see the biggest impact that systems biology is going to have is in fundamentally changing the success rate in clinical trials, particularly Phase II and beyond."

Systems biology can even be used with drugs that are already on the market to help find alternative indications. Ingenuity is working with the nonprofit foundations ABC2, which focuses on brain cancer, and CaP CURE, which targets prostate cancer, to determine what pathways are involved with these types of cancers and what drugs already on the market might target those pathways.

"My sense is that down the road 10 or 20 years from now, systems biology will be an integral part of the approach to drug discovery that will create a more efficient process," Schilling says, "but it won't be an approach that's surrounded by a lot of hype and fanfare." Instead, it will be in the background and the discoveries that it makes possible will be in the spotlight.


Chemical & Engineering News
Copyright © 2003 American Chemical Society