Researchers use a variety of tools to probe protein function and interactions, with drug discovery the major goal
Stu Borman
C&EN Washington
With the Human Genome Project rapidly approaching closure, researchers are turning increasingly to the task of converting the soon-to-be-completed DNA sequence into information that will potentially improve--and perhaps even revolutionize--human medicine and health care. One of the key challenges ahead: understanding proteomics, the science of the cellular protein universe.
Proteomics, in a sense, is more complex than genomics, not only because there are so many possible interactions among proteins, but because there are so many more proteins than genes. Messenger RNAs--transcripts of genomic DNA that directly encode proteins--can be assembled in many different ways. And expressed proteins are typically modified in a variety of ways, such as by phosphorylation and glycosylation. So organisms may have well over an order of magnitude more proteins than genes.
Proteomics attempts to catalog and characterize these proteins, compare variations in their expression levels under different conditions (notably sickness versus health), study their interactions, and identify their functional roles. Scientists believe there's a powerful distinction to be made between the molecular function of an isolated protein and the function of that protein in the complex cellular environment.
Proteomics studies proteins not one by one, as has traditionally been done, but in an automated, large-scale manner. That requires new technologies and techniques, and considerable effort is currently being devoted to the development of novel tools of the proteomics trade.
Such efforts were very much in evidence at "Proteomics," a conference held in San Francisco last month. The meeting was the third of three conferences in a series titled "Beyond Genome 2000" that was sponsored by Cambridge Healthtech Institute, Newton Upper Falls, Mass.
Asked to define the scope of proteomics, biochemist and molecular biologist Richard R. Burgess of the McArdle Laboratory for Cancer Research at the University of Wisconsin, Madison, a speaker at the conference, tells C&EN that it is "a grab bag of activities that are all in the postgenomics or functional genomics area, or the what-do-we-need-to-know-to-make-sense-of-all-the-genomics-data arena."
The field includes the following:
Transcriptional profiling to determine "which genes are transcribed into RNA in a particular cell type, developmental stage, or disease state," Burgess says.
High-throughput expression and purification of proteins.
Protein profiling, the use of two-dimensional gel electrophoresis and mass spectrometry to study the proteins expressed in a cell.
Protein-protein interaction studies to see which proteins function together, primarily using a technique called the yeast two-hybrid method.
Pathway analysis to understand signal transduction and other complex cell processes.
Large-scale protein folding and 3-D structure studies.
Bioinformatics analysis of proteomics data.
"The major concept I find to pervade the area of proteomics is that you don't just study one protein, as we have been doing for years, but you study all the proteins in an organism at the same time," Burgess says. "From my perspective, there is and will continue to be incredible creativity in devising ways to do this."
Discovering protein function
The rationale for protein-protein interaction studies is that proteins that bind together tend to work together--or at least are more likely than not to be functionally related. The idea that affinity and functionality go hand in hand might seem questionable, but this hypothesis has been verified experimentally.
"Empirically, the reason we are interested in protein-protein interactions is that if you can delineate a protein that's involved in a disease process it's quite likely that that is a potential drug target," said research scientist Daniel Wettstein of Myriad Genetics, Salt Lake City, who was a session chairman at the Proteomics conference.
In the yeast two-hybrid method, binding between a bait protein (conjugated to a DNA-binding domain) and a prey protein (conjugated to a transcription-activation domain) brings the two domains into proximity, forming a functional transcription factor that turns on a reporter gene. Expression of that gene shows that the bait interacts with the prey, indicating they may be functionally related.
Currently, the primary technique for identifying protein-protein interactions in biological systems is the yeast two-hybrid method, "a genetic method for the analysis of protein-protein interactions," Wettstein explained at the meeting. "Its power in terms of identifying protein-protein interactions comes from the fact that you can run it as a selection. You can survey millions of different protein-protein interactions in yeast, and only those yeast that harbor a productive protein-protein interaction will grow. If they grow, they form colonies, you can pick them, and you can analyze them."
The two-hybrid technique exploits eukaryotic transcription factors that consist of two modular domains: a DNA-binding domain and a transcription-activation domain. In the technique, two fusion proteins are made. The first is a fusion between a DNA-binding domain and a bait protein. "Typically, the bait is a protein you have in hand, and you're interested in determining protein-protein interactions that it participates in," Wettstein said. The second is a fusion between a transcription-activation domain and a prey protein, a potential interacting partner.
Both fusions are expressed in the presence of a reporter gene. If bait and prey do not interact, the reporter gene is not turned on. If they do interact, the activation domain and DNA-binding domain are also brought together, forming a functional transcription factor that induces expression of the reporter gene. The system is often set up so yeast growth is dependent on expression of the reporter gene--that is, yeast grow only when bait and prey interact.
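The selection logic described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the protein names and the set of "true" interactions are hypothetical, and a real screen infers interactions from colony growth rather than looking them up.

```python
from itertools import product

def reporter_on(bait, prey, interactions):
    """The reporter gene is expressed only when bait and prey bind,
    which brings the DNA-binding and activation domains together."""
    return frozenset((bait, prey)) in interactions

def screen(baits, preys, interactions):
    """Return the bait/prey pairs whose yeast would grow under selection."""
    return [(bait, prey)
            for bait, prey in product(baits, preys)
            if reporter_on(bait, prey, interactions)]

# Hypothetical proteins and interactions, for illustration only.
true_interactions = {
    frozenset(("BaitA", "Prey2")),
    frozenset(("BaitB", "Prey3")),
}

colonies = screen(["BaitA", "BaitB"],
                  ["Prey1", "Prey2", "Prey3"],
                  true_interactions)
print(colonies)
```

Of the six bait/prey combinations tested here, only the two interacting pairs survive, which mirrors the point that growth itself acts as the readout.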
Since the basic two-hybrid technique was developed in the late 1980s, scientists have improved and modified it extensively. "There are many examples of conditional protein-protein interactions, where, for example, binding of two proteins to each other is dependent on phosphorylation of one of the proteins," Wettstein said. "You can also uncover protein-protein interactions that are dependent on binding of a small-molecule ligand or the presence of a protein or RNA that forms a bridge between bait and prey. The original scope of the yeast two-hybrid screen has been broadened by these modifications."
Erica A. Golemis and coworkers at Fox Chase Cancer Center, Philadelphia, recently developed a dual-bait variation of the yeast two-hybrid system that analyzes two independent protein-protein interactions simultaneously [J. Biol. Chem., 274, 17080 (1999)]. At the Proteomics conference, Ilya Serebriiskii, a research associate in Golemis' group, explained that the dual-bait system excels at distinguishing specific from nonspecific interactions. For instance, it can be used to identify two discrete sets of interacting proteins against an extensive background population of nonspecifically interacting proteins.
Perhaps the most notable recent yeast two-hybrid study was the use of the technique to create the first protein-protein interaction map of an entire organism--that of yeast, appropriately enough. The work was carried out by University of Washington genetics professor and Howard Hughes Medical Institute investigator Stanley Fields (original developer of the yeast two-hybrid method); Chairman, President, and Chief Executive Officer Jonathan M. Rothberg of CuraGen, New Haven, Conn.; and coworkers [Nature, 403, 623 (2000)].
The researchers detected 957 possible protein-protein interactions involving 1,004 yeast proteins. In the study, the yeast two-hybrid method was used in conjunction with CuraGen's PathCalling technology, an automated high-throughput system for analyzing the roles and relationships of genes and proteins in biological pathways. With the protein-protein interaction map of yeast in hand, "now we are doing Drosophila," said CuraGen bioinformatics director James Knight. The Drosophila work is being done in collaboration with the groups of Gerald M. Rubin, professor of genetics and development and Howard Hughes Medical Institute investigator at the University of California, Berkeley, and Russell L. Finley Jr., assistant professor of molecular medicine and genetics at Wayne State University School of Medicine, Detroit.
Interaction maps and sites
Once protein-protein interactions have been identified, the network of relationships is easier to visualize if the relationships are mapped out graphically. A number of companies have developed sophisticated systems for interaction mapping and analysis.
Myriad Genetics has set up ProNet Online (http://www.myriad-pronet.com), a website that displays data on human protein-protein interactions derived from yeast two-hybrid studies reported in the published literature. Entries include lists of interacting partners, Java-based interaction maps, and direct links to other protein data in publicly accessible databases.
Paris-based Hybrigenics also provides maps of protein-protein interactions discovered in yeast two-hybrid screens (http://pim.hybrigenics.com/pimriderlobby/PimRiderLobby.htm). Each interaction is evaluated and given a reliability score, ranging from A (a highly reliable interaction) through E (a probable artifact), so scientists can focus on the most important ones. The interacting domain of each protein is identified and mapped.
Knowing precisely which domain interacts is a good start for drug modeling studies, said Frederic Allemand, vice president for business development and marketing at Hybrigenics. He noted that Hybrigenics researchers recently analyzed the interactions and functional relationships of proteins encoded by the genome of the gastrointestinal pathogen Helicobacter pylori and found that each H. pylori protein participates in 3.6 interactions on average.
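A figure such as "3.6 interactions on average" is the mean number of partners per protein in an interaction map. The sketch below shows one way such a number could be computed from a list of interacting pairs. The pairs are hypothetical, and the calculation assumes each interaction is listed once and involves two different proteins; it is not a description of how Hybrigenics derived its value.

```python
from collections import defaultdict

def build_map(pairs):
    """Build an interaction map: protein -> set of interacting partners."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return partners

def mean_interactions(partners):
    """Average number of partners per protein in the map."""
    return sum(len(p) for p in partners.values()) / len(partners)

# Hypothetical interacting pairs, for illustration only.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

interaction_map = build_map(pairs)
average = mean_interactions(interaction_map)
print(average)

# Under the same assumptions, the yeast map described earlier
# (957 interactions among 1,004 proteins) would average about 1.9.
yeast_average = 2 * 957 / 1004
print(round(yeast_average, 2))
```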
A strategy for zeroing in on specific sites where protein-protein interactions occur was developed by Burgess and coworkers at the University of Wisconsin. The researchers used labeled protein probes to detect fragments containing interaction domains of interest.
Burgess described the identification of a 49-amino acid site on the Escherichia coli RNA polymerase β' subunit at which the transcription factor σ70 binds--an interaction that induces the enzyme to catalyze DNA transcription. Prior to this study, little was known about the location of the σ70 binding site on the multisubunit enzyme. The researchers constructed a radiolabeled σ70 probe and used blotting experiments to determine which fragments of β' the probe binds to and which it doesn't bind to, thus narrowing the exact range of residues at which the interaction occurs.
Burgess says he and his coworkers are now developing a more highly refined binding model and carrying out high-throughput screening studies to identify compounds capable of inhibiting this interaction. Such agents would represent possible leads for the development of novel antibiotics. "A small molecule that binds and inhibits the interaction of σ70 with β' would prevent RNA synthesis in the bacteria and thus prevent growth of the bacteria," Burgess tells C&EN.
Burgess and coworkers at the University of Wisconsin have identified the binding site for an important protein-protein interaction--one between the β' subunit of bacterial RNA polymerase and the transcription factor σ70. This binding interaction initiates gene transcription and protein synthesis.
Informatics scientist Warren L. DeLano and President and Chief Scientific Officer James A. Wells of Sunesis Pharmaceuticals, Redwood City, Calif., in collaboration with researchers at Genentech, South San Francisco, recently studied a site on the constant region of antibodies that binds to four natural proteins to see what properties caused it to be so active [Science, 287, 1279 (2000)]. Affinity selection experiments showed that peptides, proteins, and small molecules can, in fact, all share this type of common binding site on protein surfaces.
The researchers found that this class of promiscuous sites is nonpolar, highly accessible to potential binding partners, structurally adaptive, and relatively hydrophobic in character. DeLano noted that the study provides "a map you could use either for structure-based design or combinatorial library design. We hope success in the design of small-molecule protein-protein antagonists is going to come through identification of such sites and the development of chemical groups and chemical technologies that will allow us to specifically target these sites."
2-D gel analysis
For the past two decades or so, 2-D gel electrophoresis has been the predominant technique for analyzing the protein constituents of whole cells and cell organelles. In this technique, proteins are separated in one dimension on the basis of charge and in a second dimension based on molecular size. Individual proteins on the gel can then be isolated and characterized.
Unfortunately, gels are notoriously difficult to analyze. The resolving power of 2-D gel electrophoresis is often insufficient to separate all the different proteins in a sample. In addition, the reproducibility of the technique is poor, making it difficult to detect differences in protein expression reflected in two different gels. Lacking bioinformatics tools, the best a researcher can generally do is to place gels side by side and attempt to match up corresponding protein spots visually--a laborious, error-prone process.
However, a number of companies have introduced software that greatly facilitates 2-D gel analysis. These programs include Melanie (Geneva Bioinformatics, Geneva, Switzerland, and BioRad Laboratories, Hercules, Calif.); PDQuest (BioRad); ImageMaster (Amersham Pharmacia Biotech AB, Uppsala, Sweden); Phoretix 2D (Phoretix International, Newcastle upon Tyne, England); Gellab (Scanalytics, Fairfax, Va.); and Kepler (Large Scale Proteomics, Rockville, Md.). Such programs generally automate the alignment of spots on one gel with corresponding spots on another, facilitating electrophoretic analysis.
At the San Francisco meeting, Zeev Smilansky, vice president of proteomics at Compugen, Tel Aviv, Israel, described a new program of this type, called Z3. "In Z3, some ideas from modern moving-image processing were brought in to help solve the problem of automated registration of 2-D gel images," Smilansky tells C&EN. "Today, the system is in beta testing in about 30 laboratories around the world. In some of them, the advantages have been substantial, and throughput has in some cases improved 10-fold or more." The program is scheduled for full commercial release later this year.
And Andreas Hohn, of the business development and marketing unit of GeneData A.G., Basel, Switzerland, described a 2-D gel analysis program called GD Impressionist that imports data from other gel programs, such as Kepler, Melanie, and Phoretix 2D. It evaluates the quality of spot matching achieved by these other image analysis programs and uses statistical algorithms and graphical tools to calculate and visualize patterns in the data, the goal being to provide more meaningful protein expression information to the experimentalist.
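To give a sense of what automated spot alignment involves, here is a deliberately simple Python sketch that pairs each spot on one gel with the nearest unused spot on another, within a distance tolerance. It is an illustrative assumption, not the algorithm used by Z3, Melanie, or any other program named above; the coordinates and the `max_dist` tolerance are hypothetical, and real software must also correct for gel distortion before matching.

```python
import math

def match_spots(gel_a, gel_b, max_dist=2.0):
    """Greedy nearest-neighbor matching of spots by (x, y) position.

    Returns a list of (index_in_a, index_in_b) pairs. Spots with no
    partner within max_dist are left unmatched.
    """
    matches = []
    unused = set(range(len(gel_b)))
    for i, (ax, ay) in enumerate(gel_a):
        best, best_d = None, None
        for j in sorted(unused):
            bx, by = gel_b[j]
            d = math.hypot(ax - bx, ay - by)
            if d <= max_dist and (best is None or d < best_d):
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
            unused.discard(best)
    return matches

# Hypothetical spot positions (arbitrary units), for illustration only.
gel_a = [(10.0, 10.0), (20.0, 5.0), (30.0, 30.0)]
gel_b = [(20.5, 5.5), (10.5, 9.5), (50.0, 50.0)]

matches = match_spots(gel_a, gel_b)
unmatched_a = [i for i in range(len(gel_a))
               if i not in {a for a, _ in matches}]
print(matches, unmatched_a)
```

Spots left unmatched, like the third spot on the first gel here, are the candidates for proteins expressed under only one condition.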
Protein expression profiling
One of the major applications of 2-D gel analysis is protein expression profiling, the study of expressed proteins in different cell types. Researchers often compare protein expression in diseased versus normal cells to identify potential disease-related proteins.
Assistant professor of human genetics Yingming Zhao and coworkers at Mt. Sinai School of Medicine, New York City--in collaboration with the National Cancer Institute-Food & Drug Administration Tissue Proteomics Initiative led by FDA microbiologist Emanuel F. Petricoin III and NCI Laboratory Chief Lance A. Liotta--are using expression profiling to study the mechanisms of prostate, esophageal, breast, colon, lung, and ovarian tumor growth. The researchers use laser-capture microdissection to obtain cancer cells and normal cells from tissue samples, 2-D gel electrophoresis to separate all the protein components of those cells, and capillary high-performance liquid chromatography/electrospray ion-trap MS to identify proteins whose expression levels differ in the two types of cells.
Proteins expressed at higher levels in cancer cells are potentially disease specific. According to Zhao, cancer-specific proteins so identified "provide clues on the understanding of the molecular basis of cancer progression, can serve as targets for therapeutic intervention, and are invaluable for developing new methods for early diagnosis and detection of diseases."
Comparisons of protein expression used to require side-by-side visual analysis of differences in spot patterns on protein gels--a laborious process. Computer programs can now do this more easily. Differential protein expression in gels of bacteria grown under two different conditions (top) is shown in a software-generated analysis (bottom) where spots overexpressed or expressed only under standard conditions appear green, spots underexpressed or expressed only under phosphate-starvation conditions appear magenta, spots representing proteins expressed in similar amounts appear gray, and spots for proteins expressed in dissimilar amounts have black centers. Each spot generally represents a single protein. The analysis shown here was carried out with Compugen's Z3 software.
Biochemist and group leader Stephen T. Rapundalo and coworkers at Pfizer Global Research & Development's Ann Arbor Laboratories, Ann Arbor, Mich., are studying expression patterns of cardiovascular proteins in animal models of cardiovascular disease to gain a better understanding of heart disease and other conditions. "Proteomics offers a great tool to validate compounds in various stages of the drug discovery pipeline and in various models of efficacy or treatment," Rapundalo said. "As the technology progresses, we will continually need to assess techniques that can enhance the throughput and reproducibility of our studies."
Phages and antibodies
One technique making an important contribution to proteomics is phage display. In phage display, peptide or protein libraries are created on viral surfaces and screened for activity en masse. The peptides or proteins remain associated with the genes that encode them, making them easy to identify.
"It's pretty clear to everyone that the conventional way of analyzing potential disease genes on a one-at-a-time basis is not acceptable," said Scott Chappel, senior vice president of research at Dyax, Cambridge, Mass. "What we need to have is a process that is going to be able to--hopefully rapidly, in an automated sense--aid in purification of the gene product, help identify its function, validate it as a target, and determine its role in the disease process so maybe it can itself become a therapeutic."
Phage display fills this need, he said. It can be used routinely to determine "binding molecules to hundreds of target proteins in a matter of weeks," Chappel noted.
Researchers at Phylos, Lexington, Mass., have developed a technology similar to phage display called Profusion. Profusion molecules are conjugates in which a peptide or protein is linked chemically to the mRNA that encodes it, making it possible to easily identify the peptide or protein in affinity screening experiments. The technology is based largely on work by Richard W. Roberts, now assistant professor of chemistry at California Institute of Technology, and molecular biology professor Jack W. Szostak of Massachusetts General Hospital, Boston [Proc. Natl. Acad. Sci. USA, 94, 12297 (1997)].
The Profusion technology is simpler than phage display because no living viruses are involved, and it potentially can be used to create larger and more complex peptide libraries than those accessible with phage display, according to Phylos research scientist and conference session chairman Thomas Cujec. Profusion molecules can be used to identify ligands for important receptors, to study novel protein-protein interactions, to find key enzyme substrates and transcription factors, and to identify drug targets. Phylos scientists recently demonstrated that several known substrates of the tyrosine kinase v-abl could be selected from a proteomic library of RNA-protein fusions.
Meanwhile, scientists at Biovation, in Aberdeen, Scotland, are developing a technology called Biodisplay in which mass spectrometry is used to identify antibody fragments that bind human proteins. The technique simplifies the detection of antibodies that bind the targets by tagging each antibody fragment with a peptide encoding sequence or "barcode."
Biovation research manager Fiona Adair explained that the fragments so identified can be used to test tissue samples for the presence of corresponding target proteins. The function of those proteins can then be studied and their relevance as therapeutic or diagnostic agents determined.
Chips and arrays
A range of chip-based and array-based technologies are also emerging for the identification and characterization of individual proteins and for the profiling of protein expression in cells. At Ciphergen Biosystems, Palo Alto, Calif., for instance, researchers have developed a chip-based strategy "that allows biologists to be able to do protein chemistry and protein analysis and incorporate that directly into their biological research," said senior scientist Enrique A. Dalmasso.
In the technique, called ProteinChip, a crude biological sample is placed on an array capable of capturing a subset of proteins from the sample. The array surface can be a chemical surface with broad specificity so that it catches whole classes of proteins--perhaps hundreds or thousands of them at the same time. Or it can be extremely specific, such as an antibody-, enzyme-, or receptor-based surface that is highly selective for a few proteins. After the capture step, the ProteinChip array is washed to reduce nonspecific binding, and retained proteins on the surface are analyzed by laser desorption/ionization time-of-flight MS.
Molecular Staging, Guilford, Conn., has adapted a signal amplification method called rolling circle amplification for high-throughput protein expression analysis on biochips. Expression proteomics with rolling circle amplification appears to have significantly greater sensitivity and to be more scalable than with 2-D gels and mass spectrometry, according to Stephen F. Kingsmore, chief operating officer at Molecular Staging.
Prototypes of two rolling circle amplification protein chips were recently used to measure allergen-specific immunoglobulin-E and 48 growth factors in patient samples. The chips had sensitivities of about 1 picogram per mL and a dynamic range of five orders of magnitude. "It takes us three to four weeks to custom design chips for 20 to 50 proteins of interest," Kingsmore said.
Bioinformatics
Sophisticated bioinformatics tools to handle the plethora of data emerging from proteomics studies are also under development. One company, Proteome, Beverly, Mass., has developed a collection of proteomics databases for model organisms, such as yeast, the fungal pathogen Candida albicans, and the worm Caenorhabditis elegans. The databases contain indexed biological information on proteins that is useful for annotating and interpreting experimental results from proteomics studies. A subset of the databases is free to academic users, and a subscription version of the databases with enhanced features is available for corporate researchers. Proteome scientists believe the interconnectedness of the company's databases will aid in assigning function to a large number of unknown genes.
Molecular Simulations, San Diego, offers programs such as Modeler, a module of the company's Insight II molecular modeling suite. Modeler builds structural models for proteins that have known sequences but for which 3-D structural data are unavailable. Molecular Simulations has a database of computationally derived protein structures called AtlasBase that is generated using the bioinformatics program GeneAtlas.
Artis and coworkers at Genentech designed a cyclic octapeptide (left) to mimic the site (multicolored area on protein) at which the inflammation-related protein VCAM (vascular cell adhesion molecule, right) binds to its ligand. [Reprinted from "Transfer of a protein binding epitope to a minimal designed peptide," D. R. Artis et al., Biopolymers, © 1998 John Wiley & Sons Inc.]
Structural Bioinformatics, San Diego, also has a large database of computationally derived protein structures, which it uses to design potential small-molecule inhibitors of protein-protein interactions. The company uses bioinformatics tools to predict drug shapes from protein surface information. The resulting pharmacophores (sets of structural features responsible for biological activity) are converted to 3-D database queries to pick small molecules from virtual libraries. Structural Bioinformatics Chairman, CEO, and President Edward T. Maggio tells C&EN that the technology permits active molecules to be identified from the databases with hit rates of about 10%, whereas hit rates of only about 0.01% are obtained with conventional high-throughput screening technologies.
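The quoted hit rates imply a large difference in screening effort, which a short calculation makes concrete. The Python sketch below simply takes the two approximate figures at face value; the variable names are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Approximate hit rates as quoted in the text.
virtual_hit_rate = Fraction(10, 100)   # about 10%
hts_hit_rate = Fraction(1, 10000)      # about 0.01%

# How many times more likely a tested compound is to be a hit.
enrichment = virtual_hit_rate / hts_hit_rate

# Average number of compounds tested per hit found.
compounds_per_hit_virtual = 1 / virtual_hit_rate
compounds_per_hit_hts = 1 / hts_hit_rate

print(enrichment, compounds_per_hit_virtual, compounds_per_hit_hts)
```

Taken at face value, the figures amount to a 1,000-fold enrichment: roughly one hit per 10 compounds tested, versus one per 10,000 with conventional screening.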
Many other proteomics-related discovery technologies have major bioinformatics components. One of these is CytoWorks, developed by Automated Cell, Pittsburgh. CytoWorks is a system for studying the influence of bioactive proteins and environmental changes on cellular processes. Living cells are imaged using visible or fluorescence microscopy after they are treated with a protein agent or subjected to various environmental influences. The resulting data are then analyzed with bioinformatics software.
"CytoWorks is a platform that is useful in studying protein effects on a number of different biological endpoints in cellular life," said Automated Cell scientist Kris F. Sachsenmeier. "To put it in context, if you've decided on your consensus sequence or your target protein and you've constructed your drug or you're screening a library, you might want to test that protein's [biological] effects by actually exposing it to live, whole cells. That's where we would fit in."
Data from CytoWorks experiments are analyzed using CytoWare, the company's proprietary bioinformatics software system. "Part of our strategy is to build up a knowledge database from this acquired and archived information, which can be used for model prediction and testing," Sachsenmeier said.
Inhibiting protein-protein interactions
A number of researchers at the Proteomics conference described their groups' efforts to identify inhibitors of protein-protein interactions--both as basic research tools and for potential diagnostic and therapeutic applications.
Vice President Roger Brent and coworkers at Molecular Sciences Institute, Berkeley, Calif., have used aptamers (protein-affinity agents) to probe a network of proteins that help control mitosis. In the study, they produced a combinatorial library of short peptide aptamers designed to mimic antibody binding sites. They identified a number of aptamers that bound in a highly specific manner to a cyclin-dependent kinase 2 (Cdk2) target. These aptamers inhibit Cdk2-catalyzed phosphorylation, most likely by binding to the enzyme in such a way that its interactions with the substrate are blocked. The researchers believe such aptamers may be useful for studying protein interactions, as sensing reagents, and as leads for inhibitory drug development.
Scientist Dean R. Artis described how he and his colleagues at Genentech designed cyclic octapeptides that mimic the binding site at which the inflammation-related protein VCAM (vascular cell adhesion molecule) interacts with its lymphocyte ligand, the integrin VLA-4 (very late antigen-4) [Biopolymers, 47, 265 (1998)]. Using these peptides, the researchers were able to identify mutants of VCAM with improved binding.
Genentech researchers also recently demonstrated that a peptide selected from a phage display library interferes with another important protein-protein interaction--one in which the proenzyme factor X is processed by the complex of tissue factor and factor VIIa, causing blood to clot [Nature, 404, 465 (2000)]. The peptide may thus be a lead compound for novel anticoagulant drugs.
New way to discover drugs
Indeed, in a fundamental sense, "what we are looking at" in the field of proteomics "is a new systematic way to discover drugs," Hybrigenics' Allemand said at the conference.
Some say that there are already too many potential drug targets available and that proteomics efforts to identify more are superfluous. But Brent, of the Molecular Sciences Institute, disagrees with that notion. "The answer is that there aren't too many targets," he said. "There just aren't enough good targets."
Has proteomics accomplished anything truly useful? For example, are any proteomics-based drugs on the market yet? The answers are "not really" and "no."
However, the field "is only five years old, and it has taken off tremendously recently," said Wettstein of Myriad Genetics, responding to those questions at the conference. "So ask me again in another five years." By that time, Wettstein and many others hope proteomics will prove to have more than justified the considerable efforts that are being devoted to it.
Web-based maps like these from Hybrigenics (above) and Myriad Genetics (below) help researchers track relationships between sets of interacting proteins.
Chemical & Engineering News
Copyright © 2000 American Chemical Society