FROM SEQUENCE TO CONSEQUENCE
Structural genomics efforts help researchers determine biological functions of proteins
STU BORMAN, C&EN WASHINGTON
Going from sequence to consequence "is of course what proteomics is all about," according to chemistry and biochemistry professor Gregory A. Petsko, director of the Rosenstiel Basic Medical Sciences Research Center at Brandeis University. "It's about going from sequence to the ultimate consequence--the actual biological function of a protein, or a drug to inhibit that biological function." The phrase "sequence to consequence" was coined by Petsko's colleague at Brandeis, professor of chemistry and biochemistry Dagmar Ringe.
"What can structural biology do to help take us further along the path to consequence?" asked Petsko rhetorically. He spoke last month at the American Chemical Society ProSpectives conference on "Defining the Proteomics Agenda," in Leesburg, Va.
STRUCTURE TO FUNCTION? The bagel-like configuration of the enzyme triosephosphate isomerase, called a TIM barrel, is found in about one of every 10 enzymes, most of which have completely different functions.
"One of the hardest things human beings have ever tried to do is to make a drug," Petsko said. "A huge number of chemical compounds have to be synthesized, in which only a tiny fraction make it all the way to Phase III clinical trials. And even those fail two-thirds of the time. The cost is astronomical, the time is ridiculous, and both are going up, not down, despite advances in technology.
"There just aren't enough drugs," he added, "and we've only just begun to plumb the depths of chemistry for compounds that might be useful pharmacologically. To give you an idea, the best-selling anticancer drug in the world is a molecule that doesn't have a single atom of carbon in it. I'm referring to cisplatin. That's a purely inorganic substance that would never have been detected in any high-throughput screen by any pharmaceutical company on the planet. And if that doesn't discourage you, nothing will."
We also don't have enough targets, he said. Known drugs hit only about 400 human protein targets, which represent less than 1% of the human genome. "Receptors, nuclear receptors, and metabolic enzymes constitute overwhelmingly the majority of targets for drugs," he said, "but there are huge classes of drug targets that have not been plumbed at all."
With the human genome sequence now largely in hand, scientists are frequently going to be faced with sequences that produce proteins known to be important in disease but with unknown biological functions. Absent such information, Petsko said, it's pretty hard to design drugs to block those functions.
"Maybe, at least in part, knowing the structures of lots of gene products will help us correlate structure with function, so in the absence of any other information we can pull out the function of the protein by looking at its three-dimensional structure," he said. "It is that argument that is behind all the structural genomics initiatives that are going on around the world at the moment."
BUT HOW PLAUSIBLE is that argument? It's certainly true that it's getting easier to determine protein structures. "When I did my Ph.D. thesis--more years ago than I would like to admit in polite company--I was working on the structure of the fifth protein that had ever been solved," Petsko recalled. In contrast, "the current rate of protein structure production worldwide is five a day. And one reason is that nearly all the steps of this process have been automated, or are coming close to being automated, for simple structures."
There are still "large, complicated structures that can't be solved in a high-throughput fashion, and that will keep people like myself happily busy for some years to come, I hope," he said. "But the routine process of structure determination is becoming almost automatic."
One consequence of automation has been a huge but still linear increase in determined protein structures over the past few years. "The hope of structural genomics is to change that from a linear increase to an exponential one," Petsko said.
But there are roadblocks along the way to that goal. Structural genomics researchers clearly can't determine the structure of every gene product from every organism, so they're probably going to have to get most of their protein structures by extrapolation from models of similar proteins. "That works up to a point," Petsko noted, "but it doesn't work well for drug design" because extrapolation is simply not accurate enough. And in any given genome, a quarter or more of the proteins are membrane associated, but membrane proteins are currently difficult or impossible to analyze structurally.
Even for soluble proteins, knowledge of structure does not automatically imply knowledge of function. "The best example I know is the three-dimensional structure of the enzyme triosephosphate isomerase [TIM], a glycolytic enzyme," which resembles a bagel and has its active site in the center, Petsko said. "About one of every 10 enzymes, and maybe even total proteins, in many organisms has a fold that looks like this," he added. "Since they can't all have the same function, that's a problem."
Different TIM barrel enzymes are likely to catalyze "completely different reactions on completely different substrates with completely different chemical mechanisms--and there is no such thing as a common inhibitor for them." So if you determine a new protein structure and it's a TIM barrel, "there's almost a zero chance that you know what it does."
Knowledge of function also does not imply knowledge of structure. "You cannot go backward," he said. "You have to be very careful about ascribing a function to a structure because nature is lazy and uses the same fold for different things." For example, l-aspartate aminotransferase and d-amino acid aminotransferase have no sequence identity and different polypeptide chain folds, yet they both use pyridoxal phosphate as a cofactor and both catalyze the same reaction (transamination of amino acids).
Furthermore, a range of diverse functions can be found in a single protein. "This is one of the hidden reasons why the genomes of higher organisms are as small as they are," Petsko said. "You can have a small genome with a very large 'functionome' because multiple functions are grafted onto the same polypeptide chain."
The consequences of this for molecular biology, drug discovery, and gene therapy are profound: Knock out a protein by antibody ablation or antisense RNA and you destroy all of its functions, not just the one you might have wanted to abolish--which is obviously not the best strategy for finding drugs with high functional selectivity and minimal side effects.
Determining a protein's structure should be good for drug design because it makes it easier to identify molecules that fit into the protein's functional site or sites. "But before we can do that, we need to interrogate for the location of those sites," Petsko said. "We need to have a way of taking structures and probing them computationally or experimentally to say these are the sites that matter and these are the things that like to bind there. If we can't do that, and in a fairly high-throughput fashion, we're dead before we even start. And given the growth of antibiotic-resistant microorganisms, we may be dead in fact, as well as in metaphor."
A FEW YEARS AGO, Ringe's group came up with a strategy for determining which sites on the surface of a protein are used to recognize small-molecule substrates or biomolecular ligands. "Protein crystals have very large solvent-filled channels into which you can diffuse lots of things," Petsko explained. Ringe's technique, called solvent mapping, involves diffusing into those channels organic solvents that resemble the functional groups of drugs and other ligands and then determining if and where they bind [Nature Biotechnol., 14, 595 (1996)]. For example, "if you're interested in where peptides bind, all you've got to do is soak your crystal in dimethylformamide and it will tell you. If you want to know where aromatic rings bind, soak in benzene."
But solvent mapping "requires that you have a real protein crystal to do it, and it requires real experiments," Petsko said. "Ultimately, what we'd like to be able to do is convert this into a computational tool for locating protein active sites and binding sites. If you want to design a drug, you would then connect the dots, like you did when you were a kid and drew pictures." But in this case you would connect ligands using linkers. "We've done this for elastase and it worked," he said. "So potentially there's a hope here that structural information on a large scale could be used for drug design." Locus Discovery, Blue Bell, Pa., has independently developed a high-throughput computational technique that calculates the free energy of binding of organic fragments to a protein, uses that information to identify binding sites on the protein, and chemically connects the fragments into unique small molecules.
However, solvent mapping doesn't identify "which residues on the surface of the protein actually do chemistry," Petsko said, "and if you ever want to have a hope of figuring out the function of new proteins, you have to not only locate the active site but you have to figure out which residues in the active site are actually carrying out the function. Only then do you have a hope of searching a database of motifs or whatnot to try to guess what that function might be."
THEMATICS, a computational tool for identifying the residues in a protein active site that actually carry out chemistry, has just been developed by chemistry professor Mary Jo Ondrechen of Northeastern University together with Ringe and postdoc James G. Clifton at Brandeis. "The idea is to carry out calculations on the protein structure and look for residues whose behavior as a function of pH in the computer is abnormal," Petsko said. Using this computational technique, perhaps in conjunction with a computational version of solvent mapping, it would be possible to analyze structural information and get some idea of biochemical function in a high-throughput manner, he proposed.
But the path from gene sequence to functional consequence will also require "an understanding that function is context-dependent, that it means different things to different people, and that the biochemical function of a protein may be only a small part of the myriad of things that it does in the context of a living cell," Petsko said. "If we're going to go from sequence to consequence, from genome to proteome to an understanding of function and to the development of drugs, no one technique suffices. We had better all become literate in a large number of different methods. There is no other path from sequence to consequence that works."
Techniques Use Theoretical Means To Pinpoint Functionally Important Sites In Proteins
A new computational technique called THEMATICS predicts the active sites of enzymes and identifies the specific amino acid residues that play leading roles in enzymatic catalysis. It's one of several theoretical techniques that have been developed recently to pinpoint functionally significant sites in proteins.
THEMATICS--which stands for "theoretical microscopic titration curves"--is an a priori method for pinpointing active sites in enzymes whose structures are known but for which other biochemical and activity data are lacking. It has potential as a guide to drug design and as a means to help determine the function of newly discovered proteins--a key goal of proteomics, the high-throughput study of protein expression and function.
The technique was developed by chemistry professor Mary Jo Ondrechen of Northeastern University and postdoc James G. Clifton and professor of chemistry and biochemistry Dagmar Ringe of Brandeis University [Proc. Natl. Acad. Sci. USA, 98, 12473 (2001)].
The researchers analyzed the theoretical titration curves of all the ionizable amino acid residues in three enzymes and found that "a small fraction (3 to 7%) of the curves possess a flat region where the residue is partially protonated over a wide pH range," they explain. "The preponderance of residues with such perturbed curves occurs in the active site." They then showed that pH curve analysis could be used successfully to identify known active sites in a number of other enzymes of varied activity and structure.
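To illustrate what a "perturbed" titration curve looks like, here is a minimal Python sketch. It is a toy model and not the THEMATICS calculation itself: it compares an isolated ionizable site (the standard Henderson-Hasselbalch curve) with one of two identical sites that interact. The pKa values, the coupling strength, the 0.1 to 0.9 thresholds, and all function names are illustrative assumptions, not values from the paper.

```python
def isolated_fraction(ph, pka):
    """Protonated fraction of one non-interacting site (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def coupled_fraction(ph, pka1, pka2, coupling):
    """Protonated fraction of site 1 when it interacts with site 2.

    `coupling` (in pK units) is a hypothetical penalty applied to the
    microstate in which both sites are protonated at once.
    """
    w10 = 10.0 ** (pka1 - ph)                          # only site 1 protonated
    w01 = 10.0 ** (pka2 - ph)                          # only site 2 protonated
    w11 = 10.0 ** (pka1 + pka2 - 2.0 * ph - coupling)  # both protonated
    z = 1.0 + w10 + w01 + w11                          # 1.0 = neither protonated
    return (w10 + w11) / z


def partial_range(curve, lo=0.1, hi=0.9, step=0.1, ph_max=14.0):
    """Approximate width, in pH units, over which lo < fraction < hi.

    An illustrative yardstick only; it is not the criterion used by
    the THEMATICS authors.
    """
    n = int(round(ph_max / step))
    count = sum(1 for i in range(n + 1) if lo < curve(i * step) < hi)
    return count * step


# Assumed example values: both sites have pKa 7, coupling of 4 pK units.
iso_width = partial_range(lambda ph: isolated_fraction(ph, 7.0))
coupled_width = partial_range(lambda ph: coupled_fraction(ph, 7.0, 7.0, 4.0))
plateau = coupled_fraction(5.0, 7.0, 7.0, 4.0)

print(f"isolated site: partially protonated over ~{iso_width:.1f} pH units")
print(f"coupled site:  partially protonated over ~{coupled_width:.1f} pH units")
print(f"coupled site at pH 5: fraction protonated = {plateau:.2f}")
```

In this toy model the isolated site switches from protonated to deprotonated within about two pH units, while the coupled site sits near half protonation across several pH units, because the two sites effectively share one proton. That extended flat region is the kind of feature the researchers describe.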
The computational determination of microscopic pH curves per se is not new. Several groups have used it to analyze the ionization behavior of residues known to be important for catalysis--including molecular biology professor Donald Bashford at Scripps Research Institute; professor of biochemistry and molecular biophysics and Howard Hughes Medical Institute (HHMI) investigator Barry Honig at Columbia University; Paul Beroza in physics professor George Feher's group at the University of California, San Diego; and UCSD professor of theoretical chemistry and HHMI investigator J. Andrew McCammon. John E. Straub, associate professor of theoretical and computational chemistry and biophysics at Boston University, tells C&EN that THEMATICS thus "builds on fundamental developments made by a number of researchers in the field of computational biology."
What is novel about THEMATICS is the realization by Ondrechen and coworkers that abnormal pH behavior is a computational hallmark of active sites and that it can therefore be used to identify active-site residues in an a priori manner. It's one of several similar computational approaches.
For example, the team of biomolecular structure professor Janet M. Thornton at University College, London, is one of a number of groups that have used geometric methods to search for cavities in structures as a means of identifying potential active sites [Protein Sci., 5, 2438 (1996)]. Associate professor of biomolecular engineering Minoru Sakurai and coworkers at Tokyo Institute of Technology, Yokohama, Japan, used electronic structure calculations to find that frontier molecular orbitals of hydrated proteins tend to be localized on functionally important residues [J. Am. Chem. Soc., 123, 8161 (2001)].
Assistant professor of biochemistry Adrian H. Elcock of the University of Iowa, Iowa City, also recently reported a structure-based technique for locating active site residues and other functionally important residues [J. Mol. Biol., 312, 885 (2001)]. "I used continuum electrostatic calculations very similar to those used by Ondrechen, et al., but with a view to calculating electrostatic contributions to stability rather than pH profiles," Elcock tells C&EN. The basis of the technique "is that functionally important residues destabilize proteins and that the computational identification of destabilizing residues can therefore serve as a diagnostic for identifying functionally important residues." Using the technique, he says, "I reported detailed calculations for six proteins, summarized results for 216 other proteins, and made predictions for three recently solved protein structures."
THEMATICS has not yet been used to analyze that many proteins, but Ondrechen and coworkers point out that it is simple, computationally fast, not database dependent, and amenable to automation for high-throughput computational screening of enzymes. By identifying active sites, THEMATICS and related strategies "supply valuable information in the exciting quest to understand protein function and to translate genomic information into a useful form," they note.
STANDOUTS In ribbon structure of the enzyme triosephosphate isomerase (TIM), active-site residues histidine-95 and glutamate-165 are highlighted in pink. Theoretical titration curves of TIM histidines (top) and glutamates (bottom) show that those same two residues have perturbed theoretical titration curves (pink).
Chemical & Engineering News
Copyright © 2001 American Chemical Society