Nov/Dec 2000, Volume 3, No. 9, pp. 41-42, 45-46.
The overall format of CASP has remained consistent since 1994. Between 30 and 45 amino acid sequences for soon-to-be-solved experimental structures, generously donated by NMR and X-ray labs, are listed, and researchers are invited to submit theoretical models of each protein's structure. Independent assessors then judge the predictions by comparing them with the solved structure. The results are tallied and subsequently announced at the CASP meeting.
Although the format has not changed significantly since its inception, the number of participants and predictions certainly has, rising almost exponentially over the years, from 136 models at CASP1 to more than 11,000 at CASP4. Another change is in scoring. Some participants believed that the scoring techniques used in CASP1 heavily favored predicted folds that, although possible, were nonetheless wrong. Beginning with CASP2, scores were based more heavily on quantifiable measurements, such as how well each residue fits within the entire model and local folds, and how accurately the predicted structure contacts neighboring residues. Additional changes were implemented in CASP3, and organizers thought that these methods best determined the accuracy and fit of a predicted model to the structure.
Blasted alignments. For many researchers, the first step in identifying a protein is comparing its sequence with other sequences in the databases. Similarly, determining a protein's structure from its sequence requires comparison with other sequence-structure relationships. Much of the problem stems from deciding the cutoff points of sequence identity and similarity, and how many sequences to compare.
To improve the accuracy of multiple alignments, Kevin Karplus and colleagues at the University of California, Santa Cruz developed SAM-T98, which used hidden Markov models to assign probabilities of amino acid position and identity in database searches (4). Each residue position in the alignment model has three states (matching, insertion, and deletion), and neighboring positions are connected by an associated transition probability, the chance of going from one state to the other. Thus, one can ask whether a given residue will score higher at the end of one sequence or the beginning of the next.
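The three-state model described above can be illustrated with a minimal sketch of Viterbi scoring against a toy profile. This is not SAM-T98 code: the transition probabilities, the consensus-based match emissions, and the names `TRANS`, `match_emission`, and `viterbi` are all assumptions chosen for illustration, whereas a real profile is trained from a multiple alignment.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NEG_INF = float("-inf")

# Assumed, illustrative transition probabilities (not trained values).
# Keys are (from_state, to_state) with M = match, I = insertion, D = deletion.
TRANS = {
    ("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
    ("I", "M"): 0.60, ("I", "I"): 0.40,
    ("D", "M"): 0.60, ("D", "D"): 0.40,
}


def match_emission(consensus_residue, residue, p_consensus=0.8):
    """Toy match-state emission: the consensus residue is strongly favored."""
    if residue == consensus_residue:
        return p_consensus
    return (1.0 - p_consensus) / (len(AMINO_ACIDS) - 1)


def viterbi(consensus, seq):
    """Return (log probability, state path) of the best path through the model."""
    K, n = len(consensus), len(seq)
    score = {s: [[NEG_INF] * (n + 1) for _ in range(K + 1)] for s in "MID"}
    back = {s: [[None] * (n + 1) for _ in range(K + 1)] for s in "MID"}
    score["M"][0][0] = 0.0  # the begin state is treated as match state 0

    def best_prev(j, i, target):
        options = [
            (score[s][j][i] + math.log(TRANS[(s, target)]), s)
            for s in "MID"
            if (s, target) in TRANS and score[s][j][i] > NEG_INF
        ]
        return max(options) if options else (NEG_INF, None)

    insert_log = math.log(1.0 / len(AMINO_ACIDS))  # uniform insert emission
    for j in range(K + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:  # match: consume a model position and a residue
                v, s = best_prev(j - 1, i - 1, "M")
                if s is not None:
                    emit = match_emission(consensus[j - 1], seq[i - 1])
                    score["M"][j][i] = v + math.log(emit)
                    back["M"][j][i] = s
            if i > 0:  # insertion: consume a residue only
                v, s = best_prev(j, i - 1, "I")
                if s is not None:
                    score["I"][j][i] = v + insert_log
                    back["I"][j][i] = s
            if j > 0:  # deletion: skip a model position, emit nothing
                v, s = best_prev(j - 1, i, "D")
                if s is not None:
                    score["D"][j][i] = v
                    back["D"][j][i] = s

    state = max("MID", key=lambda s: score[s][K][n])
    total = score[state][K][n]
    path, j, i = [], K, n
    while (j, i) != (0, 0):  # trace back to the begin state
        path.append(state)
        prev = back[state][j][i]
        if state == "M":
            j, i = j - 1, i - 1
        elif state == "I":
            i -= 1
        else:
            j -= 1
        state = prev
    return total, "".join(reversed(path))
```

Calling `viterbi("ACDE", "ACGDE")` returns a state path in which the extra residue is assigned to an insertion state, and `viterbi("ACDE", "ADE")` assigns the missing position to a deletion state. Working in log probabilities avoids numerical underflow on longer sequences.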
Secondary considerations. When sequence identities fall below 25%, the jump from primary sequence to tertiary structure is difficult. Instead, attention focuses on the secondary structural elements that make up the protein of interest, in the belief that structure is conserved even when sequence isn't. Since the 1960s, groups have developed simple secondary structure prediction methods that were based on the statistical probability of a given residue in a given context forming part of a specific fold (5). Although these methods were amazingly accurate, they looked solely at the sequence in question. Recently, however, expanding sequence databases have enhanced prediction accuracy through the alignment of multiple sequences.
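As a rough sketch of the statistical idea behind those early single-sequence methods, the following code estimates how strongly each residue type is associated with helix (H), strand (E), or coil (C) in a set of labeled examples, then predicts a state for each position from a sliding window. The functions `propensities` and `predict` and the window size are assumptions for illustration and do not reproduce any published method or parameter set.

```python
from collections import Counter


def propensities(examples):
    """Estimate P(state | residue) / P(state) from (sequence, states) pairs.

    Each states string uses 'H' (helix), 'E' (strand), or 'C' (coil) and
    must be the same length as its sequence.
    """
    pair_counts, residue_counts, state_counts = Counter(), Counter(), Counter()
    for seq, states in examples:
        for aa, st in zip(seq, states):
            pair_counts[(aa, st)] += 1
            residue_counts[aa] += 1
            state_counts[st] += 1
    total = sum(state_counts.values())
    return {
        (aa, st): (count / residue_counts[aa]) / (state_counts[st] / total)
        for (aa, st), count in pair_counts.items()
    }


def predict(seq, table, window=5):
    """Assign each position the state with the highest mean propensity
    over a window centered on that position (the residue's local context)."""
    half = window // 2
    prediction = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]

        def mean_propensity(state):
            return sum(table.get((aa, state), 0.0) for aa in segment) / len(segment)

        prediction.append(max("HEC", key=mean_propensity))
    return "".join(prediction)
```

With a hypothetical training pair such as `("AAAAVVVVGGGG", "HHHHEEEECCCC")`, `predict("AAAA", table)` yields all helix. A real table would be derived from many solved structures, and residues never seen in training contribute a propensity of zero here.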
David Jones and colleagues at Brunel University in London developed PSIPRED, a prediction method that incorporates two neural networks to analyze PSI-BLAST results (6). The same group also developed GenTHREADER, a fold recognition program that, when applied to the Mycoplasma genitalium genome, predicted that >50% of coding regions showed a significant relationship to known structures. Unlike most threading methods, however, GenTHREADER uses potential evolutionary relationships to filter out false positive predictions (7).
Tertiary period. But, of course, the name of the game is three-dimensional structure prediction. Although a drug may be designed against a small portion of a protein, it is difficult to model the drug's effectiveness without a firm understanding of the whole target.
Andrej Šali and colleagues at Rockefeller University in New York created MODELLER, a program that generates three-dimensional structures using spatial restraints (8). Initial restraints include information from homologous structures, such that if the aligned protein has a hydrogen bond between two specific residues, then the sequence being modeled can be expected to have a similar bond between the equivalent residues. Once the initial model has been built, it is tested with other restraints, including experimental data and basic chemical bonding rules.
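The idea of carrying restraints over from a homologous structure can be sketched as follows. This is not MODELLER's API or its restraint form: the functions `aligned_pairs`, `distance_restraints`, and `violation`, the use of one coordinate per residue, and the 8 Å cutoff are assumptions made for illustration.

```python
import math
from itertools import combinations


def aligned_pairs(target_aln, template_aln):
    """Map target residue index -> template residue index for aligned columns.

    Both arguments are rows of a pairwise alignment, with '-' marking gaps.
    """
    mapping = {}
    t_idx = p_idx = 0
    for a, b in zip(target_aln, template_aln):
        if a != "-" and b != "-":
            mapping[t_idx] = p_idx
        if a != "-":
            t_idx += 1
        if b != "-":
            p_idx += 1
    return mapping


def distance_restraints(target_aln, template_aln, template_coords, cutoff=8.0):
    """Copy template distances onto equivalent target residue pairs.

    Returns (i, j, distance) tuples for target residues i < j whose aligned
    template residues lie within `cutoff` of each other.
    """
    mapping = aligned_pairs(target_aln, template_aln)
    restraints = []
    for i, j in combinations(sorted(mapping), 2):
        d = math.dist(template_coords[mapping[i]], template_coords[mapping[j]])
        if d <= cutoff:
            restraints.append((i, j, d))
    return restraints


def violation(model_coords, restraints):
    """Sum of squared deviations of a candidate model from the restraints."""
    return sum(
        (math.dist(model_coords[i], model_coords[j]) - d) ** 2
        for i, j, d in restraints
    )
```

A model builder would then search for coordinates that keep `violation` small while also satisfying other restraints, such as basic bonding geometry. Residues in gapped columns receive no template-derived restraints in this sketch.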
Threading, alluded to earlier, is a mechanism to address
From scratch. As we have seen, many model builders use structural and biochemical information to enhance and test their models, but there is a subset of researchers who are adamant that all of the information required for proper folding is contained in the amino acid sequence. They argue that newly synthesized polypeptides are unlikely to require a sibling protein before knowing how to fold correctly. This intrepid group of researchers is looking at ab initio folding.
Michael Levitt and his group at Stanford University School of Medicine (Stanford, CA) use a two-step process. Initially, they generate a large number of potential folds using various algorithms, in the hope that some of them will approximate the native fold. They then discriminate among the folds chosen most often by looking for problems in parameters such as distance geometry and hydrophobic packing (10).
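The generate-then-discriminate strategy can be illustrated with a deliberately simplified stand-in. The sketch below is not the Levitt group's method: it swaps in a two-dimensional lattice model in which residues are either hydrophobic (H) or polar (P), generates random self-avoiding folds, and ranks them by hydrophobic contacts. The names `random_fold`, `energy`, and `best_decoy` are assumptions for illustration.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_fold(length, rng, max_tries=1000):
    """Grow one self-avoiding walk on a 2D lattice, retrying if it gets stuck."""
    for _ in range(max_tries):
        path = [(0, 0)]
        occupied = {(0, 0)}
        while len(path) < length:
            x, y = path[-1]
            free = [(x + dx, y + dy) for dx, dy in MOVES
                    if (x + dx, y + dy) not in occupied]
            if not free:
                break  # dead end: abandon this walk and start over
            nxt = rng.choice(free)
            path.append(nxt)
            occupied.add(nxt)
        if len(path) == length:
            return path
    raise RuntimeError("could not build a self-avoiding fold")


def energy(hp_sequence, path):
    """Score -1 for each pair of H residues that touch on the lattice
    without being neighbors along the chain (a crude packing term)."""
    position = {p: i for i, p in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if hp_sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 counts each non-bonded pair exactly once
            if j is not None and j > i + 1 and hp_sequence[j] == "H":
                total -= 1
    return total


def best_decoy(hp_sequence, n_decoys=200, seed=0):
    """Step 1: generate many candidate folds. Step 2: keep the best-scoring one."""
    rng = random.Random(seed)
    decoys = [random_fold(len(hp_sequence), rng) for _ in range(n_decoys)]
    return min(decoys, key=lambda fold: energy(hp_sequence, fold))
```

Real decoy discrimination uses far richer criteria, such as the distance geometry and hydrophobic packing checks mentioned above, but the two-stage shape is the same: sample broadly, then filter with a scoring function.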
Harold Scheraga and his group at Cornell University in Ithaca, NY, are more daring. They are determined to derive protein structures strictly from energy functions. To simplify the energy calculations, they first represent each residue as a single interaction site. Upon creating an initial conformation, they restore the residues' atoms and look for a distinct backbone fold using their conformational space annealing method (11). This method allowed the group to locate the native folds of several proteins within the subsets of low-energy conformations.
The finish line
Blasted alignments
  NCBI; BLAST, Gapped BLAST, PSI-BLAST, PHI-BLAST
  Kevin Karplus, developer; SAM-T98
Secondary considerations
  David Jones, developer; PSIPRED
  David Baker, developer; Rosetta
Tertiary period
  David Jones, developer; Threader
  Andrej Šali, developer; MODELLER
  Stephen Bryant, developer
From scratch
  Michael Levitt, developer
  Harold Scheraga, developer
Stephen Bryant, who did not participate in CASP4, expressed his concern that too much time was being spent on method development and not enough on practical applications. "Not only are [the participants] duplicating each other's efforts with respect to trying to predict the structures of these targets, but they are also toy problems because these are proteins where the real structure is about to be known, and so anything you predict is not going to be of any practical use," he offers. "So it's useful only for methodology testing; it's not useful in terms of advancing scientific knowledge."
Kevin Karplus disagrees. Although admitting that participation takes a lot of time and effort, he believes that the conference is tremendously valuable and adds a level of reality to the research that is not found in the literature. "It's nice to have somebody outside saying 'Here is a structure for blind prediction. If you're doing prediction, participate in this or we're not going to listen to you,'" Karplus says. "There's been a large exchange of ideas. People look at methods that might not otherwise have been considered because they were outside the particular methods they were using."
How close are these computational groups to producing an algorithm that will accurately and consistently predict protein structures from DNA sequences located in databases throughout the world? Don't expect this method to replace experimental structure techniques just yet. However, the CASP organizers believe that the current prediction techniques can aid in the design of experiments and also provide some level of insight into a protein's molecular function.