

October 6, 2003
Volume 81, Number 40
CENEAR 81 40 pp. 35-36, 38-40
ISSN 0009-2347


Computation fuels protein design, as theory and experiment combine to usher in new era of combinatorial methods and directed evolution


POSSIBILITIES Xencor's PDA technology screens myriad protein sequences for their potential to do a new job.
Since the dawn of biotechnology, scientists have sought to harness the awesomely specialized catalytic power of proteins for their own uses. Despite their numerous successes in engineering proteins for applications in medicine and industry, researchers are aware of many additional opportunities for harnessing catalytic proteins. Designing proteins, though, remains a daunting task, given that we still have much to understand about how these molecules work their magic. Lying just out of reach are a wealth of new drugs and catalysts--and insight into nature itself.

For years, scientists have painstakingly strung together combinations of amino acids in the hope that the resulting molecule would fold correctly and perform its intended task. They also have employed the strategy of mutating existing proteins by changing only a few specific amino acids in the active site.

In recent years, directed evolution has emerged as an important approach to protein design. A field rooted in high-throughput technology, directed evolution involves randomly mutating proteins to create enormous arrays of different structures, then screening and selecting the mutants that best perform a desired "unnatural" task.

BUT TO SEARCH all possible combinations of 20 amino acids in a typical 350-amino-acid protein results in libraries that contain more proteins than there are atoms in the universe--and most of them won't work. If scientists could know ahead of time which amino acids in what positions might be likely to create a successful protein, they'd have a huge advantage heading into the lab.
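The scale of that search space can be checked with a few lines of arithmetic. The sketch below works in base-10 logarithms; the figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from the symposium.

```python
import math

ALPHABET_SIZE = 20            # standard amino acids
SEQUENCE_LENGTH = 350         # residues in the "typical" protein from the text
LOG10_ATOMS_IN_UNIVERSE = 80  # assumed order-of-magnitude estimate

def log10_library_size(length, alphabet=ALPHABET_SIZE):
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)

log10_size = log10_library_size(SEQUENCE_LENGTH)
print(f"20^350 is about 10^{log10_size:.0f}")  # prints: 20^350 is about 10^455

# Shortest chain whose full library already outnumbers the assumed atom count.
min_length = math.ceil(LOG10_ATOMS_IN_UNIVERSE / math.log10(ALPHABET_SIZE))
print(f"A chain of {min_length} residues is enough")  # prints: A chain of 62 residues is enough
```

Under that assumption, even a 62-residue chain has more possible sequences than the universe has atoms, so no exhaustive screen of a full-length protein is feasible.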

Computers, then, are an obvious partner in this endeavor. With increasingly powerful modeling techniques, faster speeds, and greater disk space, computation has become an essential element in the search for new proteins. Previously unmanageable libraries can be winnowed down to a few promising possibilities. And predictions of the most promising sites in proteins to alter are allowing scientists to explore new territory in the protein landscape.

This budding and essential relationship between computation and experiment was a focus of a symposium, attended by theorists and experimentalists alike, at the American Chemical Society national meeting in New York City last month.

Saven Boder
Computational methods are going to be integral "particularly in the design of sequences and in more combinatorial-type experiments," said University of Pennsylvania assistant chemistry professor Jeffery G. Saven. "There's lots of nice dovetailing between theory and experiment."

Sponsored by the Division of Physical Chemistry, the symposium was organized by Saven and assistant professor of chemical and biomedical engineering Eric T. Boder at the University of Pennsylvania.

A number of talks at the symposium focused on computational strategies for getting the most bang for the buck from protein libraries. For example, a popular technique for generating protein mutant libraries, known as DNA shuffling, involves chopping up DNA sequences and reassembling them randomly. Graduate student Narendra Maheshri described his work with University of California, Berkeley, assistant chemical engineering professor David V. Schaffer on a computational model of DNA shuffling. Dubbed SHUFFIT, the model is intended to optimize shuffling reactions and minimize the formation of "junk" DNA sequences.
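To make the chop-and-reassemble idea concrete, here is a deliberately simplified toy in Python. It is not SHUFFIT and makes strong assumptions: the parent sequences are pre-aligned and of equal length, cuts fall at fixed intervals, and each fragment is drawn from a parent uniformly at random. Real shuffling reactions fragment DNA at random sites and reassemble by homology, which is what a model like SHUFFIT would have to capture. The function name and parameters are hypothetical.

```python
import random

def shuffle_library(parents, fragment_length, library_size, seed=0):
    """Build chimeras by taking each fixed-length fragment from a random parent."""
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be aligned to the same length")
    rng = random.Random(seed)  # seeded so the toy library is reproducible
    length = len(parents[0])
    library = []
    for _ in range(library_size):
        pieces = []
        for start in range(0, length, fragment_length):
            donor = rng.choice(parents)  # pick which parent donates this fragment
            pieces.append(donor[start:start + fragment_length])
        library.append("".join(pieces))
    return library

# Artificial parents chosen so the origin of every fragment is visible.
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
library = shuffle_library(parents, fragment_length=4, library_size=5)
for chimera in library:
    print(chimera)
```

Each printed chimera is a patchwork of four-letter blocks, one block per parent choice, which mirrors how shuffled genes are mosaics of their parents.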

Costas D. Maranas, associate chemical engineering professor at Pennsylvania State University, uses various computational methods, including mean-field theory calculations, to identify "clashes" between protein fragments--unfavorable structures that can be easily eliminated from a protein library.

THEORY CAN ALSO be used to zoom in on the crux of a protein's behavior with unprecedented precision. Virginia W. Cornish, assistant chemistry professor at Columbia University, and her colleagues are applying this strategy to help understand the evolution of the bacterial enzyme responsible for penicillin resistance.

The Achilles' heel of penicillin-sensitive bacteria is the penicillin binding protein (PBP), an enzyme that's essential for building cell walls but which is inactivated when it encounters the antibiotic.

Penicillin-resistant bacteria, however, carry an additional enzyme, β-lactamase, which instead hydrolyzes the antibiotic, rendering it powerless. β-Lactamase likely evolved from an ancient PBP, but what exactly happened to the enzyme has remained a mystery. The proteins are remarkably alike. They have similar three-dimensional structures and conserved active-site residues. Yet their penicillin-hydrolyzing rate constants differ by about six orders of magnitude.

"Our hope is to begin to understand what's responsible for the difference in chemical reactivity," Cornish said.

EVOLUTION A penicillin-binding protein, showing residues where mutations occurred (pink).
Courtesy Of Shalom Goldberg

TO THAT END, Cornish and graduate student Shalom Goldberg are trying to "evolve" a PBP into a β-lactamase. They've collaborated with Columbia chemistry professor Richard A. Friesner and his graduate student Benjamin F. Gherman in a computational study of the proteins. They combined quantum mechanics and classical molecular mechanics, treating the bulk of the protein as a classical blob of noncovalent interactions, while saving the more detailed and intensive quantum mechanical calculations for the few amino acids in the protein's active site.
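The bookkeeping behind such a hybrid calculation can be illustrated with a small sketch that splits residues into a quantum region and a classical region. Selecting the quantum region by distance from the active-site center is an assumption made here for illustration; the article says only that the few active-site amino acids were treated quantum mechanically. The residue labels, coordinates, and cutoff are hypothetical.

```python
import math

def partition_residues(residues, center, qm_cutoff):
    """Split residues into (quantum, classical) lists by distance from a center point."""
    quantum, classical = [], []
    for name, coords in residues.items():
        if math.dist(coords, center) <= qm_cutoff:
            quantum.append(name)    # near the reaction: expensive quantum treatment
        else:
            classical.append(name)  # bulk protein: cheap classical force field
    return sorted(quantum), sorted(classical)

# Hypothetical residue positions in angstroms, not taken from any real structure.
residues = {
    "active_1": (0.0, 0.0, 0.0),
    "active_2": (3.0, 0.0, 0.0),
    "shell_1": (8.0, 0.0, 0.0),
    "bulk_1": (20.0, 5.0, 0.0),
}
quantum, classical = partition_residues(residues, center=(0.0, 0.0, 0.0), qm_cutoff=5.0)
print("quantum region:", quantum)
print("classical region:", classical)
```

The payoff is cost: the expensive method is applied to a handful of residues, while the thousands of remaining atoms get the inexpensive treatment.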

The researchers modeled both the ground and transition states of the hydrolysis reaction for both PBP and the β-lactamase. Their calculations pointed to a single tyrosine residue, which is stabilized by a hydrogen-bonding network in the β-lactamase, allowing it to act as a general base catalyst. The residue isn't stabilized in PBP, however. PBP's active site looks like that of the β-lactamase, "just slightly more crowded," Cornish said.

Now the team is beginning to use this information to guide mutagenesis experiments. In directed evolution experiments, they've been able to increase the activity of PBP by an order of magnitude.

"I think this is a trend in the field--to marry strengths in computation with the strengths of directed evolution to solve problems we haven't been able to solve yet," Cornish said.

Many computational methods exist for optimizing protein structures, Saven noted at the meeting. However, many of these strategies involve huge numbers of degrees of freedom and are therefore computationally intensive. Rather than perform calculations that solve for one specific amino acid at each position, his group has developed a less unwieldy method, which they call a "statistical computationally assisted design strategy" (SCADS), that yields the probabilities of the amino acids instead. This algorithm calculates the likelihood that certain amino acids will behave well at different positions in a protein based on their interactions with the protein's backbone, other side chains, and the environment.
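The difference between solving for one sequence and computing probabilities can be shown with a toy calculation. The sketch below is not the group's algorithm: it assumes each position can be scored independently and converts hypothetical per-residue energies into Boltzmann-weighted probabilities. The energy values, position labels, and the `beta` parameter are all invented for illustration.

```python
import math

def site_probabilities(energies, beta=1.0):
    """Convert {amino_acid: energy} into Boltzmann-weighted probabilities."""
    lowest = min(energies.values())  # shift energies for numerical stability
    weights = {aa: math.exp(-beta * (e - lowest)) for aa, e in energies.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

# Hypothetical energies in arbitrary units; lower means more favorable.
site_energies = {
    "buried position": {"L": -2.0, "A": -1.0, "K": 1.5},
    "exposed position": {"L": 0.5, "A": -0.5, "K": -1.5},
}

for site, energies in site_energies.items():
    probs = site_probabilities(energies)
    best = max(probs, key=probs.get)
    print(f"{site}: most probable residue {best} (p = {probs[best]:.2f})")
```

A probability table of this kind does not commit to a single answer. It ranks the options at every position, which is the sort of information that can guide the choice of which residues to vary in a combinatorial library.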

They've used this method to help design a 114-residue, monomeric, helical di-iron protein. The four-helix bundle with two iron or manganese atoms in the center is a common structural motif in many proteins and has important biological functions, such as oxygen binding and transport.

REDESIGNED A water-soluble version of the potassium-channel protein KcsA contains 29 computationally designed exterior amino acids for each subunit.
RECENTLY, University of Pennsylvania biophysics professor William F. DeGrado designed simple oligomeric versions of these bundles. Saven, DeGrado, and their colleagues, including graduate student Jennifer Calhoun, then took that motif even further, using computational methods to design a more "natural" protein facsimile. They first designed a sequence backbone and added a couple of dozen residues important for function and metal binding. They then used SCADS to solve for the remaining 88 amino acids.

"The idea was to develop an analog that was more akin to what we see in nature--and to make something more soluble and easily expressed," Saven said. "Now we have something robust that should tolerate lots of mutations."

The University of Pennsylvania groups and their colleagues, including Hidetoshi Kono at the Japan Atomic Energy Research Institute, Kyoto, also used SCADS to transform KcsA, a membrane-bound bacterial potassium-channel protein, into one that's water soluble. Their strategy was to make the protein's lipid-contacting side chains more polar while maintaining its structure and function. Though membrane proteins make up a large fraction of drug targets, they are notoriously difficult to study experimentally. The group's computational methods may therefore make it possible to study these proteins' structure and biophysical properties in unprecedented detail.

Computational direction is a foundation of protein engineering methods developed by Monrovia, Calif.-based Xencor. John R. Desjarlais, Xencor's director of computational biology, explained at the meeting that their Protein Design Automation (PDA) technology couples computer-aided design with experimental high-throughput methods. With a specific job in mind for an engineered protein, they identify sites in a natural protein likely to be involved in the action they're seeking. They computationally scan different combinations of amino acids at those positions and select those sequences predicted to produce proteins with the structure, stability, and function they want. They then create these proteins in the lab using combinatorial mutagenesis methods.
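The scan-and-select step described above can be sketched in a few lines. This is a generic illustration, not Xencor's PDA software: the starting sequence, the positions, the allowed residues, and especially the scoring function are hypothetical stand-ins for a real prediction of structure, stability, and function.

```python
from itertools import product

def scan_positions(wild_type, choices, score, keep):
    """Try every allowed residue combination at the chosen sites; return the best few."""
    positions = sorted(choices)
    candidates = []
    for combo in product(*(choices[p] for p in positions)):
        residues = list(wild_type)
        for pos, amino_acid in zip(positions, combo):
            residues[pos] = amino_acid
        sequence = "".join(residues)
        candidates.append((score(sequence), sequence))
    candidates.sort(reverse=True)  # highest score first
    return candidates[:keep]

def toy_score(sequence):
    """Placeholder score that simply counts acidic residues (D and E)."""
    return sequence.count("D") + sequence.count("E")

wild_type = "MKRSAG"                     # hypothetical starting sequence
choices = {1: "KDE", 2: "RDA", 4: "AE"}  # 0-based site -> residues to try
top = scan_positions(wild_type, choices, toy_score, keep=3)
for score_value, sequence in top:
    print(score_value, sequence)
```

Only 18 combinations are scanned here, but the count multiplies with every added site, which is why the shortlist, rather than the whole space, is what goes on to combinatorial mutagenesis in the lab.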

In one example that Desjarlais presented at the meeting, Xencor created a variant of thioredoxin reductase, an enzyme important in cellular metabolism that requires the biological cofactor NADPH to function. A similar cofactor, NADH, is less expensive and more stable, so the group altered the enzyme to use NADH, with the aim of making a food-processing system more cost-efficient, Desjarlais said. "It worked very well," he said. "We've been able to discover numerous novel protein sequences with a diverse range of cofactor specificities."

INTERACTION Dark blue squares show strong interactions between bZIP proteins in this 2-D array.
APPROACHING THE PROBLEM from the other direction, some researchers are using high-throughput assays to help direct computational studies. Amy E. Keating, assistant chemistry professor at Massachusetts Institute of Technology, studies the interactions of common proteins used in DNA transcription, known as bZIP transcription factors. These proteins have characteristic regions of coiled coils, which bind together. Scientists want to understand how the amino acid sequences in these coiled coil regions determine what combinations of bZIP proteins will dimerize.

Keating's group created a two-dimensional array of all possible combinations of about 50 coiled coil regions from different bZIP proteins. They used fluorescent markers to determine how strongly different combinations dimerized. Strikingly, only a few combinations bound well [Science, 300, 2097 (2003)].
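A quick count shows how many measurements such an array implies. The sketch assumes exactly 50 coiled-coil regions (the article says "about 50") and that the array pairs every region with every other, including itself.

```python
from itertools import combinations_with_replacement

def pair_counts(n):
    """Return (array cells, distinct pairs, distinct heterodimer pairs) for n regions."""
    cells = n * n  # every row-column combination on the array
    distinct = sum(1 for _ in combinations_with_replacement(range(n), 2))
    heterodimers = distinct - n  # drop the n self-pairings
    return cells, distinct, heterodimers

cells, distinct, heterodimers = pair_counts(50)
print(cells, distinct, heterodimers)  # prints: 2500 1275 1225
```

Because a pairing of A with B appears twice on a square array, once in each orientation, the 2,500 cells correspond to 1,275 distinct pairs, each heterodimer being measured in duplicate.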

With this information, Keating and Princeton University assistant computer science professor Mona Singh are now testing a machine-learning method for predicting interactions for these proteins. Keating said they've used the experimental bZIP data to train the prediction method, improving its abilities considerably. They also plan to use the experimental data to improve atomic-level models for coiled coil interactions. "These models will, in turn, improve our ability to do prediction and will also be useful for protein design calculations," Keating said.

It's not yet entirely clear what characteristics make for good binding, she noted. Factors such as electrostatic charge complementarity, pairing of buried asparagine residues, and good hydrophobic packing at the helix-helix interface are known to be important. But Keating and Singh's computations suggest they're not the whole story. When the team also took into account interfacial residue-residue interactions that have not traditionally been considered important, the performance of the machine-learning algorithms improved.

This new close mingling of experiments and calculations, Keating added, is a promising approach for making progress in understanding--and ultimately predicting and rationally modifying--the factors that determine the specificity of protein interactions.

INCOMPLETE Interactions such as core packing (left), charge complementarity at repeating residue positions 'e' and 'g' (center), and core polar residues are important in dimerization, but other factors may be involved.


Chemical & Engineering News
Copyright © 2003 American Chemical Society
