November 2001
Vol. 4, No. 11, pp 59–60, 62.
sites and software
NMR spectroscopy software
Molecular biologists can now use proton NMR data to elucidate protein structure.

[Opening art: Protein Data Bank]
In the last decade, molecular biologists have developed methods for the rapid, automated production of polypeptide gene expression products. Most of these products are now generated faster than traditional structure determination methods, such as X-ray crystallography, can comfortably handle. The demand for rapid, automated structure determination has spurred sophisticated new software for analyzing large molecules by spectroscopic techniques. Twenty years ago, few molecular biologists used NMR spectra to examine the protein products of gene expression; now using such spectra is routine.

It is a logical step to apply forms of spectroscopy that have succeeded in chemistry to ever-larger molecules in biochemistry. Information processing imposes its own logic, however: for molecules with molecular weights above a few thousand, assigning most of the proton resonance peaks in an NMR spectrum requires fairly complex analytical software.

2-D NMR
Most chemists can probably draw an approximate proton NMR spectrum of a small organic molecule by hand and easily interpret spectra from molecules up to molecular weights of a few hundred. For decades, proton NMR has been a critical tool for analyzing structures in synthetic organic chemistry. Lately, much attention has been devoted to the use of various 2-D NMR methods to interpret the 3-D structures of much larger molecules in solution. The 2-D NOESY (nuclear Overhauser effect spectroscopy) technique in particular is simple enough to be widely useful.

In this 2-D method, which relies on the proton–proton nuclear Overhauser effect, a cross-peak off the diagonal indicates that two protons lie within roughly 2.0–4.5 Å of each other, and the proton chemical shifts help assign the peaks to particular hydrogen atoms in the chemical structure.

The challenge lies in assigning each observed cross-peak in the 2-D NOESY spectrum to a particular pair of protons and then deducing the structural or conformational properties of the molecule.
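
To see why assignment is the hard part, consider a minimal Python sketch. The proton names, chemical shifts, and 0.04 ppm matching tolerance below are hypothetical; the code simply lists every proton pair whose shifts fall within tolerance of a cross-peak's two frequency coordinates.

```python
from itertools import permutations

# Hypothetical proton chemical shifts in ppm (illustrative values only).
SHIFTS = {
    "Ala3 HA": 4.32,
    "Leu5 HA": 4.35,
    "Val8 HB": 2.10,
    "Leu5 HN": 8.21,
    "Gly6 HN": 8.24,
}


def candidates(peak, shifts, tol=0.04):
    """List ordered proton pairs whose shifts match a cross-peak within tol ppm.

    peak is (omega1, omega2): the cross-peak's coordinates on the two axes.
    """
    w1, w2 = peak
    return [
        (a, b)
        for a, b in permutations(shifts, 2)
        if abs(shifts[a] - w1) <= tol and abs(shifts[b] - w2) <= tol
    ]


if __name__ == "__main__":
    for pair in candidates((4.33, 8.22), SHIFTS):
        print(pair)
```

Even with five protons, this one cross-peak has four plausible explanations; in a protein with a thousand or more protons, overlapping shifts make ambiguity of this kind pervasive.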

Protein NMR
Building a complete 3-D structure of a protein from 2-D NOESY information typically begins with generating a 3-D model from amino acid sequence data and rules about bond lengths and angles. Heuristic guidelines are used to estimate details of the protein’s secondary structure. The model is used to predict a 2-D spectrum, which is then compared with the experimental spectrum, and the structural model is adjusted through several predict-and-compare iterations against the NOESY results. Many NOESY peaks are uninformative because they arise from the covalent structure, such as the peaks from adjacent hydrogens on the aromatic ring of phenylalanine. These peaks reveal nothing about the conformation of the polypeptide chain, so a first step in analyzing a protein’s NOESY spectrum is eliminating them in order to focus on the remaining peaks.
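
The filtering step can be sketched as a graph search over covalent bonds. In this Python sketch, the bond list for a phenylalanine fragment plus the next residue's amide is hypothetical, and the rule that proton pairs three or fewer bonds apart are uninformative is an assumption chosen for illustration.

```python
from collections import deque

# Hypothetical bond list: part of Phe4 plus the amide of Gly5.
BONDS = [
    ("F4:HA", "F4:CA"), ("F4:CA", "F4:CB"), ("F4:CB", "F4:HB2"),
    ("F4:CB", "F4:CG"), ("F4:CG", "F4:CD1"), ("F4:CD1", "F4:HD1"),
    ("F4:CD1", "F4:CE1"), ("F4:CE1", "F4:HE1"),
    ("F4:CA", "F4:C"), ("F4:C", "G5:N"), ("G5:N", "G5:HN"),
]

# Candidate cross-peaks, each already attributed to one proton pair.
PEAKS = [
    ("F4:HD1", "F4:HE1"),   # adjacent ring protons
    ("F4:HA", "F4:HB2"),    # fixed by side-chain bonds
    ("F4:HA", "G5:HN"),     # sequential, conformation dependent
    ("F4:HE1", "L30:HD1"),  # long-range; L30 is outside this fragment
]


def bond_separation(bonds, start, goal):
    """Breadth-first search for the number of covalent bonds between two atoms."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, depth = queue.popleft()
        if atom == goal:
            return depth
        for nxt in graph.get(atom, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # not connected within this bond list


def informative(peaks, bonds, max_bonds=3):
    """Keep only peaks whose proton pair is more than max_bonds bonds apart."""
    kept = []
    for h1, h2 in peaks:
        sep = bond_separation(bonds, h1, h2)
        if sep is None or sep > max_bonds:
            kept.append((h1, h2))
    return kept


if __name__ == "__main__":
    print(informative(PEAKS, BONDS))
```

A fixed bond-count cutoff is crude: meta protons on the same aromatic ring are four bonds apart yet still rigidly spaced, so practical programs rely on residue-specific rules.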

The main NMR instrument vendors, Bruker (www.bruker.com/nmr/), Varian (www.varianinc.com/nmr/), and JEOL (www.jeol.com/nmr/nmr.html), all provide software for analyzing NOESY spectra, but their packages are oriented primarily toward organic chemists. Software for analyzing the NOESY spectra of large molecules is a more specialized effort, undertaken largely by individual research groups worldwide. Because most of these groups make their software available over the Web, a brief round-the-world tour characterizes the state of the art in the rapidly growing field of analytical NMR spectroscopy of proteins.

AQUA. This is a suite of programs for Analyzing the QUAlity of biomolecular structures that were determined by NMR spectroscopy. AQUA was first developed in the NMR spectroscopy department of the Bijvoet Center for Biomolecular Research, Utrecht, The Netherlands, and is currently maintained and expanded at the BioMagResBank, University of Wisconsin–Madison (www.bmrb.wisc.edu/~jurgen/aqua/).

“AQUA (starting with version 3.0) calculates the level of completeness of an experimental set of NOEs on the basis of a 3-D structure of the molecule.” The easiest way to try AQUA’s completeness module is to use one of the AQUA servers listed on the Web page above. The Web-based calculation service accepts NOE restraint files in the formats used by most common NMR software packages.
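
A simplified version of the completeness idea fits in a few lines of Python. It assumes that every proton pair closer than the 4.5 Å upper limit quoted earlier should produce an NOE, and it reports the fraction of those expected pairs found in the observed list. The coordinates and NOE list are hypothetical, and AQUA's own calculation is more detailed than this.

```python
from itertools import combinations
from math import dist

# Hypothetical proton coordinates (angstroms) taken from a structure model.
COORDS = {
    "H1": (0.0, 0.0, 0.0),
    "H2": (2.5, 0.0, 0.0),
    "H3": (0.0, 3.0, 0.0),
    "H4": (10.0, 0.0, 0.0),
}

# Hypothetical experimental NOE list (order within a pair does not matter).
OBSERVED = [("H2", "H1"), ("H1", "H3"), ("H1", "H4")]


def expected_pairs(coords, cutoff=4.5):
    """Unordered proton pairs closer than cutoff angstroms in the model."""
    return {
        frozenset((a, b))
        for a, b in combinations(coords, 2)
        if dist(coords[a], coords[b]) <= cutoff
    }


def completeness(coords, observed, cutoff=4.5):
    """Fraction of expected NOEs that were actually observed."""
    expected = expected_pairs(coords, cutoff)
    found = {frozenset(pair) for pair in observed} & expected
    return len(found) / len(expected) if expected else 0.0


if __name__ == "__main__":
    print(f"completeness = {completeness(COORDS, OBSERVED):.2f}")
```

Here two of the three expected contacts are observed, giving a completeness of 0.67. The observed H1–H4 NOE, 10 Å apart in the model, does not count toward completeness; it would show up instead as a restraint violation.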

AQUA was developed as part of the Biotech Validation Project, a collaborative effort of six European laboratories. The project aimed to produce a coordinated and linked set of software modules that integrate several existing and new procedures and protocols for recording, communicating, and validating the models resulting from 3-D structural studies on biomolecules.

Graphical structural information from AQUA is produced by reading AQUA output files into PROCHECK-NMR, a program in the PROCHECK suite for assessing the stereochemical quality of protein structures. This suite is available from www.biochem.ucl.ac.uk/~roman/procheck/procheck.html.

Jigsaw (www.cs.purdue.edu/homes/cbk/jigsaw.html) applies graph algorithms and probabilistic reasoning techniques, enforcing first-principles consistency rules in order to overcome the poor signal-to-noise ratio (~10% or less) that is typical of protein NOESY experiments. Jigsaw uses only four experiments, on unlabeled protein, thus dramatically reducing both the amount and expense of wet lab molecular biology and the total spectrometer time. Results for three test proteins demonstrate that Jigsaw correctly identifies 79–100% of alpha-helical and 46–65% of beta-sheet NOE connectivities and correctly aligns 33–100% of secondary structure elements. This Jigsaw approach yields quick and reasonably accurate (as opposed to the traditional slow and extremely accurate) structure calculations and should be useful for quick structural assays.

A key idea of Jigsaw is that regular protein secondary structure yields stereotypical through-space atom interactions, which are visible in a NOESY spectrum. Jigsaw can find such patterns in a spectrum even when the residues’ positions in the primary sequence (their assignments) are unknown. Jigsaw encodes NOESY data in a graph with nodes representing unassigned amino acid residues and edges representing possible interactions observed in the NOESY spectrum. This graph is noisy because many residues have approximately the same chemical shift for an interacting proton. But buried within this graph is a set of edges that look like the alpha-helix and beta-sheet interactions, defining much of the protein’s structure. Jigsaw relies on the fact that large groups of incorrect edges are unlikely to conspire to form alpha-helix or beta-sheet patterns. Jigsaw then imposes a set of constraints derived from these patterns to focus a graph search, solving a “jigsaw puzzle” to find the correct secondary structure. Finally, Jigsaw goes through several refinement steps, typically involving other 2-D NMR methods, to refine chemical shift assignments and exploit spin–spin coupling information.
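
The pattern-search idea can be illustrated with a toy Python search. Everything here is an assumption for illustration, not Jigsaw's published algorithm: nodes are unassigned spin systems, `seq` edges stand for candidate i,i+1 connectivities, `med` edges stand for candidate i,i+3 connectivities, and a helix-like chain must be consistent with both.

```python
# Hypothetical graph: A..F is a true helix; X, Y, Z arise from noise edges.
NODES = ["A", "B", "C", "D", "E", "F", "X", "Y", "Z"]
SEQ = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"),
       ("B", "X"), ("X", "Y"), ("C", "Z")]
MED = {("A", "D"), ("B", "E"), ("C", "F")}


def helix_chains(nodes, seq, med, min_len=5):
    """Find chains linked by i,i+1 edges and supported by i,i+3 edges."""
    nbrs = {n: [b for a, b in seq if a == n] for n in nodes}
    results = []

    def extend(chain):
        grown = False
        for nxt in nbrs[chain[-1]]:
            if nxt in chain:
                continue  # no cycles
            # A new node must share an i,i+3 edge with the node three back.
            if len(chain) >= 3 and (chain[-3], nxt) not in med:
                continue
            grown = True
            extend(chain + [nxt])
        if not grown and len(chain) >= min_len:
            results.append(chain)

    for start in nodes:
        extend([start])
    # Drop chains that are only the tail of a longer chain.
    return [c for c in results
            if not any(len(o) > len(c) and o[-len(c):] == c for o in results)]


if __name__ == "__main__":
    print(helix_chains(NODES, SEQ, MED))
```

The noise edges B→X, X→Y, and C→Z cannot extend a chain past three nodes because they lack a matching i,i+3 partner, which is the sense in which incorrect edges rarely conspire to form a regular pattern. The article notes that Jigsaw also applies probabilistic reasoning, whereas this toy treats every edge as simply present or absent.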

Center for Advanced Biotechnology and Medicine. This center at Rutgers University sponsors a protein NMR laboratory, which offers a suite of protein NMR analytical software (http://www-nmr.cabm.rutgers.edu/NMRsoftware/nmr_software.html) for downloading. This suite includes GenCons and AutoAssign as starting points in a structural study. GenCons can read a series of files with assignment lists and intensities of NOE peaks. With this information, it translates intensities into proton–proton distances and outputs a constraint file in one of the popular file formats (DIANA2.8, CONGEN, or X-PLOR) for structure determination software.
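
The intensity-to-distance step is usually based on the steep r^-6 distance dependence of the NOE. The Python sketch below assumes the isolated-spin-pair approximation, r = r_ref (I_ref/I)^(1/6), calibrated on a reference proton pair of known distance, plus one commonly used set of strong/medium/weak upper bounds (2.7, 3.3, and 5.0 Å). The output follows X-PLOR's `assign (selection) (selection) d dminus dplus` restraint syntax. It is not GenCons's actual code, and the residue numbers and intensities are hypothetical.

```python
LOWER = 1.8  # angstroms; assumed lower bound for every restraint
# Assumed upper bounds for strong, medium, and weak NOEs (angstroms).
BOUNDS = (2.7, 3.3, 5.0)


def distance_from_intensity(intensity, ref_intensity, ref_distance):
    """Isolated-spin-pair estimate: r = r_ref * (I_ref / I) ** (1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound(r):
    """Smallest class bound that accommodates r, or None if r is too long."""
    for bound in BOUNDS:
        if r <= bound:
            return bound
    return None


def xplor_restraint(res1, atom1, res2, atom2, r):
    """Format one restraint as: assign (sel1) (sel2) d dminus dplus."""
    upper = upper_bound(r)
    if upper is None:
        return None
    # Target distance at the lower bound, so dminus = 0 and dplus = upper - LOWER.
    return (f"assign (resid {res1} and name {atom1}) "
            f"(resid {res2} and name {atom2}) "
            f"{LOWER:.1f} 0.0 {upper - LOWER:.1f}")


if __name__ == "__main__":
    # Reference pair assumed to be 2.5 angstroms apart with a volume of 1000.
    r = distance_from_intensity(500.0, 1000.0, 2.5)
    print(f"r = {r:.2f} angstroms")
    print(xplor_restraint(12, "HA", 15, "HN", r))
```

Halving the intensity relative to the reference lengthens the estimated distance only by a factor of 2^(1/6), about 1.12, which is one reason intensities are usually binned into generous distance classes rather than converted to precise distances.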

AutoAssign is a constraint-based expert system for automating the analysis of backbone resonance assignments of small proteins from a variety of 13C, 15N, and proton NMR spectra. The C++/Java-based AutoAssign (available for use on SGI server hardware) automates the assignment of HN, 15N, CO, C-alpha, C-beta, and H-alpha resonances from a set of peak-picked triple-resonance NMR spectra. Test data provided with the program include several independently collected triple-resonance NMR peak lists for proteins ranging in size from about 6 to 18 kDa. With this experimental data set, AutoAssign obtains nearly complete resonance assignments (~98%) with virtually no errors (<0.5%). The constraint-based algorithm restricts assignments to those peaks for which significant confidence is possible. AutoAssign analyzes backbone resonance assignments in only seconds on current RISC and Pentium-based platforms.
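
The core of sequential backbone assignment can be sketched as shift matching between spin systems. In this hypothetical Python example, each spin system carries its own C-alpha and C-beta shifts plus those of the preceding residue (information that spectra such as HNCACB and CBCA(CO)NH provide), and a link is accepted only when exactly one candidate matches within an assumed 0.2 ppm tolerance. That rule echoes AutoAssign's practice of assigning only confident peaks, but the code is not AutoAssign's algorithm.

```python
TOL = 0.2  # ppm; assumed matching tolerance

# Hypothetical spin systems:
# name -> ((own CA, own CB), (previous CA, previous CB)); CB is None for Gly.
SPIN_SYSTEMS = {
    "s1": ((52.5, 19.1), (61.0, 70.0)),
    "s2": ((58.0, 42.3), (52.4, 19.0)),
    "s3": ((45.2, None), (58.1, 42.2)),
    "s4": ((56.0, 30.5), (62.0, 33.0)),
    "s5": ((55.0, 30.0), (56.1, 30.4)),
    "s6": ((54.0, 29.0), (55.9, 30.6)),
}


def matches(own, prev, tol=TOL):
    """True if one residue's own shifts agree with another's 'previous' shifts."""
    (ca1, cb1), (ca2, cb2) = own, prev
    if abs(ca1 - ca2) > tol:
        return False
    if cb1 is None or cb2 is None:
        return cb1 is None and cb2 is None  # both glycine-like, else no match
    return abs(cb1 - cb2) <= tol


def sequential_links(systems):
    """Map each spin system to its unique successor, if exactly one exists."""
    links = {}
    for i, (own, _prev) in systems.items():
        succ = [j for j, (_own, prev) in systems.items()
                if j != i and matches(own, prev)]
        if len(succ) == 1:  # ambiguous or missing matches stay unassigned
            links[i] = succ[0]
    return links


if __name__ == "__main__":
    print(sequential_links(SPIN_SYSTEMS))
```

Spin system s4 has two equally good successors (s5 and s6), so the sketch leaves it unassigned rather than guessing.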

MORASS (Multiple Overhauser Relaxation AnalysiS and Simulation). Developed at the NMR Center of the University of Texas Medical Branch, it uses a full hybrid matrix eigenvalue/eigenvector solution to the Bloch equations to derive proton–proton cross-relaxation rates and thus interproton distances from NOESY data. MORASS analyzes 2-D NMR NOESY data from oligonucleotides and proteins and delivers the interproton distances in a format suitable for use as distance constraints in molecular dynamics calculations. MORASS 2.41 is the most current version available. A 3-D version is in development and will be posted at www.nmr.utmb.edu/#mrass.
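
The forward ("Simulation") half of a full relaxation-matrix treatment can be sketched in pure Python. The physical model is an assumption: a slow-tumbling limit in which each proton pair contributes k/r^6 to both protons' auto-relaxation and -k/r^6 as their cross-relaxation rate, with an illustrative value of k and a uniform leakage rate. Peak volumes then follow A(tm) = exp(-R tm). MORASS also solves the inverse problem, deriving rates and distances from measured intensities, which this sketch does not attempt.

```python
from math import dist

K = 1000.0   # assumed dipolar constant, angstrom**6 per second (illustrative)
LEAK = 0.5   # assumed uniform leakage relaxation rate, per second


def relaxation_matrix(coords, k=K, leak=LEAK):
    """Symmetric rate matrix R for protons at the given coordinates."""
    n = len(coords)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = leak
        for j in range(n):
            if i != j:
                rate = k / dist(coords[i], coords[j]) ** 6
                R[i][j] = -rate   # cross-relaxation (slow-tumbling sign)
                R[i][i] += rate   # pairwise share of auto-relaxation
    return R


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def expm(m, terms=20, squarings=8):
    """Matrix exponential: Taylor series on m / 2**squarings, then squaring."""
    n = len(m)
    scaled = [[v / 2 ** squarings for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for p in range(1, terms):
        term = [[v / p for v in row] for row in matmul(term, scaled)]
        result = [[x + y for x, y in zip(r1, r2)]
                  for r1, r2 in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def noesy_volumes(coords, tm, **kwargs):
    """Peak volumes A(tm) = exp(-R * tm), starting from unit diagonal peaks."""
    R = relaxation_matrix(coords, **kwargs)
    return expm([[-v * tm for v in row] for row in R])


if __name__ == "__main__":
    # Three hypothetical collinear protons, 2.5 angstroms apart.
    protons = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
    A = noesy_volumes(protons, tm=0.2)
    direct = K / 5.0 ** 6 * 0.2  # initial-rate estimate for the outer pair
    print(f"outer pair: full matrix {A[0][2]:.3f}, direct estimate {direct:.4f}")
```

With these assumed rates and a 0.2 s mixing time, the cross-peak between the outer protons (5 Å apart) comes out several times larger than its direct r^-6 estimate, because magnetization relays through the middle proton. This spin diffusion is what a two-spin analysis misses and what a full-matrix approach is designed to capture.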

MORASS doesn’t provide software for visualization of results but relies on the biomolecular graphics program GRASP, available by FTP at www.nmr.utmb.edu/grasp/graspinfo.html. As a structural program, GRASP is especially well suited for examining surface phenomena and electrostatic potentials. A version of GRASP modified to accept MORASS input (MORASS NOESY constraint differences) can be obtained from Anthony Nicholls at nicholls@cuhhca.hhmi.columbia.edu.

LinuxNMR. One of the problems with NMR structural analysis of proteins is that it works best with very-high-field (500–800 MHz) NMR instrumentation for data acquisition, and such systems are expensive and not widely available. The LinuxNMR project (linuxnmr.org/development.html), based in the biochemistry department of the University of Wisconsin–Madison, lets researchers acquire time on the latest instruments and also provides a low-cost software solution for protein chemists. That matters because the workstation hardware needed to run much of the NMR/NOESY software described above can cost more than $10,000 per machine, a prohibitive expense for some laboratories.

The LinuxNMR approach works through the basic steps of proton resonance peak-picking, resonance assignment, NOE restraint generation, and structure calculation, with the last two steps generally performed iteratively: the result of one round of structure calculations is used to help correct misassigned NOE cross-peaks and identify new NOE restraints for the next round. The LinuxNMR project has successfully carried out all of these stages in determining a new protein structure, using noncommercial software on consumer-level, typically Pentium-class, laptop and desktop computers rather than workstations.
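
One round of that correction loop can be sketched as follows. The Python below is hypothetical: restraints are ((atom1, atom2), upper bound) tuples, each model maps atom names to coordinates, and the 0.5 Å violation tolerance and 50% model fraction are illustrative thresholds, not values from any particular program. Restraints that most models violate are flagged as probable misassignments, and an ambiguous peak keeps only the candidate pairs that every model satisfies.

```python
from math import dist

# Two hypothetical models from the previous round (coordinates in angstroms).
MODELS = [
    {"A": (0.0, 0.0, 0.0), "B": (3.0, 0.0, 0.0), "C": (8.0, 0.0, 0.0)},
    {"A": (0.0, 0.0, 0.0), "B": (3.2, 0.0, 0.0), "C": (7.5, 0.0, 0.0)},
]
RESTRAINTS = [(("A", "B"), 4.0), (("A", "C"), 5.0)]


def split_restraints(restraints, models, tol=0.5, frac=0.5):
    """Separate restraints into kept and suspect (probably misassigned)."""
    keep, suspect = [], []
    for (a, b), upper in restraints:
        violations = sum(dist(m[a], m[b]) > upper + tol for m in models)
        target = suspect if violations >= frac * len(models) else keep
        target.append(((a, b), upper))
    return keep, suspect


def resolve_ambiguous(candidates, upper, models):
    """Keep candidate pairs for a peak that satisfy its bound in every model."""
    return [(a, b) for a, b in candidates
            if all(dist(m[a], m[b]) <= upper for m in models)]


if __name__ == "__main__":
    keep, suspect = split_restraints(RESTRAINTS, MODELS)
    print("keep:", keep)
    print("suspect:", suspect)
    print("resolved:", resolve_ambiguous([("A", "B"), ("A", "C")], 5.0, MODELS))
```

The kept and newly resolved restraints would feed the next round of structure calculation, and the suspect ones would go back for reassignment.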

Conclusions
Of course, some commercial NMR software packages are devoted to analyzing the same class of problems: The professional version of Acorn NMR’s software package (www.acornnmr.com) and FELIX-Assign from Accelrys (formerly Molecular Simulations, www.msi.com) also provide automatic and semiautomatic capabilities for NMR/NOESY spectral assignment of biological macromolecules. Nevertheless, it is a remarkable development in modern biochemistry that research groups interested in rapid structural analysis of proteins and other biomolecules by NMR are willing to share the software they have developed with groups around the world, effectively unifying the direction of research on protein structure.


Charles Seiter is a former chemistry professor and has designed a variety of analytical instrument software. He has written 20 books on computing and contributes regularly to PC World and Macworld. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.

