Human Genome Project Information Archive

Archive Site Provided for Historical Purposes

Sponsored by the U.S. Department of Energy Human Genome Program

Human Genome News Archive Edition
Vol. 12, Nos. 1-2, February 2002

PROSPECT for Protein Structure Predictions

Wins 2001 R&D 100 Award

Explorations into the 3-D structures of proteins hold the key to understanding their biological functions and thus their roles in a living system. Proteins fold into complex shapes, creating active areas that enable them to interact with other proteins to accomplish a complex biological function in much the same way that gears in a watch mesh into a functioning machine. A broad collection of protein structural data will have an abundance of applications in the life sciences, biotechnology, and medicine. [This goal is the focus of an international structural genomics effort reported in (HGN).]

Revealing these structures, however, is not easily accomplished (see Predicting 3-D Protein Structure). Typically, a protein’s 3-D structure is determined through such experimental methods as X-ray crystallography or nuclear magnetic resonance (NMR). The whole process, including protein expression and sample preparation, data collection, and structure-model construction, may take months or even years. This pace clearly cannot keep up with the rate at which protein-encoding genes are being identified worldwide. Nor can it satisfy increasing demands by drug companies hoping to use these data to custom-design drugs that fit precisely in the proteins like hands in gloves, blocking or enhancing their activities and minimizing side effects.

Predicting Structures with PROSPECT
In response to this need, Ying Xu and Dong Xu (Oak Ridge National Laboratory, ORNL) have developed PROSPECT (PROtein Structure Prediction and Evaluation Computer Toolkit), a threading-based protein structure-prediction program. PROSPECT uses an algorithm (see Predicting 3-D Protein Structure) that is mathematically guaranteed to find the globally optimal sequence-structure placement and does so in a computationally efficient manner, a feature unique to PROSPECT. The algorithm grew out of the discovery that 3-D protein structures generally have topologically simple arrangements between key components (alpha-helices and beta-strands). This prediction capability made PROSPECT one of the top six performers in the threading category in the most recent CASP contest (see CASP Competition for Protein Structure Prediction).

Another of PROSPECT’s unique capabilities allows users to enter any known structural data as constraints on the prediction. That structural information could include disulfide bonds between certain cysteines, geometrical relationships among residues identified as involved in the active site, and experimentally verified or predicted secondary structures, to name a few. This use of additional structural data as prediction constraints has greatly increased PROSPECT’s accuracy.

By further extending the data-constrained prediction paradigm, ORNL researchers have developed a hybrid technique for protein-structure determination, using PROSPECT and large-scale experimental data from NMR or mass spectrometry (MS) in conjunction with chemical cross-linkers. The basic idea is to systematically obtain a large number of distances between amino acid residues and use them as constraints on threading and on detailed atomic structure modeling through energy minimization. The investigators have demonstrated that structural information from fold recognition by threading is complementary to that from NMR or MS. Effectively combining these multiple sources of information makes it possible to solve protein structures or structure complexes that cannot be solved by existing methods. This series of developments led R&D Magazine to designate PROSPECT as winner of a 2001 R&D 100 award, presented for the year’s most significant technological innovations (see R&D Awards).
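The idea of using measured distances as constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PROSPECT's actual scoring scheme: the function names, the quadratic penalty, the weight, and the treatment of each cross-link as an upper bound on the distance between two residues are all hypothetical choices made for the example.

```python
import math

def constraint_penalty(coords, constraints, weight=1.0):
    """Sum of squared violations of upper-bound distance constraints.

    coords:      dict mapping residue index -> (x, y, z) in angstroms
    constraints: list of (i, j, max_dist) tuples; a hypothetical stand-in
                 for distances obtained from NMR or cross-linking/MS data
    """
    penalty = 0.0
    for i, j, max_dist in constraints:
        # math.dist requires Python 3.8 or later
        excess = math.dist(coords[i], coords[j]) - max_dist
        if excess > 0:  # only violated constraints are penalized
            penalty += weight * excess ** 2
    return penalty

def constrained_score(threading_energy, coords, constraints, weight=1.0):
    """Combine a threading energy (lower is better) with the penalty term."""
    return threading_energy + constraint_penalty(coords, constraints, weight)

# Hypothetical candidate placement with three residues and two constraints.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 10.0)}
constraints = [(1, 2, 6.0), (1, 3, 8.0)]
print(constrained_score(-12.0, coords, constraints))
```

A candidate placement that satisfies every measured distance keeps its original threading energy, while one that violates them is pushed down the ranking in proportion to the size of the violation.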

The hybrid technique could have significant implications for structural genomics projects, where the goal is to solve protein structures on a genome scale through the development and application of new and improved technologies. NMR methods generally work well for small proteins, but their effectiveness drops quickly as protein weight increases beyond 30 kD. The problem is in assigning enough NMR spectral peaks for an accurate structure determination of a large protein. Typically, when this problem is solved, valuable information can be retrieved in identifying the correct structural folds and providing accurate backbone and even detailed side-chain conformation predictions, as the ORNL researchers have demonstrated. As it matures, this capability should allow at least a good approximation of a protein’s actual structure in cases where existing NMR methods may not work well, due either to the protein’s size or its structural stability under NMR experimental conditions.

PROSPECT is the second biological analysis system from ORNL to receive an R&D 100 award. The first was GRAIL, an online automated gene-finding tool that won in 1992. Detailed information about PROSPECT and related projects can be found at http://compbio.ornl.gov/structure/.

Predicting 3-D Protein Structure
In theory, a protein’s 3-D structure could be solved computationally. Using first principles of physics, investigators could determine the exact formula for the potential energy of the interactions among the atoms of a protein’s amino acids and their environment in solution. Many years of research into protein structures have revealed that a protein folds its peptide chain into a “unique” 3-D conformation that minimizes potential energy. So scientists could search for the folded conformation with the minimum energy state. However, this search method, called ab initio folding, is impractical because it requires many times more computing power than is currently available.

Protein threading is a computational method for predicting a protein’s backbone structure or fold by comparing its amino acid sequence with solved structures already in the international repository, the Protein Data Bank (PDB), and then assessing how well it fits each one from the potential energy point of view. Within hours to a couple of days of computing time, the method can predict a backbone structure by selecting the placement with the best assessment score. Existing threading techniques are thought to be capable of solving 60% to 70% of proteins identified through the genome projects.
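The threading procedure described above (place the sequence on each known structure, score the fit, keep the best placement) can be illustrated with a deliberately small Python sketch. It is not PROSPECT's algorithm: it allows no gaps, it replaces the potential-energy assessment with a toy rule that rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the template names and burial patterns are invented for the example.

```python
# Toy fold recognition by gapless threading (illustrative assumptions only).
HYDROPHOBIC = set("AVLIMFWC")  # simplified hydrophobic residue set

def position_score(residue, environment):
    """+1 if the residue suits its site ('B' buried, 'E' exposed), else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1

def thread(query, template_env):
    """Slide the query along one template; return (best_score, offset) or None."""
    best = None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(position_score(res, template_env[offset + i])
                    for i, res in enumerate(query))
        if best is None or score > best[0]:
            best = (score, offset)
    return best  # None when the template is shorter than the query

def recognize_fold(query, templates):
    """Return (template_name, (score, offset)) for the best-scoring placement."""
    hits = {}
    for name, env in templates.items():
        hit = thread(query, env)
        if hit is not None:
            hits[name] = hit
    best_name = max(hits, key=lambda name: hits[name][0])
    return best_name, hits[best_name]

# Hypothetical template library: one burial pattern per known fold.
templates = {"fold_A": "BBEEBBEE", "fold_B": "EEEEBBBB"}
print(recognize_fold("AVKELL", templates))
```

A real threading program also has to handle insertions and deletions between the sequence and the template, which is what makes finding the globally optimal placement computationally hard.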

Although there could be millions of proteins in nature, the number of unique structural folds could be as few as 1000, as many structural biologists believe. Up to now, more than 12,000 protein structures have been determined experimentally and deposited into PDB. Among these proteins, about 700 have unique structural folds. If estimates are correct, about 70% of all structural folds have been determined. Statistics from PDB submissions are consistent with this hypothesis; over 90% of protein structures solved in the past 3 years have similar structures in PDB. Scientists have found more efficient ways to calculate a protein structure by making use of this information.
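The 70% figure follows directly from the numbers quoted above, as the short Python calculation below shows. The variable names are illustrative, and the figure of 12,000 structures is treated as a lower bound because the text says "more than 12,000."

```python
def fold_coverage(known_folds, estimated_total_folds):
    """Fraction of the estimated fold universe already seen in solved structures."""
    return known_folds / estimated_total_folds

known_folds = 700             # unique folds among PDB structures (from the text)
estimated_total_folds = 1000  # low-end estimate of folds in nature (from the text)
pdb_structures = 12000        # "more than 12,000", so a lower bound

coverage = fold_coverage(known_folds, estimated_total_folds)
structures_per_fold = pdb_structures / known_folds

print(f"Fold coverage: {coverage:.0%}")
print(f"Average structures per known fold: at least {structures_per_fold:.0f}")
```

If the true number of folds is larger than 1000, the coverage is correspondingly lower, so the 70% estimate depends entirely on that assumption.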

CASP Competition for Protein Structure Prediction
To assess objectively the state of the art in prediction tools for protein structures, the computational structural biology community agreed on an evaluation system called CASP (Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction). CASP was initiated in 1994 by John Moult (National Institute of Standards and Technology and University of Maryland). CASP has been a biennial event since its inception, with each event lasting about 4 months. In each CASP, participants were given a few dozen protein sequences whose structures had been solved experimentally but not yet published; participants could select targets to predict by a certain date. A group of invited assessors evaluated how well each predicted structure matched the experimental structure. At the end of the prediction season, the performance of each team was ranked, and the results were announced at a meeting in Asilomar Center, California. More than 160 international teams participated in CASP4, which ended in December 2000 (http://predictioncenter.llnl.gov). CASP5’s prediction season is expected to begin in May and end in August.

R&D Awards
DOE-supported laboratories, facilities, and small businesses claimed more than a fourth (26) of the 2001 R&D 100 awards (www.rdmag.com/). Investigators at Oak Ridge National Laboratory have won a total of 109, placing second only to General Electric. Conducted annually by R&D Magazine, the competition honors the 100 most outstanding new technologies, processes, materials, and software with commercial potential. Entries are judged on technical significance, uniqueness, and promise of real-world application.

The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v12n1-2).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.