Human Genome News Vol.10, No.3-4, October 1999

Sponsored by the U.S. Department of Energy Human Genome Program


Informatics

Report from 1999 DOE Genome Meeting

Presentations at the Oakland meeting emphasized that genome-sequencing projects are producing data faster than current analytical and data-management capabilities can handle. In addition, some current computing problems are expected to scale up exponentially as the data grow.

Genome Annotation Consortium

Ed Uberbacher, Jay Snoddy, and Phil LoCasio (all at Oak Ridge National Laboratory) offered an update on progress at ORNL and the multi-institutional Genome Annotation Consortium (GAC), which was established to address massive computational and informational challenges.

The goals of this work are to develop a system for whole-genome annotation that (1) organizes various types of data around genome frameworks that can be cross-indexed, compared, and cross-navigated and (2) allows multiple analytical methods to be applied to the same data. Steps in the annotation process include the following:

retrieving data and assembling genomes;
computationally finding genes and other sequence-level features;
computationally determining homology, function, and other relationships;
genome-wide structural modeling of gene products;
analyzing and modeling pathways and systems; and
managing, accessing, and visualizing data.
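The two goals above can be illustrated with a small coordinate index in which results from different analytical methods are stored against the same sequence coordinates, so they can be cross-indexed and compared region by region. The following is a minimal Python sketch under stated assumptions: the `Feature` record, the `GenomeFramework` class, and the method labels are hypothetical and do not come from the GAC software.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """One annotation on a genome framework (hypothetical schema)."""
    seq_id: str   # chromosome or contig name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    kind: str     # e.g. "gene" or "exon"
    method: str   # label of the analytical method that produced it


class GenomeFramework:
    """Hypothetical index of features from many methods, keyed by sequence."""

    def __init__(self) -> None:
        self._by_seq = defaultdict(list)

    def add(self, feature: Feature) -> None:
        self._by_seq[feature.seq_id].append(feature)

    def overlapping(self, seq_id: str, start: int, end: int,
                    method: Optional[str] = None) -> list:
        """Features overlapping [start, end], optionally from one method only."""
        hits = [
            f for f in self._by_seq.get(seq_id, [])
            if f.start <= end and f.end >= start
            and (method is None or f.method == method)
        ]
        return sorted(hits, key=lambda f: (f.start, f.end))

    def methods_agreeing(self, seq_id: str, start: int, end: int, kind: str) -> set:
        """Methods reporting a feature of `kind` that overlaps the region."""
        return {f.method for f in self.overlapping(seq_id, start, end) if f.kind == kind}


# Usage with invented coordinates and method labels:
fw = GenomeFramework()
fw.add(Feature("chr1", 100, 500, "gene", "method_a"))
fw.add(Feature("chr1", 150, 480, "gene", "method_b"))
fw.add(Feature("chr1", 900, 1200, "gene", "method_a"))
print(sorted(fw.methods_agreeing("chr1", 200, 300, "gene")))  # ['method_a', 'method_b']
```

The linear scan keeps the sketch short; a system handling whole genomes would more likely use an interval tree or a sorted, binned index for the overlap queries.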

Snoddy, Uberbacher, and LoCasio discussed the growing complexity and expense of biological computing for genome assembly and annotation. They noted that assembly problems will intensify as billions of nucleotides of draft sequence are deposited in the sequence databases by mid-2000, when assembling each day's new data alone will require more than 1600 workstations.

Other significant computational challenges include integrating the major community maps, which often contain inconsistencies and discrepancies, and performing comprehensive sequence analyses for gene modeling, which require the time-consuming application of several algorithms. Furthermore, some desired protein-classification analyses could currently take about 70 days on a 1024-node computer. Other comparative analyses, such as genome-to-genome alignment for studying mouse-human synteny, pose similar challenges, and constructing phylogenetic gene and species trees becomes even more demanding as the number and length of sequences increase. Meeting these and other high-performance biological computing needs, the speakers emphasized, demands a centralized approach with advanced infrastructure and specialized facilities.
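To put the protein-classification estimate in perspective, a rough conversion to single-node time follows. It assumes the work parallelizes perfectly across all 1024 nodes, an idealization the report does not state, so the result is only an order-of-magnitude figure:

```latex
70\ \text{days} \times 1024\ \text{nodes} = 71{,}680\ \text{node-days}
  \approx \frac{71{,}680}{365.25}\ \text{node-years} \approx 196\ \text{node-years}
```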

Uberbacher gave an overview of GAC progress in developing tools, servers, and special data views for the community. Achievements include data-acquisition and semiautomated sequence-assembly components, along with modules integrated to allow comprehensive genome-wide analysis. He noted that the computation-based GRAIL-EXP is finding about 10 times more human genes than had previously been identified in GenBank annotations. All human and microbial gene-analysis tools are available to researchers. At present, the Oak Ridge group is focusing on urgent annotation challenges arising from the massive sequencing ramp-up under way at the DOE Joint Genome Institute.

For More Information:

DOE JGI; http://www.jgi.doe.gov
Genome Catalog; http://genome.ornl.gov/GCat/species.html
Genome Channel; http://compbio.ornl.gov/channel/

Multiple Genome Analysis: WIT

A poster by Natalia Maltsev (Argonne National Laboratory, ANL) and colleagues described ANL's WIT system, which was designed and implemented to support sequence analysis and comparative analysis of sequenced genomes and to generate metabolic reconstructions from sequence data. WIT now contains data from 34 genomes (some incomplete).
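One simple way to picture metabolic reconstruction from sequence data is to check which pathways are covered by the functions assigned to a genome's genes. The Python sketch below is an illustrative assumption, not WIT's method: the function `reconstruct_pathways`, its `min_fraction` threshold, and the pathway and function labels are all hypothetical.

```python
def reconstruct_pathways(assigned_functions, pathways, min_fraction=1.0):
    """Call a pathway present when enough of its required functions are assigned.

    assigned_functions: set of function labels assigned to a genome's genes.
    pathways: dict mapping a pathway name to the set of functions it requires.
    min_fraction: share of required functions that must be present (0 to 1].
    Returns {pathway: {"fraction": float, "missing": sorted list}}.
    """
    report = {}
    for name, required in pathways.items():
        if not required:
            continue  # skip empty definitions to avoid dividing by zero
        present = required & assigned_functions
        fraction = len(present) / len(required)
        if fraction >= min_fraction:
            # Missing functions are gaps a curator might try to fill.
            report[name] = {"fraction": fraction, "missing": sorted(required - present)}
    return report


# Toy data: pathway and function labels are invented placeholders.
pathways = {"pathway_1": {"F1", "F2", "F3"}, "pathway_2": {"F4", "F5"}}
genome = {"F1", "F2", "F4", "F5"}
print(reconstruct_pathways(genome, pathways, min_fraction=0.5))
```

Lowering `min_fraction` surfaces partially covered pathways together with their missing steps, which is where unassigned or hypothetical genes could be candidates for filling the gaps.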

The authors believe that parallel analysis of many phylogenetically diverse genomes can contribute substantially to understanding higher-level functional subsystems and major physiological designs. They reported a new method that uses clusters of genes conserved across numerous genomes to predict functional coupling between genes. Although early results are encouraging, the investigators expect the precision of these predictions and the number of detectable functional couplings to increase dramatically as more genomes are added. They emphasized that this class of data may well become a significant resource for establishing the functions of hypothetical proteins, better understanding the functions of paralogous genes, and reconstructing connections in higher-level functional subsystems.
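The general idea of inferring functional coupling from conserved gene clusters can be sketched as counting how many genomes place two gene families near each other on the chromosome. This is a generic gene-neighborhood sketch, not the ANL method described in the poster: the input format (gene-family labels in chromosomal order), the `window` distance, the `min_genomes` threshold, and the toy genomes are all hypothetical, and a real implementation would also weigh strand, spacing, and phylogenetic distance.

```python
from collections import defaultdict


def coupling_support(genomes, window=1):
    """Map each unordered pair of gene families to the genomes where they cluster.

    genomes: dict genome name -> list of gene-family labels in chromosomal order.
    window: two genes count as clustered if at most `window` positions apart.
    """
    support = defaultdict(set)
    for name, order in genomes.items():
        for i, a in enumerate(order):
            # Only look ahead within the window; pairs are unordered (frozenset).
            for b in order[i + 1:i + 1 + window]:
                if a != b:
                    support[frozenset((a, b))].add(name)
    return support


def predict_coupling(genomes, window=1, min_genomes=3):
    """Predict coupling for pairs clustered in at least `min_genomes` genomes."""
    return {pair for pair, names in coupling_support(genomes, window).items()
            if len(names) >= min_genomes}


# Invented gene orders; fam_A and fam_B are adjacent in three of four genomes.
genomes = {
    "genome1": ["fam_A", "fam_B", "fam_C", "fam_D"],
    "genome2": ["fam_E", "fam_A", "fam_B", "fam_F"],
    "genome3": ["fam_B", "fam_A", "fam_G"],
    "genome4": ["fam_A", "fam_H", "fam_I", "fam_J", "fam_B"],
}
print(predict_coupling(genomes))
```

Counting distinct genomes rather than raw co-occurrences reflects the point made above: support for a predicted coupling grows as more genomes that preserve the cluster are added.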

The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v10n3-4).
