Sponsored by the U.S. Department of Energy Human Genome Program
Sharply increasing rates of sequence-data production are placing ever greater demands on information systems for new ways to view and better understand the meaning of the growing strings of As, Cs, Ts, and Gs piling up in GenBank and community databases (see article). Enriching data with such information as gene features and locations, gene-control regions, related sequences, gene-expression patterns, gene and protein families, pathways, and phenotypes can help pave the way for a successful transition from the current structural genomics phase of DNA mapping and sequencing to functional genomics studies.
Genome Annotation Consortium
Ed Uberbacher (Oak Ridge National Laboratory) described several pilot projects in the multi-institutional Genome Annotation Consortium, which was established to minimize some problems posed by genome-scale sequencing and to build a shared infrastructure for integrating diverse biological information. The pilot projects have four basic components: daily sequence and biological data acquisition from 19 major genome centers; automated data analysis to link biological information to sequence using tools for exon prediction, gene modeling, and sequence comparison; a storage, maintenance, and update component; and a series of methods for browsing, querying, and accessing other tools of value to researchers. An important goal is to build a level of interoperation using CORBA, which has not yet been implemented in the system.
In outlining some current challenges in sequence annotation, Uberbacher noted that no community-wide annotation processes exist and that much of the annotation does not describe the methods and evidence used to create the data. Moreover, even if a sequence is annotated extensively when submitted to the database, long-term update and maintenance remain challenges. New ESTs that may be important to understanding a genomic region of interest, for example, may have been entered into the dbEST database without being reflected in the original annotation. Annotation by end users is difficult because it requires multiple tools that use different formats and lack interoperability.
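To make the challenges above concrete, the following is a minimal Python sketch of an annotation record that carries its own methods and evidence, together with a check for ESTs that overlap an annotated region but are not yet cited by it. It is an illustration only and does not describe the consortium's actual data model; the class names, field names, identifiers, and the ests_missing_from helper are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass
class Evidence:
    """One piece of support for an annotation (hypothetical fields)."""
    method: str        # name of the tool or experiment that produced it
    source_id: str     # identifier of the supporting record, e.g. an EST
    recorded_on: date  # when this evidence was attached


@dataclass
class Annotation:
    """A feature on a sequence plus the evidence used to create it."""
    sequence_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    feature: str
    evidence: List[Evidence] = field(default_factory=list)

    def is_documented(self) -> bool:
        # True only if at least one method/evidence entry is recorded.
        return bool(self.evidence)

    def last_updated(self) -> Optional[date]:
        # Most recent evidence date, or None if nothing is recorded.
        return max((e.recorded_on for e in self.evidence), default=None)


def ests_missing_from(
    annotation: Annotation,
    est_hits: Iterable[Tuple[str, int, int]],
) -> List[str]:
    """Return ids of ESTs that overlap the annotation but are not cited.

    est_hits holds (est_id, start, end) tuples on the same sequence,
    with 1-based inclusive coordinates (an assumed input format).
    """
    cited = {e.source_id for e in annotation.evidence}
    return [
        est_id
        for est_id, start, end in est_hits
        if start <= annotation.end
        and end >= annotation.start
        and est_id not in cited
    ]


# Example with made-up identifiers and coordinates.
gene = Annotation(
    sequence_id="contig_1",
    start=120,
    end=250,
    feature="predicted exon",
    evidence=[Evidence("sequence comparison", "EST1", date(1997, 1, 15))],
)
hits = [("EST1", 100, 200), ("EST2", 150, 300), ("EST3", 500, 600)]
stale = ests_missing_from(gene, hits)
```

In this sketch, a record with an empty evidence list corresponds to annotation that does not describe how it was created, and a non-empty result from ests_missing_from flags a region whose annotation may need updating.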
Genome Channel
Uberbacher also demonstrated the Genome Channel, a prototype graphical user tool for browsing and querying the annotated reference genome (http://compbio.ornl.gov/tools/channel/index.html). The Java interface relies on a number of underlying data resources, analysis tools, and data-retrieval agents to provide an up-to-date view of genomic sequences as well as computational and experimental annotation. Designed to be simple enough for a layperson, the channel also offers sophisticated capabilities for hypothesis testing. The system had about 6000 GRAIL-EXP and 4000 GENSCAN predicted human genes as of June.
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n3).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.