Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, January 1998; 9:(1-2)
The Fifth International Conference on Intelligent Systems for Molecular Biology, held June 21-25, 1997, in Porto Carras, Greece, ended with a workshop on Automatic Annotation of Genome Sequence Data.
Automatic annotation of large amounts of genomic DNA sequence clearly is and will continue to be a formidable challenge. When completed, the human genome sequence will consist of 24 strings of As, Ts, Cs, and Gs with a combined length of 3 billion characters. Unless the locations of biologically important parts of the sequence, such as genes and their regulatory elements, are marked, this string of characters is of little use. Annotating the genome sequence in parallel with its determination is critical.
Attendees felt this problem will be addressed properly only by developing very efficient computational tools for initial sequence annotation, treating the annotations as hypotheses, and testing and verifying them in the laboratory. Additionally, for maximum usefulness, the generated annotation results must be stored in an easily retrievable and queryable form in well-curated databases. The "If you sequence it, the community will annotate it" approach is unlikely to produce the desired results, and new paradigms and possibly new organizational models will be needed to present genomic sequence in its most useful form.
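As a rough illustration of what treating annotations as hypotheses and keeping them in a queryable form might look like, the following minimal Python sketch stores each annotation with a verification status and supports a simple region query. It is not drawn from any tool discussed at the workshop; the names Annotation, Status, and overlapping, the coordinate convention, and the example records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PREDICTED = "predicted"  # proposed by a computational tool, not yet tested
    VERIFIED = "verified"    # confirmed in the laboratory
    REJECTED = "rejected"    # contradicted by laboratory evidence


@dataclass
class Annotation:
    chromosome: str  # e.g. "22"
    start: int       # 1-based, inclusive (assumed convention)
    end: int         # 1-based, inclusive (assumed convention)
    feature: str     # e.g. "gene", "promoter"
    source: str      # tool or curator that proposed the annotation
    status: Status = Status.PREDICTED


def overlapping(annotations, chromosome, start, end, status=None):
    """Return annotations on `chromosome` that overlap [start, end].

    If `status` is given, only annotations with that status are returned.
    """
    return [
        a for a in annotations
        if a.chromosome == chromosome
        and a.start <= end and a.end >= start
        and (status is None or a.status is status)
    ]


# Hypothetical records; the source names are placeholders, not real tools.
store = [
    Annotation("22", 1_000, 5_000, "gene", "hypothetical-gene-finder"),
    Annotation("22", 800, 999, "promoter", "hypothetical-promoter-scan"),
    Annotation("X", 2_000, 3_000, "gene", "hypothetical-gene-finder"),
]

# A laboratory result promotes the first hypothesis to verified.
store[0].status = Status.VERIFIED

hits = overlapping(store, "22", 900, 1_200)
verified = overlapping(store, "22", 900, 1_200, status=Status.VERIFIED)
```

Keeping the status on each record, rather than in a separate list, lets a query return either every hypothesis in a region or only those that have survived laboratory testing. A curated database would add provenance, versioning, and controlled vocabularies, none of which this sketch attempts.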
Eight workshop speakers addressed the challenges and technologies in automatic annotation and the most efficient division of labor between biology and computer science.
Introductory remarks by session chairman Chris Sander [European Molecular Biology Laboratory--European Bioinformatics Institute (EBI)] made clear that no one yet has the experience to know the right way to proceed with automatic annotation. Richard Durbin (Sanger Centre) stressed an often-repeated theme that proper annotation will require wet-laboratory work as well as computational annotation. He also stressed the need for curated databases. Michael Ashburner (EBI) discussed his experience in annotating Drosophila sequences and the need for hierarchical controlled vocabularies. He suggested the possibility of an annotation database that would be separate from but seamlessly linked to the sequence databases.
Three other speakers addressed general problems in genomic-sequence annotation: Antoine Danchin (Institut Pasteur) discussed annotation of the Bacillus subtilis genome, Terry Gaasterland (Argonne National Laboratory) described annotating microbial genomes, and Chris Overton (University of Pennsylvania) shared experiences from a project to annotate genomic sequence from human chromosome 22. Other speakers discussed annotation efforts and tools being developed in the bioinformatics industry. [Richard Mural, Life Science Division, Oak Ridge National Laboratory]
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n1).
The Human Genome Project (HGP) was an international 13-year effort that ran from 1990 to 2003. Its primary goals were to discover the complete set of human genes and make them accessible for further biological study, and to determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.