Human Genome Project Information Archive

Archive Site Provided for Historical Purposes

Sponsored by the U.S. Department of Energy Human Genome Program

Human Genome News Archive Edition

Human Genome News, October–December 1996; 8(2)

Maximizing the Value of Sequence Data

Annotation Critical, says Branscomb

"Obtaining a functional understanding of sequence data is truly a profound challenge," Elbert Branscomb said recently. One way to determine the function of anonymous stretches of sequence is computer analysis. Some programs, such as GRAIL (Gene Recognition and Analysis Internet Link), detect certain kinds of functional features in sequence data, while others (e.g., BLAST) search for homologies to sequences of known function. This can be done in a systematic and fairly automated way, Branscomb observed, noting the work of a new annotation consortium headed by Edward Uberbacher (Oak Ridge National Laboratory).

"But the highest payoffs and yet the most difficult path to functional understanding," he continued, "come not from anything you can get by computer analysis but from added biological experimentation such as expression analysis. It would be of tremendous value to be able to search for all genes known to be expressed only in the liver or in the forebrain or in early gestation, and so on. This kind of data can be acquired systematically in different ways."

"People are struggling hard to automate the capture of all sorts of related expression-type data," he pointed out. "Some of these approaches compare expression patterns of different genes in cells and tissues as a function of a physiological condition. For example, you could assess the expression pattern of all genes in a certain cell type and compare that with the same cell type after it has undergone some stage of carcinogenic induction. Researchers at the National Cancer Institute, under the direction of Richard Klausner, are trying to develop technologies and databases that collect these data for investigators."

All this makes the Human Genome Project but a prelude to the insights beckoning just beyond the horizon. "We are just chipping a hole into the sarcophagus of knowledge and peering into the darkness," Branscomb said.

But for any kind of annotation to be useful, he emphasized, it must be stored robustly in a computer-searchable way. This has been one of the most difficult problems to approach, and one that many people are trying to address. Another issue is whether and how sequence annotation should be added to a database after the original sequence submission, either by the submitting lab or by others who later learn more about that sequence. Ex post facto annotation not only should be allowed, said Branscomb, it should be made easy and automatic. "It changes the flavor of the databases, making them less archival and more a dynamic record of current knowledge," he said. Both Genome Database and Genome Sequence Data Base have recently introduced promising new schemas to support this type of after-the-fact and third-party annotation. GenBank and others are approaching the same basic user needs in interesting ways as well, he noted.

[HGMIS staff]


The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v8n2).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.