Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome Project FAQs
Working Draft vs Finished Sequence: What's the Difference?
In generating the draft sequence, scientists determined the order of base pairs in each chromosomal area at least 4 to 5 times (4x to 5x) to ensure data accuracy and to help with reassembling DNA fragments in their original order. This repeated sequencing is known as genome "depth of coverage." Draft sequence data are mostly in the form of fragments about 10,000 base pairs long whose approximate chromosomal locations are known.
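As a rough illustration of what "depth of coverage" means numerically, the sketch below divides the total number of bases read by the length of the region they cover. The function name and the read counts are hypothetical and chosen only to land in the 4x to 5x draft range; they are not HGP data.

```python
def depth_of_coverage(read_lengths, region_length):
    """Average number of times each base in a region was read.

    read_lengths  -- lengths (in bases) of all reads from the region
    region_length -- length (in bases) of the region itself
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(read_lengths) / region_length


# Hypothetical example: ninety 500-base reads over a 10,000-base fragment.
reads = [500] * 90
coverage = depth_of_coverage(reads, 10_000)
print(f"{coverage:.1f}x")  # 4.5x
```

This is an average only; in practice some bases are read more often than others, which is one reason gaps remain in a draft.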
To generate high-quality sequence, additional sequencing is needed to close gaps, reduce ambiguities, and bring the error rate down to no more than one error per 10,000 bases, the agreed-upon standard for HGP finished sequence. Investigators believe that a high-quality sequence is critical for recognizing the regulatory components of genes, which are very important in understanding human biology and such disorders as heart disease, cancer, and diabetes. The finished version will provide an estimated 8x to 9x coverage of each chromosome. Thus far, finished sequences have been generated for only two human chromosomes: 21 and 22.
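To put the finished-sequence standard in concrete terms, the sketch below converts "one error per 10,000 bases" into a percent accuracy and an upper bound on the expected number of errors in a stretch of sequence. The function names are hypothetical, and the 33.5-Mb input is simply the sequenced length this article reports for chromosome 22, used here as an example.

```python
BASES_PER_ERROR = 10_000  # finished-sequence standard quoted in the text


def accuracy_percent(bases_per_error=BASES_PER_ERROR):
    """Percent of bases expected to be correct at the given error rate."""
    return 100 * (1 - 1 / bases_per_error)


def max_expected_errors(sequenced_bases, bases_per_error=BASES_PER_ERROR):
    """Upper bound on expected errors if the standard is exactly met."""
    return sequenced_bases / bases_per_error


print(f"{accuracy_percent():.2f}% accurate")             # 99.99% accurate
print(f"{max_expected_errors(33_500_000):,.0f} errors")  # 3,350 errors
```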
When is a Genome Completely Sequenced?
In December 1999, the 56-Mb sequence of human chromosome 22 was declared essentially complete, yet only 33.5 Mb were sequenced. In early spring of this year, the fruit fly Drosophila's 180-Mb genome also was announced as completed, although just 120 Mb were characterized. What's the deal?
Animal genomes have large DNA regions that currently cannot be cloned or assembled. In the human genome sequence, these regions include telomeres and centromeres (chromosome tips and centers), as well as many chromosomal areas packed with other types of sequence repeats.
Most unsequenceable areas contain heterochromatic DNA, which has few genes and many repeated regions that are difficult to maintain as clones for DNA sequencing. HGP scientists strive to sequence the entire euchromatic DNA, which generally is defined as gene-rich areas (including both exons and introns) that are transcribed into RNA during gene expression. In the case of human chromosome 22, the sequenced 60% represents 97% of euchromatic DNA. Similarly, nearly all the euchromatic regions were sequenced for Drosophila.
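The percentages above follow from the sizes quoted earlier in this section, as the short sketch below shows. The implied euchromatic size for chromosome 22 is a derived figure, not one stated in the article; it assumes the 97% refers to the same 33.5 Mb of sequence.

```python
def percent_sequenced(sequenced_mb, total_mb):
    """Share of a chromosome or genome that was sequenced, in percent."""
    return 100 * sequenced_mb / total_mb


chr22 = percent_sequenced(33.5, 56)       # human chromosome 22
drosophila = percent_sequenced(120, 180)  # fruit fly genome

print(f"Chromosome 22: {chr22:.1f}%")   # Chromosome 22: 59.8%
print(f"Drosophila: {drosophila:.1f}%") # Drosophila: 66.7%

# Assumption: if 33.5 Mb is 97% of the euchromatic DNA on chromosome 22,
# the euchromatic portion is roughly this large.
implied_euchromatin_mb = 33.5 / 0.97
print(f"Implied euchromatin: {implied_euchromatin_mb:.1f} Mb")  # 34.5 Mb
```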
Although the HGP goal is to have complete strings of sequence for each chromosome from tip to tip, obtaining this high level of resolution presents a great challenge.
Whose Genomes Are Being Sequenced?
Diversity Represented
All humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of biological structures and processes. Therefore, the human reference sequence will not, and does not need to, represent an exact match for any one person's genome.
Investigators are using DNA from donors representing widely diverse populations. For example, HGP researchers collected samples of blood (female) or sperm (male) from a large number of people; only a few samples were processed, with source names protected so neither donors nor scientists know whose genomes are being sequenced. The private company Celera Genomics collected samples from five individuals who identified themselves as Hispanic, Asian, Caucasian, or African-American.
In addition to generating the reference sequence, another important HGP goal is to identify many of the small DNA regions that vary among individuals and could underlie disease susceptibility and drug responsiveness. The most common variations are called SNPs (single nucleotide polymorphisms). The DNA resources used for these studies came from 24 anonymous donors of European, African, American (north, central, south), and Asian ancestry.
Although the sequence information will come from the DNA of many persons, it will be applicable to everyone.
DOE's role in the HGP arose from the historic congressional mandate of its predecessor agencies (the Atomic Energy Commission and the Energy Research and Development Administration) to study the genetic and health effects of radiation and chemical by-products of energy production. From this work grew the recognition that the best way to learn about these effects was to study DNA directly.
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.