March 24, 2000
Fruit fly's genome sequence finished in record time: Physical maps are the key to accuracy
BERKELEY, CA -- In 90 years of study, the diminutive fruit fly Drosophila melanogaster has yielded many of the most fundamental discoveries in genetics -- beginning with proof, in 1916, that the genes are located on the chromosomes. Only during the last year has the fly's whole genome been sequenced, however, and its 13,601 individual genes enumerated.
The genome of D. melanogaster, the largest yet sequenced in full, is described in the 24 March 2000 issue of Science magazine, in a series of articles jointly authored by hundreds of scientists, technicians, and students from 20 public and private institutions in five countries.
The collaboration was led by Gerald Rubin of the University of California at Berkeley and the Howard Hughes Medical Institute (HHMI), who heads the Berkeley Drosophila Genome Project, and by J. Craig Venter of Celera Genomics in Rockville, Maryland. The Berkeley Drosophila Genome Project (BDGP) is supported by the Department of Energy, the National Human Genome Research Institute, and HHMI, with the largest of its facilities operated by the Life Sciences Division of the Department of Energy's Lawrence Berkeley National Laboratory.
In 1998, when collaboration with Celera began, extensive but incomplete maps of the location of specific DNA sequences on the fly chromosomes had been constructed, and about 20 percent of the fly genome had already been sequenced in detail -- mostly by the BDGP group at Berkeley Lab where, with Rubin, Susan Celniker is co-director of the sequencing effort.
The purpose of the collaboration was to test whether a strategy known as whole-genome shotgun sequencing could be used on organisms having many thousands of genes encoded in millions of DNA base pairs; the strategy had proven effective for small bacterial genomes.
"No one knew whether whole-genome shotgun sequencing would work for the fly genome," says Roger Hoskins, leader of the BDGP physical mapping project, "but we knew that if it did, it would be faster and more efficient than traditional methods."
D. melanogaster has some 250 million bases in its genome, arranged on five chromosomes; 80 percent of the genome is located on the large chromosomes labeled 2 and 3. Hoskins and his colleagues set out to produce a physical map of that part of chromosomes 2 and 3 that expresses genes (about 45 percent of the chromosomal material is highly condensed and does not encode genes).
Although physical maps are not sequences -- a sequence identifies every pair of bases along a given stretch of DNA -- a good map pins down the location of unique short sequences that can be used to establish the correct long-range order of copies of longer DNA sequences, and thus of any genes they represent.
The 17,000 clones used by the Berkeley Lab BDGP group are actual stretches of DNA replicated in Escherichia coli bacteria and known as "bacterial artificial chromosomes" (BACs). Each BAC accurately represents a discrete stretch of the genome, and the map marks each BAC with at least one unique "sequence-tagged site" (STS) -- ideally with two or more such sites.
Using probes tailored to each sequence-tagged site, an STS can be found wherever it occurs in a random collection of clones; 1,923 of these markers, spaced roughly every 50,000 bases, were used to build the BDGP's final map. By matching these sites among overlapping clones, sets of clones of different lengths can be lined up with one another and eventually "tiled" along the entire length of each chromosome. The result is called an STS content map.
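The matching-and-tiling idea can be illustrated with a small Python sketch. This is not the BDGP's software. The clone names, marker names, and the functions `build_contigs` and `tiling_path` are hypothetical, and the sketch assumes that the order of the markers along the chromosome is already known and that each clone covers an unbroken run of markers.

```python
from collections import defaultdict

def build_contigs(clone_markers):
    """Group clones into contigs: clones sharing at least one STS are linked."""
    clones_by_marker = defaultdict(set)
    for clone, markers in clone_markers.items():
        for marker in markers:
            clones_by_marker[marker].add(clone)

    seen, contigs = set(), []
    for start in sorted(clone_markers):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # depth-first walk over clones linked by shared markers
            clone = stack.pop()
            group.add(clone)
            for marker in clone_markers[clone]:
                for other in clones_by_marker[marker]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        contigs.append(sorted(group))
    return contigs

def tiling_path(clone_markers, marker_order):
    """Greedily pick overlapping clones that span every marker in marker_order.

    Assumes (hypothetically) that each clone covers a contiguous run of markers.
    """
    index = {marker: i for i, marker in enumerate(marker_order)}
    spans = {clone: (min(index[m] for m in markers), max(index[m] for m in markers))
             for clone, markers in clone_markers.items()}
    path, pos, last = [], 0, len(marker_order) - 1
    while True:
        # Clones containing the current marker; prefer the one reaching farthest.
        candidates = [(end, clone) for clone, (start, end) in spans.items()
                      if start <= pos <= end]
        if not candidates:
            raise ValueError(f"no clone contains marker at position {pos}")
        end, best = max(candidates)
        if path and end <= pos:
            raise ValueError(f"no clone extends past marker {marker_order[pos]}")
        path.append(best)
        if end >= last:
            return path
        pos = end  # the next clone must share this marker with the current one

# Hypothetical toy data: five BAC clones and the STS markers each one contains.
clones = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3"},
    "BAC_C": {"sts3", "sts4"},
    "BAC_F": {"sts2", "sts3", "sts4"},
    "BAC_D": {"sts9"},
}
contigs = build_contigs(clones)
first_contig = {name: clones[name] for name in contigs[0]}
path = tiling_path(first_contig, ["sts1", "sts2", "sts3", "sts4"])
```

In this toy data BAC_D shares no marker with any other clone and so forms a contig of its own, while the tiling path for the first contig needs only two overlapping clones. A real map must also infer the marker order from the clone data and cope with probe errors, which this sketch does not attempt.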
When their map of chromosomes 2 and 3 was complete -- along with maps of the much shorter chromosomes 4 and X produced by others -- the BDGP researchers made a "rough draft" sequence of the genome with shallow coverage (less than two clones deep). The draft served as a check against Celera's whole-genome shotgun sequence and is being used to close some of that sequence's 1,600 gaps.
The multi-author Science paper summarizing the genome-sequence results describes the importance of the BDGP's methods and results: "The BAC end-sequences and STS content map provided the most informative long-range sequence-based information at the lowest cost." Increasing the number of BAC end-sequences is the authors' primary recommendation for future genome-sequencing projects.
D. melanogaster is far more important than a trial run for the mouse and human genomes, however. In a set of 289 human genes implicated in diseases, 177 are closely similar to fruit fly genes, including genes that play roles in cancers, in kidney, blood, and neurological diseases, and in metabolic and immune-system disorders. "The underlying biochemistry of fruit flies and humans is remarkably similar," says Hoskins, "so fruit flies can provide clues to understanding human diseases caused by defective genes."
"We can find human tumor-suppressing genes in flies easier than we can in the mouse," says Susan Celniker, pointing out that experiments can be done using fly genes that would be impractical (or unthinkable) using human subjects. Especially useful is the identification of networks of other genes that interact with known disease genes, and their associated metabolic pathways. The implications for medicine are immediate.
To this end the BDGP researchers are continuing to refine the D. melanogaster sequence already produced. "We're going to push it to high accuracy," says Hoskins.
The Human Genome Project aims for an error rate of one in 10,000 base pairs -- roughly the number of errors that could arise from normal human variation -- but the Drosophila workers intend to achieve one error in 100,000, a goal partly made possible by the limited variation among inbred laboratory flies.
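To give a feel for the difference between the two targets, the short Python calculation below applies both error rates to the figure of some 250 million bases quoted earlier for the fly genome. Applying the rates to the whole genome is an assumption made only for illustration, since the sequencing effort described here concentrates on the gene-expressing portion. The function name `expected_errors` is hypothetical.

```python
def expected_errors(bases, bases_per_error):
    """Expected number of wrong bases at one error per `bases_per_error` bases."""
    return bases / bases_per_error

# "Some 250 million bases", used here for the whole genome as an illustrative assumption.
GENOME_BASES = 250_000_000

one_in_ten_thousand = expected_errors(GENOME_BASES, 10_000)
one_in_hundred_thousand = expected_errors(GENOME_BASES, 100_000)

print(f"1 error per 10,000 bases:  about {one_in_ten_thousand:,.0f} errors")
print(f"1 error per 100,000 bases: about {one_in_hundred_thousand:,.0f} errors")
```

Under that assumption the tighter target leaves about 2,500 expected errors instead of about 25,000, a tenfold reduction.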
Meanwhile the completed genome of D. melanogaster reported in the 24 March 2000 issue of Science stands as a milestone in the history of genetic research and a doorway to new methods of progress. For one thing, Celera is now attempting to apply the whole-genome shotgunning technique to the much larger human genome.
"Celera did a great job," says Hoskins, "and the project worked better than anyone could have hoped. Now, the BDGP and the rest of the community of 5,000 Drosophila researchers around the world can begin projects to understand how the genome sequence controls the biology."
The Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California.
For more information contact:
- Paul Preuss, Lawrence Berkeley National Laboratory 510/486-6249
- Celera Genomics, 240/453-3000
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.