Sponsored by the U.S. Department of Energy Human Genome Program
Ultimate goals of the Human Genome Project (HGP) are to determine the sequence of the 3 billion DNA bases that make up the human genome and to increase understanding of gene function. In search of the best route to these ends, researchers have generated several different types of useful chromosomal maps. Eventually, the human genome will be represented by DNA chromosome sequences with various levels of annotation.
Interim maps have proven useful for biomedical research, but the most valuable map resources for production DNA sequencing are megabase-scale assemblies of overlapping DNA clones (contigs). Building long contigs, however, has proven a difficult task. Although contig maps of chromosomes 16 and 19 (developed at Los Alamos and Lawrence Livermore national laboratories, respectively) were largely complete in 1995, comparable contig maps of other chromosomes are less ready to support high-throughput sequencing. To help alleviate this impending bottleneck, in 1998 DOE sponsored projects to enrich the BAC clone resources preferred for high-throughput sequencing systems.
BACs and STCs
BACs (bacterial artificial chromosomes), which typically contain 100- to 200-kb inserts of human DNA, were designed as larger, more stable recombinant DNA clones that would represent the human genome more uniformly than previous systems. BAC development was pioneered with DOE support by Melvin Simon's team at the California Institute of Technology, with Pieter de Jong of Roswell Park Cancer Institute contributing to subsequent improvements.
Recent DOE-sponsored projects are producing sequence tag connectors (STCs) on BACs to help extend the human chromosome sequence already acquired. STCs are DNA sequence reads at both ends of the BACs. In 1995 and 1996 investigators began to advocate that the STC concept, which had proven useful in smaller-scale sequencing projects, be applied to large-scale human genome sequencing (Venter et al., Nature 381, 364-66). DOE accepted related applications in 1996 and implemented a fast-track, special review process involving a panel composed of international experts in human and mouse genetics, mapping, sequencing, informatics, and management. Following the panel's recommendations, in September 1996 DOE initiated pilot projects at six laboratories to refine protocols and clarify cost and quality factors.
Several months later, in 1997, a workshop and review were held to assess progress. Attendees recommended that DOE maintain its level of support at about $5 million a year. They also suggested concentrating STC production at sites that achieve the highest-quality sequence reads to allow the design of valuable sequence tagged sites (STSs; see Mapping with STCs and STSs).
High-throughput STC production is now being carried out at The Institute for Genomic Research (TIGR) under Bill Nierman and at the University of Washington Department of Molecular Biotechnology (UWMB) by Gregory Mahairas of Leroy Hood's team. These sequencing projects are slated for completion in late 1999, with STC data sets on some 450,000 BACs. As of February 1999, more than 378,000 STCs had been acquired at the two sites (see BAC Projects).
STC data will give researchers STC markers spaced, on average, every 3000 to 4000 bases across the entire human genome, a 100-fold improvement over other current human genome maps.
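The spacing figure can be checked with simple arithmetic. The sketch below is a rough estimate only. It assumes that each of the roughly 450,000 BACs yields two usable end reads and that those reads fall evenly across a 3-billion-base genome; neither assumption is stated in this article, and the function name is illustrative.

```python
# Rough check of the average STC spacing quoted above.
# Assumptions (not from the article): two usable end reads per BAC,
# and reads distributed evenly across the genome.

GENOME_SIZE_BP = 3_000_000_000  # about 3 billion bases in the human genome
BAC_COUNT = 450_000             # planned number of BACs with STC data
ENDS_PER_BAC = 2                # one STC read at each end of a BAC insert


def mean_stc_spacing(genome_size_bp, bac_count, ends_per_bac=ENDS_PER_BAC):
    """Return the average number of bases between neighboring STC markers."""
    total_stcs = bac_count * ends_per_bac
    return genome_size_bp / total_stcs


spacing = mean_stc_spacing(GENOME_SIZE_BP, BAC_COUNT)
print(f"Average spacing: about {spacing:.0f} bases per STC")
```

Under these assumptions the estimate falls inside the 3000- to 4000-base range given above.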
The availability of STC data sets encourages more participation by smaller laboratories, whose contig building was previously hindered by the prohibitive cost of maintaining and processing libraries on the human genome scale. With the number of STC data sets now expanding, candidate BACs for extending chromosomal sequence can be screened computationally over the Internet. Scientists need to order only those BACs identified as candidates for contig extension (see Availability of BAC Clones and STC Data).
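The idea behind computational screening can be illustrated with a toy example: look for BACs whose end sequence (STC) falls near the end of an already sequenced contig, since the rest of such a BAC is likely to extend beyond it. This is a minimal sketch and not the method used by the sequencing centers. It assumes exact substring matching on a single strand, whereas real screening would rely on a sequence alignment tool that tolerates read errors and checks both strands. The function name, the BAC identifiers, the "left" and "right" end labels, and the sequences are all hypothetical.

```python
# Toy screen for BACs that may extend a sequenced contig.
# Assumption: an STC counts as a hit only if it matches the contig exactly,
# on the given strand, and starts within the last `window` bases.


def find_extension_candidates(contig, stcs, window=50):
    """Return (bac_id, end_label, position) for STCs found near the contig end.

    contig -- sequenced contig as a string of bases
    stcs   -- dict mapping a BAC id to a dict of end label -> STC read
    window -- number of bases at the contig end to search
    """
    tail_start = max(0, len(contig) - window)
    candidates = []
    for bac_id, ends in stcs.items():
        for end_label, read in ends.items():
            # str.find returns -1 when the read is absent from the search region.
            position = contig.find(read, tail_start)
            if position != -1:
                candidates.append((bac_id, end_label, position))
    return candidates


# Hypothetical data, for illustration only.
contig = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
stcs = {
    "BAC_A": {"left": "GGTTAGCAT", "right": "TTTTTTTTT"},
    "BAC_B": {"left": "ACGTTGCAA", "right": "CCCCCCCCC"},
}

hits = find_extension_candidates(contig, stcs, window=15)
for bac_id, end_label, position in hits:
    print(f"{bac_id} ({end_label} end) matches the contig at position {position}")
```

In this example only BAC_A is reported, because its left-end read lies within the last 15 bases of the contig; the BAC_B read matches the start of the contig and is ignored. Only the reported BACs would then need to be ordered and checked, for example against restriction fingerprints.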
Enriching STC Data
Teams at UWMB and CalTech are generating additional enrichments to core BAC-STC data sets. Restriction fingerprints, which are useful for validating candidate contig extensions, will be available from UWMB for most BACs processed there. At CalTech, a team led by Ung-Jin Kim is correlating BACs with cDNAs from the NCBI UniGene* listing. These correlations will allow concurrent sequencing of chromosomal regions and their derivative cDNAs, thus promoting the interpretation of sequence function. If cDNAs already have been mapped via expressed sequence tags (ESTs) to particular chromosomal regions, their correlated BACs also will be assigned candidate positions on the chromosomes. In addition, some STCs are being used to design STSs that are useful in other mapping methods (see Mapping with STCs and STSs).
STCs Useful in Non-HGP Efforts on Human, Other Genomes
Several smaller-scale mapping and sequencing projects have adapted STC concepts since the HGP began. STC data sets are either in use or planned for genome projects in other species, including those for the flowering plant Arabidopsis thaliana and the laboratory mouse. Other examples include microbial genome sequencing strategies using STCs developed at TIGR with DOE support. The private company Celera Genomics is planning to use a similar strategy to sequence the human genome [HGN9(3), 1].
*UniGene lists entries for nonredundant EST sequences read from the ends of cDNA clones generated and arrayed for wide distribution by the international I.M.A.G.E. Consortium [HGN6(6), 3]. I.M.A.G.E. clone libraries are an outgrowth of a 1991 DOE initiative to enrich the developing human genome physical maps with gene loci and open broad access to the resulting data and resources.
See graphic, BAC End Sequencing Extends Clones.
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v10n1-2).