Human Genome Project Information Archive
1990–2003


Sponsored by the U.S. Department of Energy Human Genome Program

Human Genome News Archive Edition
Vol.9, No.3   July 1998

Sequencing Strategies and Tools

Meeting Human Genome Project sequencing goals on time and within budget will require major improvements in sequencing speed, reliability, and cost. Last year, the most efficient sequencing centers achieved outputs of around 20 to 25 Mb at a cost of about $0.50/bp, with a total community-wide sequencing capacity close to 100 Mb/year. Projections are that several hundred megabases of finished sequence will have to be generated each year to meet goals by 2005. Workshop speakers presented some of the successes and challenges they are encountering.

Large-Scale Projects
Shawn Iadonato [University of Washington (UW)] discussed the advantages of using detailed sequence-ready restriction maps produced with a high-resolution, multiple complete-digest method. He presented data from a large-scale project, based on Eric Green's (NIH) STS and YAC-based map, to sequence a contiguous 2-Mb region of human chromosome 7q31.3. Advantages of investing in upstream mapping, he noted, include (1) clone validation, (2) assembly checking, and (3) selection of an optimal tiling path. In addition, UW emphasizes high-quality raw and finished sequence data.

Iadonato also suggested a way for large-scale sequencing facilities to measure sequencing cost and efficiency. He divided the factors into three areas: development activities universally applicable to the genome community, typically the smallest investment; production-related development that enhances a particular facility's efficiency; and production of mapping and sequencing data, generally the largest investment. The real measure of cost efficiency, he said, is the number of finished base pairs per read of generated data; at the UW center it fluctuates between 45 and 55 bp.
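The efficiency measure Iadonato described can be illustrated with a short calculation. The sketch below is only an illustration: the function name and the project figures (2 Mb finished from 40,000 reads) are hypothetical and were chosen so that the result falls inside the 45 to 55 bp range reported for the UW center.

```python
def finished_bp_per_read(finished_bp: int, total_reads: int) -> float:
    """Return finished base pairs obtained per raw read generated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return finished_bp / total_reads


# Hypothetical project figures, not taken from the article.
ratio = finished_bp_per_read(finished_bp=2_000_000, total_reads=40_000)
print(f"{ratio:.1f} finished bp per read")
```

A higher ratio means fewer reads were spent on redundancy, failed lanes, or gap closure for each finished base.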

Owen White [The Institute for Genomic Research (TIGR)] reported on high-throughput microbial sequencing projects at TIGR and on implementation of enhanced data-annotation techniques. "Biology is not just data acquisition," he reminded attendees; "it also attempts to draw relevant conclusions." White observed that sequence generation and annotation are coupled tightly and that people generating sequence should be enhancing their submissions with data such as database and orthology matches. (Orthologs are genes in different species that descend from a single ancestral gene.) He noted that "orthologs are the central kernel of information we will be using instead of individual genes." Other possible annotation includes frameshift analysis, laboratory management database systems, noncoding information such as DNA repeats and regulatory regions found upstream of a gene, and literature citations. Because an increasing number of genome sequences are coming online, a robust, flexible system of data management across genomes will be needed to handle the numerous kinds of data. Development of such a system will enable new entries to update the annotation of other genomes as applicable.

Chemistries, Strategies, Technologies
Current DNA sequencing methods use a DNA polymerase to extend a primer in the presence of the four natural nucleotides. Two important properties of a polymerase are its ability to incorporate dideoxynucleotides onto a growing DNA strand and the length of time it remains associated with the DNA template (known as its processivity). For 12 years, Stanley Tabor (Harvard Medical School) and colleagues sought to capture a picture of the replicating complex in action; their efforts were rewarded last year with an elegant determination of a T7 DNA polymerase structure at 2.2-Å resolution. The T7 polymerase is locked in a replicating complex with a dideoxy-terminated primer-template, an incoming dNTP, and the processivity factor thioredoxin. The work was reported in the January 15, 1998, issue of Nature.

"The structure has been a gold mine for helping us understand the polymerization mechanism and for facilitating further studies to define critical features that will enable more precise engineering of mutant polymerases with enhanced properties," Tabor said. The group's past successful applications of structural studies include development of an improved polymerase, now commercially available, which reduces the amount of expensive reagents required and supports popular cycle-sequencing protocols.

John Dunn (Brookhaven National Laboratory) reported on the development of vectors and protocols to allow simple and reliable production of nested deletions for rapidly sequencing across one strand of a cloned fragment using a universal primer. The strategy has advantages for sequencing gaps and repetitive DNA. Exploratory work has demonstrated its effectiveness in sequencing fragments at least as large as 17 kb, cloned from a human BAC. Imaging and sizing software is being tested for automated selection of an appropriate set of deletions for sequencing.

Alex Glazer [University of California, Berkeley (UCB)] discussed improvements resulting from the use of energy-transfer (ET) fluorescent primers for DNA sequencing and analysis. Fluorescent labels are critical components of conventional automated sequencing approaches, and ET primers provide more distinct and intense fluorescence emissions than single dye-labeled primers. This improvement has led to significant advances in DNA sequencing and analysis, including short tandem repeat (STR) typing often used in diagnostics and forensics.

Glazer described a collaboration that he and Richard Mathies (UCB) have with David Sidransky (Johns Hopkins University) in which two-color ET primer sets are applied to bladder-cancer diagnosis. The technique is based on electrophoretic analyses of PCR-amplified STRs from bladder epithelial cells shed in the urine. Diagnosis depends on detecting loss of heterozygosity (variation) at particular loci, and multiplex analyses allow quantitative determination of amplified fragments from two different samples (normal and tumor cells). The noninvasive assay facilitates monitoring the effectiveness of surgery in eliminating cancer cells and detecting a relapse.

Significant increases in sequencing throughput can be achieved using higher electric fields in the fragment-separation (electrophoresis) step. Although conventional slab gel systems retain too much heat under these conditions, sets of gel-filled glass capillaries that dissipate heat more efficiently are being developed as an effective alternative to conventional methods. Another advantage to capillary systems is the potential for eliminating the labor-intensive gel-pouring and -loading steps.

Several groups discussed advances in various approaches that use many capillaries in parallel [called capillary array electrophoresis (CAE) systems]. Barry Karger (Barnett Institute, Northeastern University) discussed the use of replaceable polymers and capillary electrophoresis with high-resolution automated fraction collection for picking out differentially expressed mRNAs or cDNA systems. Indu Kheterpal (UCB) reported progress in developing a second-generation CAE scanner with Ron Davis (Stanford University) that can detect up to 1000 capillaries in an array. Jian Jin [Lawrence Berkeley National Laboratory (LBNL)] described a beta test version of a 96-well capillary system that employs a sheath-flow excitation-detection geometry. A prototype of a fully automated 96-well system is ready for testing, according to Qingbo Li (SpectruMedix Corporation, formerly Premier American Technologies Company). Ed Yeung (Iowa State University) won an R&D 100 award for developing the technology.

Chip-based CAE approaches are being explored by groups such as the team led by Mathies. Their device, featured in the February 15, 1998, issue of Analytical Chemistry, combines an electrophoretic injection and separation system with an electrochemical detector in a microfabricated apparatus. The technology represents the first example of integrating onto a single chip a miniaturized detection system with injection and separation components of an electrophoretic chemical-analysis system.

Sequence-Ready Map Strategy
Because of their higher stability as compared with their YAC or cosmid counterparts, clone libraries constructed in BAC, PAC, and P1 vectors have become the choice for clone sets in high-throughput genomic sequencing projects. A strategy was proposed in 1996, and pilot projects were begun for using end sequences from BACs or PACs to support just-in-time contig extension for directed sequencing [Nature 381, 364-66 (1996)]. The strategy requires collection of end sequences from clones representing a 15-fold coverage. DOE-funded pilot projects are being carried out at TIGR and UW.
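The scale implied by 15-fold clone coverage can be estimated with simple arithmetic. The sketch below is an illustration only: the genome size (3,000 Mb) and mean BAC insert size (150 kb) are assumptions not stated in this article, and the function name is hypothetical.

```python
def end_sequence_plan(genome_bp: int, mean_insert_bp: int, fold_coverage: float):
    """Estimate clone count, end-read count, and mean spacing of end sequences.

    Each clone contributes two end sequences, one from each end of the insert.
    """
    clones = round(fold_coverage * genome_bp / mean_insert_bp)
    end_reads = 2 * clones
    spacing_bp = genome_bp / end_reads  # average distance between end sequences
    return clones, end_reads, spacing_bp


# Assumed values: 3,000-Mb genome and 150-kb mean insert (not from the article).
clones, end_reads, spacing_bp = end_sequence_plan(3_000_000_000, 150_000, 15)
print(f"{clones:,} clones, {end_reads:,} end sequences, one every {spacing_bp:,.0f} bp")
```

Under these assumptions, end sequences would fall on average every few kilobases, dense enough that a newly finished clone is likely to contain end sequences identifying candidate clones for extending the contig.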

Mark Adams (TIGR) provided a progress report on a pilot BAC end sequencing (BES) project to explore the strategy's feasibility, optimize technologies, establish quality controls, and design the necessary informatics infrastructure. Adams reported a success rate of around 75% and, using four ABI 377 sequencers, daily production of about 400 high-quality BAC end sequences having an average edited length of about 475 bases. Researchers are running into some large duplicated regions in the chromosome 16 end-sequencing project (about 40 kb in one BAC), and Adams stressed the importance of understanding the targeted region's genomic structure. So far, about 20% has been accomplished toward the goal of 15-fold genome coverage. Details of prep and sequencing methods are on the TIGR Web site (www.tigr.org), and all components are available commercially. Data from a BES companion project led by Gregory Mahairas (UW) are also on the Web (updated information on BES and BAC-PAC resources: http://www.ornl.gov/meetings/bacpac.pdf).
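The production figures Adams reported translate into daily throughput as sketched below. This is a rough illustration, not a description of TIGR's accounting: it assumes the 400 end sequences per day are the successful reads and that the 75% success rate applies per attempted read, neither of which the report spells out. The function name is hypothetical.

```python
import math


def daily_bes_throughput(end_seqs_per_day: int, mean_edited_length: int,
                         sequencers: int, success_rate: float):
    """Summarize daily BAC end sequencing output from reported figures."""
    bases_per_day = end_seqs_per_day * mean_edited_length
    per_sequencer = end_seqs_per_day / sequencers
    # Assumption: reported end sequences are successes out of all attempts.
    attempts_needed = math.ceil(end_seqs_per_day / success_rate)
    return bases_per_day, per_sequencer, attempts_needed


bases, per_machine, attempts = daily_bes_throughput(400, 475, 4, 0.75)
print(f"{bases:,} edited bases/day, {per_machine:.0f} end sequences per sequencer, "
      f"about {attempts} attempted reads/day")
```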


The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n3).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and to determine the complete sequence of DNA bases in the human genome.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.