Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, July 1990; 2(2)
Evolutionary Improvements and Revolutionary Methods Development
Marvin Stodolsky, Human Genome Program, U.S. DOE
The primary goal of the Human Genome Project is to produce a reference DNA sequence. Although sequencing technologies have recently improved, sequencing the 3-billion-bp human genome will require faster, less expensive, and more accurate systems. The sequencing efforts being funded by DOE can be divided into two categories: evolutionary improvements and revolutionary methods. (See information on requesting copies of research abstracts of work funded by DOE.)
Those methods categorized as evolutionary improvements primarily support Sanger and Maxam-Gilbert sequencing strategies, or their extended versions, in which four families of DNA fragments are produced, each family ending in one of the four bases (A, T, C, or G). Size fractionation by gel electrophoresis produces a sequencing ladder with steps of increasingly longer fragments. The sequence is read as the end base on successively longer fragments within the four ladders.
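The ladder-reading step described above can be sketched in a few lines of Python. This is only a toy illustration, assuming each of the four lanes has already been reduced to a list of fragment lengths; the function name `read_ladder` and the sample data are hypothetical and do not come from any DOE-funded system.

```python
def read_ladder(lanes):
    """Read a sequence from four ladders.

    lanes maps each end base (A, C, G, T) to the fragment lengths
    observed in that lane (toy data; real input would be band positions).
    """
    length_to_base = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in length_to_base:
                # A given length should end in exactly one base.
                raise ValueError(f"length {n} appears in more than one lane")
            length_to_base[n] = base
    # The sequence is the end base of successively longer fragments.
    return "".join(length_to_base[n] for n in sorted(length_to_base))


lanes = {"A": [2, 5], "C": [3], "G": [1, 6], "T": [4]}
print(read_ladder(lanes))  # GACTAG
```

In practice the hard part is deciding which bands are real and how they are ordered when resolution falls off, which this sketch ignores.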
In contrast, those methods categorized as revolutionary are innovative and high-risk. They rely on novel data-acquisition methods and promise to generate sequence data at higher rates, but they will require substantial technical development before sequence production begins. Both evolutionary and revolutionary systems are undergoing or will undergo validation trials to determine their accuracy, speed, and total-system economics.
For current gel electrophoretic fragment separation systems, the capacity for spatially resolving successively longer fragments is diminished at lengths of several hundred base pairs. This decrease in resolution limits accurate reading of sequence from the sequencing ladder. Use of capillary or ultrathin slab gels offers promise that higher resolution will extend sequence reads above a thousand bases. Thin gels also dissipate ohmic heat rapidly, which allows the use of higher voltages and therefore faster separation. A tenfold increase in fractionation speed is expected, with an attendant increase in DNA throughput.
To visualize the sequencing ladders, a variety of DNA labels have been used, including radioisotopes, stable isotopes, fluorescent compounds, and chemiluminescent materials. Replacement of radioisotopes with less hazardous labels is highly desirable. Labels may be attached to DNAs being fractionated, or they may be attached to oligomers (short, synthetic DNAs), which can then serve as probes that will selectively bind to their complementary sequences in a ladder and thus display their positions.
For sequences labeled by radioisotopes or chemiluminescence, the ladder images are first captured on film. Several versions of scanning densitometers and charge-coupled device (CCD) cameras are being evaluated for conversion of images into computer-manipulable data.
In support of systems using fluor labels for either mapping or sequencing, more sensitive instruments for detecting in-gel fluorescence are being developed. Decreasing the concentration of DNA in a gel provides some increase in resolution of successive fragment bands. Sensitivity gains are particularly important for analyses on capillary or ultrathin slab gels with their reduced DNA loading capacities.
Stable isotopes, four or more per element (e.g., tin), are particularly promising for multiplex sequencing or blotting procedures. When the isotope is incorporated into the primer used in Sanger or polymerase chain reaction (PCR) procedures, the electrophoretically produced DNA bands are located by scanning the gel with a resonance ionization spectrometer coupled to a time-of-flight mass spectrometer. The A, G, C, and T Sanger fragments, each labeled with an individual isotope, are run in a common gel lane; such fragments from several DNAs can also be combined into that same gel lane, as long as they carry distinguishable stable isotopes. Analysis rates of 3000 to 10,000 DNA bands per minute are suggested.
Today, physical maps of chromosomes are being generated as ordered libraries of cosmids or yeast artificial chromosomes (YACs). Several different strategies can be used to obtain the sequence of these large cloned DNAs when unit reads are only several hundred base pairs long.
In directed strategies, an initial mapping effort is necessary to identify suitable sites of sequencing initiation. They are chosen so that unit reads of DNA will overlap and permit extended sequence assembly by aligning the overlaps. In shotgun methods, initiation sites are randomly chosen. More total sequencing is necessary to guarantee overlaps of unit reads. There are also hybrid strategies that begin with shotgun sequencing and finish through some directed sequencing. PCR can serve to amplify and identify DNA regions needed for completion of an extended sequence.
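Why shotgun methods need more total sequencing can be estimated with a standard idealized model that is not part of this article: if reads start at uniformly random positions, the expected fraction of the target covered is 1 - exp(-c), where c is the total bases read divided by the target length. The sketch below assumes this model; the 40,000-bp cosmid insert and 400-bp unit read are illustrative figures, and the function names are hypothetical.

```python
import math


def expected_covered_fraction(n_reads, read_len, target_len):
    """Expected fraction of the target covered by randomly placed reads
    (idealized model: uniform start sites, end effects ignored)."""
    coverage = n_reads * read_len / target_len
    return 1.0 - math.exp(-coverage)


def reads_for_fraction(fraction, read_len, target_len):
    """Number of random reads needed to expect the given covered fraction."""
    coverage = -math.log(1.0 - fraction)
    return math.ceil(coverage * target_len / read_len)


# Assumed figures: 40,000-bp cosmid insert, 400-bp unit reads.
print(expected_covered_fraction(100, 400, 40_000))  # 1x redundancy, about 0.63
print(reads_for_fraction(0.99, 400, 40_000))        # 461 reads, about 4.6x
```

Under these assumptions a single pass of random reads leaves roughly a third of the target unread, which is one way to see why hybrid strategies finish with directed sequencing.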
Multiplex sequencing promises a considerable increase in efficiency, because target DNAs are processed in pools of 20 or more sequences. Each individual member of a pool has a distinguishing tag sequence at the beginning of its sequencing ladder. Several processing steps culminate in the binding of fractionated fragment patterns to a membrane. Thereupon, each superimposed sequencing ladder yields its discrete sequence data as hybridizations are performed with oligomer probes complementary to each distinguishing tag.
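The logic of multiplexing can be illustrated with a toy model in which the membrane is a list of bands, each carrying the tag of the clone it came from. The data layout and the names `blot` and `probe` are assumptions made for illustration only; they model the bookkeeping, not the chemistry.

```python
def blot(pool):
    """Superimpose the ladders of every pooled clone on one membrane.

    pool maps a tag name to that clone's sequence (hypothetical toy data).
    Each band is recorded as (tag, fragment length, end base).
    """
    return [(tag, position, base)
            for tag, seq in pool.items()
            for position, base in enumerate(seq, start=1)]


def probe(membrane, tag):
    """Hybridize with one tag probe: only that clone's ladder is revealed."""
    bands = sorted((pos, base) for t, pos, base in membrane if t == tag)
    return "".join(base for _, base in bands)


pool = {"tag01": "GATTACA", "tag02": "CCGTAAG"}
membrane = blot(pool)       # one fractionation run for the whole pool
for tag in pool:            # one hybridization per tag
    print(tag, probe(membrane, tag))
```

The efficiency gain comes from sharing the fractionation and blotting steps across the pool; only the probing is repeated per clone.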
The cost of generating an extended sequence resides mainly in its preparation (e.g., reduction of cosmids or YACs into the needed small recombinant DNAs, restriction fragments, or PCR fragments). Reduction of front-end costs is thus important for total-system economics. Automation in clone management, subcloning, and DNA preparation is progressing. Some newer sequencing strategies do not require the production of subclones from cosmids or perhaps even from YACs.
Transposable genetic elements with features that facilitate mapping and sequencing are being constructed. Within host bacteria, the transposon randomly integrates at many sites within the human DNA cosmid inserts, and therefore many initiation sites are provided for sequencing. The DNA segments between integrated transposons must first be isolated so that each segment provides a unique sequencing template. Sequences generated from multiple insertion sites should overlap and provide sequence coverage for the entire human segment of the cosmid.
Similar cosmid coverage is achieved by using a family of oligomers to provide primers for Sanger sequencing. Each sequencing run identifies oligomer family members that can next be used as primers, so that spans of contiguous sequence are progressively extended. Eventually the spans overlap, and the sequence of the human segment within the cosmid is completed.
Data analysis is a major bottleneck in all contemporary sequencing projects. The read accuracy of sequence data varies; the shorter the fragment being read on the ladders, the greater the accuracy of the sequence data obtained. These data must be computationally processed to recognize overlaps of sequence reads so that extended sequence can be assembled. This assembly task is currently being addressed by development of special algorithms. Refined displays of the probable overlaps between reads also aid and speed any human decisions that remain necessary.
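As a rough illustration of the assembly task, the sketch below repeatedly merges the pair of reads with the longest exact suffix-to-prefix overlap. It is a textbook greedy heuristic chosen for brevity, not the algorithms under development in the funded projects, and it assumes error-free reads from a single strand, which real data are not.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if the longest match is shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by best overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: several contigs are left
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j)] + [merged]
    return reads


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))  # ['ATGCGTACCGGA']
```

Read errors and repeated sequence can both mislead a greedy merge, which is where displays of probable overlaps and human judgment come in.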
Revolutionary sequencing methods are diverse in operational principle and instrumentation employed. The DOE-funded projects described below demonstrate this diversity.
One such technology, based on hybridization of oligomer probes, uses a unique strategy and various computer algorithms to assemble extended sequence from very short sequences. Some 100,000 oligomer probes are used to test for the presence of complementary sequences in recombinant DNAs of the single-stranded virus M13. The library of M13 recombinants has a manyfold representation of the region to be sequenced. Short sequence data obtained with probes first serve to order overlapping clones of the library; then extended sequence is assembled from the short sequence data. When false branchings in sequence within a single clone do arise, they are eliminated by comparing results from several clones that partially overlap the branch region. Proof-of-concept sequencing has been performed successfully, with a limited family of oligomers, on a 100-base-long M13 interferon DNA.
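Assembly from very short sequences, and the branching problem, can be illustrated with a toy reconstruction of the kind usually described for sequencing by hybridization. The sketch assumes an idealized experiment that reports exactly which oligomers of length k (k of at least 2) occur in a clone; the function names and the 4-base probes are hypothetical choices for readability, not the parameters of the funded project.

```python
def spectrum(seq, k):
    """The set of all length-k oligomers present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extend_from(start, kmers, k):
    """Extend start to the right while exactly one unused oligomer
    overlaps the current end by k-1 bases.

    Returns (sequence, branched), where branched is True if extension
    stopped because more than one oligomer could follow.
    """
    seq = start
    unused = set(kmers) - {start}
    while True:
        suffix = seq[-(k - 1):]
        candidates = [m for m in unused if m.startswith(suffix)]
        if len(candidates) != 1:
            return seq, len(candidates) > 1
        seq += candidates[0][-1]
        unused.remove(candidates[0])


k = 4
probes_hit = spectrum("ATGGCTAC", k)
print(extend_from("ATGG", probes_hit, k))          # ('ATGGCTAC', False)

# A branch: two oligomers both continue from "TGC".
print(extend_from("ATGC", {"ATGC", "TGCA", "TGCT"}, k))  # ('ATGC', True)
```

When `branched` is true, data from a single clone cannot decide the path; comparing clones that only partly overlap the branch region, as described above, is what resolves it.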
Some revolutionary approaches seek to achieve sequence acquisition from a single strand of DNA. The successive-base-release approach utilizes automated flow cytometry technology. In a preparative step, one strand of a duplex is partially degraded and then replaced, through DNA polymerase action, with base subunits labeled with distinguishing fluors. The labeled DNA is mounted in a quartz capillary tube, together with a suitable processive exonuclease. Bases sequentially released into the flow stream would then be identified by their distinguishing fluor labels. A detection system capable of detecting and identifying single molecules is essential to this effort.
Since the January 20, 1989, Science report on scanning tunneling microscopy (STM) of DNA, this sequencing approach has been pursued in many laboratories. The report resulted from a collaboration between Lawrence Berkeley Laboratory (LBL) and Lawrence Livermore National Laboratory (LLNL).
In STM technologies a tip approaching atomic dimensions scans the specimen surface nondestructively. Atom-scale resolution of simple surfaces has been achieved with this approach. With DNA specimens, the objective is to distinguish the four bases on the sugar-phosphate backbone of DNA.
The newest member of the scanning microscopy family is molecular exciton microscopy. In this technology, the energy exchange and modulation between a near-field optical scanning tip and specimen is sensitive to the detailed nature of electronic orbitals in the specimen. For DNA analyses, differences between bases would be enhanced by attachment of distinguishing metal labels.
Recent development of intense, coherent (laser) X-ray sources and high-quality X-ray optics may result in X-ray microimaging capabilities with sufficient spatial resolution to define base sequence. In principle, a single strand could provide sufficient data to reconstruct a holographic image of the molecule. Very high performance and a means to enhance contrast between the bases would be essential for all subsystems. Radiation damage to the DNA will be a particular problem to address.
In a less demanding quasi-crystallographic approach, a chromosome segment is amplified by PCR, and one of the base types is labeled with a heavy metal. Samples containing as few as 10 million labeled and fully extended DNA molecules might suffice as a target. Theoretical analysis shows that by using a coherent X-ray source, the positions of metal labels, and hence bases, on the DNA fragment can be determined from scattering data. For each chromosome segment, four fibers corresponding to individual labeling of A, T, C, or G would be necessary. Sequence would then be obtained by integrating the four sets of position data.
With many evolutionary improvements in progress and with the revolutionary schemes now under development or not yet imagined, prediction of future genome sequencing technologies is difficult. To better assess progress and coordinate research in the sequencing portion of the nation's Human Genome Project, the DOE Human Genome Program and the NIH National Center for Human Genome Research have organized the Joint Working Group for DNA Sequencing.
Abstracts of all projects mentioned are contained in the DOE Human Genome 1989-90 Program Report, except for newly funded sequencing projects on transposon-aided sequencing, an improved CCD optical system for flow cytometry, exciton microscopy, and coherent X-ray crystallography of DNA fibers. Copies of the program report and these abstracts may be obtained from HGMIS.
The electronic form of the newsletter may be cited in the following style: Human Genome Program, U.S. Department of Energy, Human Genome News (v2n2).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.