Date Published: March 1994
We proposed DNA sequencing by hybridization (SBH) in 1987. Steady progress in experiment and theory, including the sequencing of a previously unknown short (343-bp) DNA fragment by this method, opens the way for rapid development and laboratory-scale implementation of SBH. To reach our objective of SBH rates of up to 1 Mb per laboratory per day, we are using SBH Format 1, in which DNA samples arrayed on a surface are interrogated sequentially by oligonucleotide probes.
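To make Format 1 concrete, here is a minimal sketch that builds a clone-by-probe score table under idealized assumptions. A probe counts as positive when it, or its reverse complement, occurs exactly in the clone, so mismatch discrimination is taken as perfect. The function names and the toy 6-mer probes and sequences are hypothetical and chosen only for illustration. Real probes are longer, and the real readout is a continuous dot intensity, not a 0/1 call.

```python
# Hypothetical helpers illustrating Format 1 scoring; assumes exact-match,
# error-free hybridization calls (real data are graded dot intensities).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def hybridizes(clone: str, probe: str) -> bool:
    """Idealized call: the probe pairs with either strand of the clone."""
    return probe in clone or revcomp(probe) in clone

def score_matrix(clones: dict[str, str], probes: list[str]) -> dict[str, list[int]]:
    """Format 1: every arrayed clone is interrogated by each probe in turn."""
    return {name: [int(hybridizes(seq, p)) for p in probes]
            for name, seq in clones.items()}

# Toy example (invented sequences and 6-mer probes).
clones = {"c1": "ATGGCGTACGTTAGC", "c2": "TTTTCCCCGGGGAAAA"}
probes = ["GCGTAC", "GCTAAC", "CCCCGG"]
print(score_matrix(clones, probes))
```

Each row of the table is a clone's hybridization signature. In the toy data, c1 scores positive for `GCTAAC` only through the complementary strand, which is why both orientations are checked.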
This strategy relies on a high-throughput production line that generates hybridization scores simultaneously for hundreds of thousands of 1- to 2-kb clones. DNA sample preparation, dense offprinting on filters, hybridization, and imaging are highly parallelized and streamlined for easy automation. A throughput capacity of 1 million scores/d is projected for next year, increasing to 10 million/d in the near future.
Three levels of sequencing information can be obtained, depending on the number of probes scored per clone in an experiment. Mapping and identification using clone sequence signatures can be achieved with relatively few (50 to 200) probes. Positioning and identifying genome structural elements (partial sequencing) requires more extensive hybridization. Complete sequencing by SBH requires data from several thousand probes, either on three to five related genomes or, for a single genome, in combination with single-pass gel sequencing of one genome equivalent.
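The probe numbers above can be related to how many positives a single clone yields. Assuming random sequence and exact-match scoring, a k-mer probe finds a target somewhere in a double-stranded clone of length L with probability of roughly 1 - exp(-2(L - k + 1)/4^k). The sketch below evaluates this Poisson approximation. The probe lengths (7 and 8 nt) and the 1.5-kb clone length are assumptions chosen for illustration.

```python
import math

def p_positive(clone_len: int, k: int) -> float:
    """Approximate chance that a random k-mer probe has a perfect match in a
    random double-stranded clone (Poisson approximation; ignores palindromes
    and self-overlapping k-mers)."""
    sites = 2 * (clone_len - k + 1)  # candidate positions on both strands
    return 1.0 - math.exp(-sites / 4 ** k)

def expected_positives(n_probes: int, clone_len: int, k: int) -> float:
    """Expected number of positive probes in a clone's signature."""
    return n_probes * p_positive(clone_len, k)

# Assumed values: 1.5-kb clone, 100 probes of length 7 or 8.
for k in (7, 8):
    print(k, round(p_positive(1500, k), 3), round(expected_positives(100, 1500, k), 1))
```

Under these assumptions about one 7-mer probe in six scores positive on a 1.5-kb clone, so 100 probes give a signature of roughly 17 positives. That is enough to distinguish clones, while reading the sequence itself needs thousands of probes.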
In an orderly progression toward complete sequencing, we have nearly completed development of SBH for the first group of applications. Typing 20,000 cDNA clones from human brain with 60 to 110 probes grouped them into at least 5000 gene clusters, revealing the abundance structure of the libraries used. A model experiment on known clones simulated a ten-equivalent subclone library of cosmid-sized DNA. It showed that SBH data from 110 probes can cluster the clones so that the entire DNA is represented by a one- to two-equivalent set of clones drawn from the clusters, which can reduce the redundancy of gel sequencing five- to tenfold. The principle of partial sequencing was demonstrated by identifying the gamma-actin cDNA cluster solely from its hybridization scores.
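Clustering clones by their hybridization signatures can be sketched as single-linkage grouping on the Jaccard similarity of positive-probe sets. This is a simple stand-in chosen for illustration, not a description of our clustering program. The threshold value is an assumption. The all-pairs comparison grows quadratically with the number of clones, so a run on 20,000 clones would call for a faster method.

```python
from itertools import combinations

def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity of two clones' positive-probe sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(signatures: dict[str, frozenset], threshold: float) -> list[set[str]]:
    """Single linkage (illustrative stand-in): clones share a cluster if a chain
    of pairs with Jaccard similarity >= threshold connects them."""
    parent = {name: name for name in signatures}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(signatures, 2):  # all pairs: O(n^2)
        if jaccard(signatures[a], signatures[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in signatures:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=len, reverse=True)

# Toy signatures: sets of positive probe indices (invented).
signatures = {
    "a": frozenset({1, 4, 7, 9}),
    "b": frozenset({1, 4, 7, 10}),
    "c": frozenset({2, 5, 8}),
    "d": frozenset({1, 4, 7, 9, 12}),
}
print(cluster(signatures, threshold=0.55))
```

Here b and d are not similar enough to join directly, but both link to a, so single linkage places all three in one cluster. Cluster sizes then reflect the abundance structure of the library: a highly expressed gene appears as one large cluster, a rare transcript as a singleton.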
Intermediate-term goals are to (1) prepare sequence-ready maps of 1- to 2-kb subclones of human cosmids or bacterial artificial chromosomes and of several related bacterial genomes; (2) identify cDNAs that have already been partially sequenced in previously sequenced libraries, to avoid redundancy in gene discovery and to supply cDNAs from as-yet-unknown genes efficiently for complete sequencing; and (3) starting from these maps, combine hybridization data from 3000 probes with single-pass gel sequencing to obtain highly accurate finished sequence at a rate of 5 to 20 Mb/year.
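One way hybridization data could check a single-pass gel read, sketched under idealized assumptions: a probed k-mer that scored negative should not occur in the true sequence, so any read window containing it marks a likely gel error. Probes are grouped with their reverse complements through a canonical form. The names `probed` and `positive` and the clean 0/1 calls are hypothetical. Real scores are noisy, and only the k-mers actually probed can serve as evidence.

```python
# Illustrative sketch only; assumes error-free hybridization calls.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the smaller string."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def suspect_positions(read: str, probed: set[str], positive: set[str], k: int) -> list[int]:
    """Read positions covered by a k-mer that was probed but scored negative.
    If the hybridization data are right, such a k-mer cannot be in the true
    sequence, so the read has an error somewhere inside that window."""
    suspect: set[int] = set()
    for start in range(len(read) - k + 1):
        key = canonical(read[start:start + k])
        if key in probed and key not in positive:
            suspect.update(range(start, start + k))
    return sorted(suspect)

# Toy case: all 4-mers probed; the read carries a C->A error at position 8.
k = 4
true_seq = "ATGCGTACCTGAAGTC"
read = "ATGCGTACATGAAGTC"
probed = {canonical(a + b + c + d)
          for a in "ACGT" for b in "ACGT" for c in "ACGT" for d in "ACGT"}
positive = {canonical(true_seq[i:i + k]) for i in range(len(true_seq) - k + 1)}
print(suspect_positions(read, probed, positive, k))  # [5, 6, 7, 8, 9, 10, 11]
```

The flagged window brackets the error without pinpointing it. In a combined scheme, overlapping reads or the positive probe set would then be used to choose the correct base.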
Since we conceived sequencing by hybridization (SBH) in 1987, we have developed several procedures and concepts that enable both immediate use of the method and future "chip"-based technologies. In particular, hybridization conditions were defined and validated by correctly sequencing 343 bp in a blind test. Inexpensive, large-scale genome analysis with state-of-the-art technologies is possible through (1) partial sequencing or fine structural (sequence-ready) mapping with 100 to 1000 probes and (2) full sequencing that integrates incomplete gel and SBH data from a single genome or several similar genomes. Format 1 (an array of DNA samples) and Format 2 (an array of probes) sequencing chips based on microbeads, together with a recently proposed combination of the two formats, provide a basis for genome sequencing without subcloning. Format 3 (combinatorial chip) involves ligation of arrayed probes to probes in solution.
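Reconstructing a sequence from its hybridization spectrum is commonly formulated as finding an Eulerian path through a de Bruijn graph whose nodes are (k-1)-mers and whose edges are the positive k-mers. The sketch below implements that formulation with Hierholzer's algorithm. It assumes an error-free, single-strand spectrum in which no k-mer occurs twice. It is a textbook illustration, not the procedure used in the 343-bp blind test.

```python
from collections import defaultdict

def reconstruct(spectrum: set[str]) -> str:
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path.
    Assumes a clean one-strand spectrum; if several Eulerian paths exist,
    the result is only one of the sequences consistent with the data."""
    if not spectrum:
        return ""
    graph: dict[str, list[str]] = defaultdict(list)
    indeg: dict[str, int] = defaultdict(int)
    for kmer in sorted(spectrum):  # sorted input gives a deterministic answer
        graph[kmer[:-1]].append(kmer[1:])  # each k-mer is an edge
        indeg[kmer[1:]] += 1
    nodes = sorted(set(graph) | set(indeg))
    # Start where out-degree exceeds in-degree by one (path start), else anywhere.
    start = next((n for n in nodes if len(graph[n]) - indeg[n] == 1), nodes[0])
    stack, path = [start], []
    while stack:  # iterative Hierholzer's algorithm
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])

seq = "ATGGCGTGCA"
spectrum = {seq[i:i + 4] for i in range(len(seq) - 3)}
print(reconstruct(spectrum))
```

Reconstruction is unambiguous here because no 3-mer repeats in the toy sequence. Each repeated (k-1)-mer adds a branch point, which is why longer reads or additional data are needed for long targets.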
To implement Format 1, we have developed a data-production line with a present capacity of 1 million clone-probe measurements/d. A high-throughput polymerase chain reaction (PCR) procedure has been established using BioOvens. A Biomek 1000 robot has been adapted to spot 31,000 DNA samples on a 6- by 9-in. filter. At this dot density, each membrane carries about 50 Mb of DNA, ready for fine mapping and sequencing. A hybridization machine with a capacity of 24 filters is being developed. A PhosphorImager collects data from ³³P-labeled probes, and our image-analysis program reports dot intensities. Priorities for upgrading current facilities toward a capacity of 10 million scores/d are automated setup of 10,000 PCR reactions/d, labeling of 100 probes/d, and robotic retrieval of selected subsets of clones.
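The capacity figures above can be checked with simple arithmetic. Each hybridization of one labeled probe to one filter yields one score per dot, so daily scores equal probes per day times clones per probe. The 1.6-kb average insert is an assumption within the stated 1- to 2-kb range. The 100,000-clone library size is taken from the first brain-library target.

```python
import math

def membrane_dna_mb(dots: int, insert_kb: float) -> float:
    """Cloned DNA carried by one filter, in Mb."""
    return dots * insert_kb / 1000

def filter_hybridizations(scores_per_day: int, dots_per_filter: int) -> float:
    """Probe-filter hybridizations per day needed for a given score rate."""
    return scores_per_day / dots_per_filter

def scores_per_day(probes_per_day: int, clones_per_probe: int) -> int:
    """Every labeled probe is scored against every arrayed clone."""
    return probes_per_day * clones_per_probe

def filters_per_probe(clones: int, dots_per_filter: int) -> int:
    """Filters each probe must be hybridized to in order to cover a library."""
    return math.ceil(clones / dots_per_filter)

print(membrane_dna_mb(31_000, 1.6))              # 49.6 Mb, i.e. about 50 Mb
print(filter_hybridizations(1_000_000, 31_000))  # ~32 filter hybridizations/d
print(scores_per_day(100, 100_000))              # 10000000 scores/d
print(filters_per_probe(100_000, 31_000))        # 4 filters per probe
```

On these assumptions, labeling 100 probes/d against a 100,000-clone library matches the 10 million scores/d target, with each probe hybridized to four filters.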
Using this setup, 20,000 cDNA clones from a brain library (M. B. Soares, Columbia University) have been hybridized with 256 probes. Our clustering program recognized about 13,000 groups or single clones. Screening provides a rational choice of clones for gene mapping and full sequencing, and the method's simplicity allows inexpensive screening of millions of cDNAs from dozens of tissues. Our first target is 100,000 clones from the brain library. To demonstrate sequence-ready mapping (ordering of shotgun clones), 1100 M13 subclones from a cosmid (B. Koop, University of Victoria, Canada) have been hybridized with 250 probes, and screening of a shotgun library of the 2-Mb genome of the archaebacterium Pyrococcus furiosus (F. Robb, University of Maryland, Baltimore) has begun.
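Once clusters are formed, choosing clones for mapping and full sequencing can be as simple as taking one representative per cluster. The sketch below picks the clone with the most positive probes. It relies on a hypothetical heuristic: under a random-sequence model, more positives suggest a longer insert. It also tallies cluster sizes, which is where a library's abundance structure shows up. The function names and toy signatures are illustrative.

```python
from collections import Counter

def pick_representatives(clusters: list[set[str]], signatures: dict[str, frozenset]) -> list[str]:
    """One clone per cluster: the member with the most positive probes
    (hypothetical heuristic). Ties resolve alphabetically for reproducibility."""
    return [max(sorted(group), key=lambda c: len(signatures[c])) for group in clusters]

def abundance_profile(clusters: list[set[str]]) -> dict[int, int]:
    """Map cluster size to the number of clusters of that size."""
    return dict(sorted(Counter(len(g) for g in clusters).items()))

# Toy data: positive probe indices per clone and precomputed clusters (invented).
signatures = {
    "a": frozenset({1, 4, 7, 9}),
    "b": frozenset({1, 4, 7, 10}),
    "c": frozenset({2, 5, 8}),
    "d": frozenset({1, 4, 7, 9, 12}),
}
clusters = [{"a", "b", "d"}, {"c"}]
print(pick_representatives(clusters, signatures))  # ['d', 'c']
print(abundance_profile(clusters))                 # {1: 1, 3: 1}
```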
The next target is a demonstration of the proposed inexpensive sequencing scheme, which requires 3000 probes and targeted single-pass gel sequences with error rates as high as 20%. A further advance would be comparative sequencing of four similar bacterial genomes. Megabase sequencing based on reading 14-mers through ligation of back-to-back hybridized 7-mers will also be investigated.
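The de Bruijn formulation of SBH shows why reading 14-mers helps at the megabase scale. Every (k-1)-mer that occurs more than once in the target is a potential branch point that makes reconstruction ambiguous. The sketch below counts such repeats in a random 20-kb sequence for k = 7 and k = 14. Random sequence is an assumption, since real genomes contain interspersed repeats that no probe length removes entirely. The seed and sequence length are arbitrary.

```python
import random
from collections import Counter

def repeated_kmers(seq: str, k: int) -> int:
    """Number of distinct k-mers occurring more than once in seq (one strand)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c > 1)

# Assumed model: uniform random 20-kb sequence with a fixed seed.
rng = random.Random(1994)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
for k in (7, 14):
    # Branch points for k-mer reads come from repeated (k-1)-mers.
    print(f"k={k}: {repeated_kmers(genome, k - 1)} repeated {k - 1}-mers")
```

With 7-mer reads, thousands of 6-mers repeat within 20 kb. With 14-mer reads, only a handful of 13-mers do. Ligating two adjacent 7-mers is meant to capture that gain without synthesizing all 4^14 14-mer probes.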