Wei Yu and Richard A. Gibbs
Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
Baylor College of Medicine Human Genome Sequencing Center
The Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) is scaling up its operations with the aim of completing 15 Mb of human genomic DNA by April 1998, using a modified shotgun strategy. To make this process more efficient, many aspects of shotgun sequencing have been optimized and streamlined, providing a good test of this approach for analyzing large fragments (>50 kb). We now routinely complete genomic fragments using fewer than 20 reads per kilobase of DNA.
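Reads per kilobase is the number of reads divided by the fragment length in kb; multiplying it by the mean read length gives an approximate fold coverage. The minimal sketch below illustrates the arithmetic. The function names, the 150 kb fragment, the read count, and the 500-base mean read length are all hypothetical values chosen for the example, not figures from the abstract.

```python
def reads_per_kb(num_reads: int, fragment_bp: int) -> float:
    """Number of sequencing reads spent per kilobase of finished DNA."""
    return num_reads / (fragment_bp / 1000)


def fold_coverage(reads_per_kb_value: float, mean_read_bp: float) -> float:
    """Approximate redundancy: total bases read per base of target DNA."""
    return reads_per_kb_value * mean_read_bp / 1000


# Hypothetical example: a 150 kb fragment finished with 2,850 reads.
rpk = reads_per_kb(2850, 150_000)
# Assumed mean usable read length of 500 bases (not stated in the abstract).
print(f"{rpk:.1f} reads/kb, ~{fold_coverage(rpk, 500):.1f}x coverage")
# prints: 19.0 reads/kb, ~9.5x coverage
```

Under the same assumed read length, the threshold of 20 reads/kb corresponds to roughly 10x coverage.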
To sequence short (1-5 kb) DNA fragments efficiently, we have devised another shotgun-based strategy called Concatenation cDNA Sequencing (CCS). In CCS, multiple independent cDNA molecules are joined to form long concatemers, from which shotgun libraries are generated. These libraries are then sequenced in the same way as cosmid, BAC, or PAC libraries, and the sequences of the individual cDNA molecules are resolved as separate contigs in the final computer assemblies.
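The toy simulation below sketches how a fixed shotgun read budget is shared among the cDNAs in one concatemer. It assumes reads are uniform random windows of a fixed length along the joined inserts, a simplification that ignores shearing and subcloning details. It also uses the known insert boundaries, which a real assembler does not have, so it only shows that each cDNA receives reads roughly in proportion to its length and that some reads cross the junctions between neighbouring inserts. The function name, insert lengths, and 500-base read length are hypothetical.

```python
import bisect
import random
from typing import List, Tuple


def simulate_concatemer_reads(
    insert_lengths: List[int], num_reads: int, read_len: int, seed: int = 0
) -> Tuple[List[int], int]:
    """Toy CCS model: join inserts end to end, draw reads as uniformly random
    windows along the concatemer, and count which insert each read falls in.

    Returns (reads lying fully inside each insert, reads spanning a junction).
    """
    total = sum(insert_lengths)
    if read_len > total:
        raise ValueError("read length exceeds concatemer length")
    # starts[i] is the concatemer coordinate where insert i begins.
    starts: List[int] = []
    pos = 0
    for length in insert_lengths:
        starts.append(pos)
        pos += length

    rng = random.Random(seed)
    per_insert = [0] * len(insert_lengths)
    junction = 0
    for _ in range(num_reads):
        begin = rng.randrange(total - read_len + 1)
        end = begin + read_len  # exclusive end coordinate
        idx = bisect.bisect_right(starts, begin) - 1  # insert holding the first base
        if end <= starts[idx] + insert_lengths[idx]:
            per_insert[idx] += 1
        else:
            junction += 1
    return per_insert, junction


# Hypothetical concatemer of four cDNAs (lengths in bp) sequenced at 14 reads/kb.
lengths = [1200, 3400, 2100, 4800]
n_reads = round(sum(lengths) / 1000 * 14)
counts, spanning = simulate_concatemer_reads(lengths, n_reads, read_len=500)
print(counts, spanning)
```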
Sequencing of more than 500 full-insert cDNAs from Soares libraries has now been initiated, in concatenation libraries averaging 50 inserts each. More than 150 cDNAs have been finished and submitted to GenBank, and 250 are in closure. The completed sequences show that the method is as efficient as sequencing comparable clones whose length equals the combined length of the cDNAs, and libraries have been completed with as few as 14 reads/kb. Simplified methods for pooling cDNA preparations have been developed, and problems with individual cDNAs that evade concatenation have been solved. Our aim is to complete 1,000 cDNA insert sequences within 1998.
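Because a concatenation library behaves like a single clone of the combined insert length, its read budget can be estimated from total length and reads/kb. The sketch below assumes a hypothetical library of 50 cDNAs averaging 2.5 kb (within the stated 1-5 kb range); the function name and insert lengths are illustrative only.

```python
import math
from typing import List


def reads_for_library(insert_lengths_bp: List[int], reads_per_kb: float) -> int:
    """Reads needed to finish a concatenation library, assuming the read
    budget scales with the combined insert length, as for a single clone."""
    total_kb = sum(insert_lengths_bp) / 1000
    return math.ceil(total_kb * reads_per_kb)


# Hypothetical library: 50 cDNAs averaging 2.5 kb.
library = [2500] * 50
for rate in (14, 20):
    needed = reads_for_library(library, rate)
    print(f"{rate} reads/kb -> {needed} reads, about {needed / len(library):.0f} per cDNA")
```

At 14 reads/kb this hypothetical library needs 1,750 reads (about 35 per cDNA), compared with 2,500 reads at 20 reads/kb.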
Trey Ideker, Richard Karp, and Leroy Hood
University of Washington
Department of Molecular Biotechnology
Health Sciences Building Box 357730
Seattle, WA, 98195
From the perspective of the data analyst, current DNA array technology is at an early stage of development, and its data are hard to reproduce across experiments. We are developing a comprehensive array system for measuring and interpreting gene transcription rates, with a focus on obtaining well-characterized data for transcriptional network analysis. The acquisition system includes a robot with a custom-designed modular print head that deposits cDNA onto glass slides, and a commercial fluorescence imaging instrument that detects array hybridizations. We have also developed software that locates and quantitates the samples in the image. The software generates a list of expression levels, one per gene, and estimates the fluorescence background local to each sample spot on the array. We have statistically characterized the performance of the array process so that each measured expression level carries an associated confidence metric. This metric reflects the measurement error in each gene's expression level, including the deviation observed across identical experiments repeated several times. Sources of variation include error in the array robotics, DNA hybridization and attachment chemistry, mRNA sample preparation, fluorescence detection, and naturally occurring expression differences between separate RNA samples. To characterize each source of variation, a series of known test samples was deposited and analyzed with the array system. Future work will focus on in-depth analysis of the complex expression data, including formulation of a general computer model of gene transcription and implementation of an algorithm to predict biochemical pathways from transcription rate data.
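The abstract does not say how the local background is estimated or how the confidence metric is defined. The sketch below assumes one common choice: subtract the median of pixels from a region around each spot from the spot's mean intensity, then summarize one gene's replicate measurements with the sample standard deviation and coefficient of variation (CV). The function names, the pixel values, and the use of CV as the confidence measure are hypothetical illustrations, not the authors' method.

```python
import statistics
from typing import Dict, List


def spot_signal(spot_pixels: List[float], background_pixels: List[float]) -> float:
    """Background-corrected spot intensity: mean of the spot's pixels minus
    the median of pixels sampled from a local region around the spot."""
    return statistics.mean(spot_pixels) - statistics.median(background_pixels)


def replicate_summary(levels: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and coefficient of variation of one
    gene's expression level across replicate experiments (needs >= 2)."""
    mean = statistics.mean(levels)
    sd = statistics.stdev(levels)
    cv = sd / mean if mean != 0 else float("inf")
    return {"mean": mean, "sd": sd, "cv": cv}


# Hypothetical (spot pixels, local background pixels) for one gene
# in three replicate hybridizations.
replicates = [
    ([820, 860, 840], [100, 110, 105]),
    ([790, 810, 800], [95, 100, 90]),
    ([880, 900, 890], [120, 115, 110]),
]
levels = [spot_signal(spot, bg) for spot, bg in replicates]
summary = replicate_summary(levels)
print(levels, {k: round(v, 3) for k, v in summary.items()})
```

A median is used for the background because it is less sensitive than a mean to bright debris or neighbouring spots; under this scheme, a lower CV would indicate a more reproducible measurement.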