The Sequence Tagged Connector (STC) approach to genomic sequencing: Accelerating the complete sequencing of the human genome
Gregory G. Mahairas, Keith D. Zackrone, Stephanie Tipton, Sarah Schmidt, Alan Blanchard, Anne West, and Leroy Hood.
The STC approach has been proposed as an attractive strategy to provide a sequence-ready scaffold for the efficient and directed sequencing of the complete human genome (1). This work is a collaboration among the California Institute of Technology, TIGR, and the University of Washington, funded by the U.S. Department of Energy. The approach entails sequencing the ends of 300,000 bacterial artificial chromosomes (BACs) that constitute a 20X-deep human DNA library, in order to construct a sequence-ready scaffold of the human genome.
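As a rough back-of-the-envelope check on the library figures above, the following sketch derives the average insert size implied by 300,000 clones at 20X depth, and the average spacing between BAC ends along the genome. The haploid genome size of 3,000 Mb is an assumption, not a figure stated in this abstract, and the function names are purely illustrative.

```python
# Assumption (not stated in the abstract): haploid human genome size in bp.
GENOME_SIZE_BP = 3_000_000_000

NUM_BACS = 300_000   # number of BAC clones, from the abstract
LIBRARY_DEPTH = 20   # "20X deep" library, from the abstract


def implied_insert_size_bp(genome_bp, num_clones, depth):
    """Average insert size needed for num_clones to cover the genome depth times."""
    return genome_bp * depth / num_clones


def mean_end_spacing_bp(genome_bp, num_clones):
    """Average distance between sequenced ends, with two ends per clone."""
    return genome_bp / (2 * num_clones)


insert_size = implied_insert_size_bp(GENOME_SIZE_BP, NUM_BACS, LIBRARY_DEPTH)
end_spacing = mean_end_spacing_bp(GENOME_SIZE_BP, NUM_BACS)

print(f"Implied average insert size: {insert_size / 1000:.0f} kb")
print(f"Average spacing between BAC ends: {end_spacing / 1000:.0f} kb")
```

Under these assumptions, 600,000 end sequences would place one sequence tag roughly every 5 kb across the genome, which is what makes the ends usable as connectors between overlapping clones.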
At the University of Washington we have assembled a high-throughput, automated end-sequencing and fingerprinting process with its associated informatics. BAC clones are robotically inoculated from 384-well plates into a 4 ml, 96-well culture format and grown, and the BAC DNA is robotically extracted using AutoGen 740 robots. BAC template DNA from the AutoGen is then robotically transferred into 96-well microtiter plates, from which DNA sequencing and fingerprinting reactions are set up. DNA fingerprinting is performed by digestion with a single restriction enzyme (EcoRV) and conventional agarose gel electrophoresis, followed by automated imaging and analysis. DNA sequencing is performed using PE-ABD High Sensitivity dye primers and ABI 377 DNA sequencers. Laboratory protocols, automated data production, data processing, quality control measures, and LIMS will be described in detail.
During a 50-day period the STC laboratory sequenced 23,317 BAC ends (STCs). Of these, 19,224 (82.4%) were greater than 100 bp untrimmed, and the average untrimmed read length was 388 bp, for a total of 7.46 Mb (0.25%) of the genome. 29% of the STCs contained repetitive DNA, but less than 11% were entirely repeat. 12% of the repetitive DNA was LINE sequence, 4.6% LTR, and 6.7% SINE sequence, and 1.3% of the STCs contain a microsatellite or simple sequence repeat. The total G + C content was 40% and the average CpG content was 0.28, both expected values for human genomic DNA. 224 STCs had CpG scores of 1, representing CpG islands.
3103 STCs (16.8%) hit the EST, non-redundant (nr) nucleotide, or Sixframe database. 1103 STCs hit the EST database (DB), 517 of which hit only the EST DB; 1087 STCs hit the nr nucleotide DB, 471 of which hit only the nr nucleotide DB; and 913 STCs hit the nr protein DB, 500 of which hit only the nr protein DB. 181 STCs (1%) hit all three databases, 131 hit the nr nucleotide and nr protein DBs, 101 hit the EST and nr protein DBs, and 304 hit the EST and nr nucleotide DBs; i.e., 4% hit more than one of these DBs and probably represent genes.
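The read statistics and database-hit counts reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes a 3,000 Mb genome (not stated in the abstract) and assumes that each two-database count excludes STCs that hit all three databases; that is the reading under which the per-database totals reconcile. Variable and function names are illustrative only.

```python
# Figures taken from the abstract.
TOTAL_ENDS = 23_317      # BAC ends sequenced in 50 days
USABLE_READS = 19_224    # reads longer than 100 bp
MEAN_READ_BP = 388       # average read length

# Assumption (not stated in the abstract): haploid human genome size in bp.
GENOME_SIZE_BP = 3_000_000_000

usable_fraction = USABLE_READS / TOTAL_ENDS
total_bases = USABLE_READS * MEAN_READ_BP
genome_fraction = total_bases / GENOME_SIZE_BP

# Database hits. Assumption: pairwise counts exclude STCs hitting all three DBs.
only = {"est": 517, "nr_nuc": 471, "nr_prot": 500}
pairs = {
    ("est", "nr_nuc"): 304,
    ("est", "nr_prot"): 101,
    ("nr_nuc", "nr_prot"): 131,
}
all_three = 181


def db_total(db):
    """Total STCs hitting db: exclusive hits + pairwise overlaps + triple hits."""
    pair_hits = sum(n for pair, n in pairs.items() if db in pair)
    return only[db] + pair_hits + all_three


multi_db_hits = sum(pairs.values()) + all_three
distinct_hits = sum(only.values()) + multi_db_hits

print(f"Usable reads: {usable_fraction:.1%}")
print(f"Total sequence: {total_bases / 1e6:.2f} Mb ({genome_fraction:.2%} of genome)")
for db in only:
    print(f"{db}: {db_total(db)} STCs")
print(f"STCs hitting more than one DB: {multi_db_hits} "
      f"({multi_db_hits / USABLE_READS:.1%} of usable reads)")
print(f"Distinct STCs with at least one hit: {distinct_hits}")
```

Under this reading the per-database totals come out to 1103, 1087, and 913, matching the reported values, and the multi-database fraction is close to the quoted 4%. The three totals sum to 3103, while the count of distinct STCs with at least one hit would be 2205, so the 3103 figure may count an STC once per database it hits.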
1. Venter, J. C., Smith, H. O., and Hood, L. (1996) Nature 381: 364-366
Last modified: Wednesday, October 22, 2003