9th Annual Workshop, October 28-31, 1999
Co-sponsored by the U.S. Department of Energy
Development of non-redundant arrays of full-length cDNAs
Maria F. Bonaldo (1), Brian Berger (1), Todd Scheetz (1, 2), Kyle Munn (2), Chad Roberts (2), Kang Liu (2), Thomas Casavant (2) and Marcelo Bento Soares (1, 3)
Departments of (1) Pediatrics, (2) Electrical and Computer Engineering, and (3) Physiology and Biophysics, University of Iowa, Iowa City, Iowa, USA
Large-scale programs aimed at generating complete and accurate sequences of large numbers of full-length cDNAs will soon be initiated. The resulting information is expected not only to expedite the identification of human disease-causing genes but also to facilitate annotation of the genomic sequences currently being generated. Central to this effort, however, is the existence of non-redundant sets of full-length cDNAs. Although several methods have been developed for constructing libraries enriched for full-length cDNAs, no arrayed sets of full-length cDNA clones are publicly available. Producing such sets will require en masse identification of full-length clones in libraries enriched for full-length cDNAs, clustering to identify a non-redundant set of cDNAs, and rearraying of those clones into 96- or 384-well plates. Toward this goal, we have generated and thoroughly characterized four cDNA libraries enriched for full-length cDNAs, constructed from size-fractionated cytoplasmic mRNA of human germinal center B cells. Our strategy for generating non-redundant arrayed collections of full-length cDNAs involves the following steps:
(1) Generation of 5' ESTs from a large number of clones,
(2) Clustering for identification (and sequence assembly) of a non-redundant collection of cDNAs/ESTs,
(3) BLAST analysis for identification of cDNAs corresponding to mRNAs for which sequence information is not yet available in GenBank,
(4) Informatics analysis with a "5' classifier program" for identification of 5' ESTs likely to encompass the start codon,
(5) Rearraying of a non-redundant set of full-length cDNAs.
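Steps (4) and (5) can be illustrated with a small sketch. The abstract does not describe how the "5' classifier program" works, so the check below is only an assumed stand-in: it flags a 5' EST as a candidate if it contains an ATG followed in frame by a run of codons with no stop codon. The function names `has_candidate_start` and `assign_wells`, the `min_codons` threshold, and the row-by-row plate-filling order are all hypothetical and chosen for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_candidate_start(est, min_codons=30):
    """Assumed heuristic, not the authors' classifier: return True if the
    read contains an ATG followed in frame by at least `min_codons` codons
    (counting the ATG itself) before a stop codon or the end of the read."""
    seq = est.upper()
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        codons = 0
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                break
            codons += 1
        if codons >= min_codons:
            return True
    return False


def assign_wells(clone_ids, plate_size=96):
    """Map clone identifiers to (plate number, well name), filling each
    plate row by row. Plates are 8 x 12 (96 wells) or 16 x 24 (384 wells)."""
    if plate_size == 96:
        rows, cols = 8, 12
    elif plate_size == 384:
        rows, cols = 16, 24
    else:
        raise ValueError("plate_size must be 96 or 384")
    layout = {}
    for index, clone in enumerate(clone_ids):
        plate, offset = divmod(index, rows * cols)
        row, col = divmod(offset, cols)
        layout[clone] = (plate + 1, f"{chr(ord('A') + row)}{col + 1}")
    return layout
```

In such a sketch, one representative clone per cluster that passes the start-codon check would be passed to `assign_wells` to obtain its destination well. A real classifier would likely weigh additional evidence, such as sequence context around the ATG or similarity to known proteins.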
To date, we have generated more than 13,000 5' ESTs and have begun building a non-redundant collection of full-length cDNAs.