Beyond the Identification of Transcribed Sequences: Functional and Expression Analysis

9th Annual Workshop, October 28-31, 1999

Co-sponsored by the U.S. Department of Energy


Development of non-redundant arrays of full-length cDNAs

Maria F. Bonaldo1, Brian Berger1, Todd Scheetz1,2, Kyle Munn2, Chad Roberts2, Kang Liu2, Thomas Casavant2 and Marcelo Bento Soares1,3

Departments of 1Pediatrics, 2Electrical and Computer Engineering, and 3Physiology and Biophysics, University of Iowa, Iowa City, Iowa, USA

Large-scale programs will soon be initiated to generate complete and accurate sequences of large numbers of full-length cDNAs. The resulting information is expected not only to expedite the identification of human disease-causing genes but also to facilitate annotation of the genomic sequences currently being generated. Central to this effort, however, is the existence of non-redundant sets of full-length cDNAs. Although several methods have been developed for constructing libraries enriched for full-length cDNAs, no arrayed sets of full-length cDNA clones are publicly available. Producing such sets will require en masse identification of full-length clones in libraries enriched for full-length cDNAs, clustering to identify a non-redundant set of cDNAs, and re-arraying of those clones into 96- or 384-well plates. Toward this goal, we have generated and thoroughly characterized four cDNA libraries enriched for full-length cDNAs, constructed from size-fractionated cytoplasmic mRNA of human germinal center B cells. Our strategy for generating non-redundant arrayed collections of full-length cDNAs involves the following steps:

(1) Generation of 5'ESTs from a large number of clones,

(2) Clustering for identification (and sequence assembly) of a non-redundant collection of cDNAs/ESTs,

(3) BLAST analysis for identification of cDNAs corresponding to mRNAs for which sequence information is not yet available in GenBank,

(4) Informatics analysis with a "5' classifier program" for identification of 5'ESTs likely to encompass the start codon,

(5) Re-arraying of a non-redundant set of full-length cDNAs.

To date, we have generated more than 13,000 5'ESTs and have started to build a non-redundant collection of full-length cDNAs.
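As an illustration only, steps (2), (4) and (5) of the strategy listed above can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not the clustering software or the "5' classifier program" used in this work, whose algorithms are not described here. The shared k-mer clustering rule, the "first ATG followed by an uninterrupted reading frame" test, all function names, and all thresholds are hypothetical. Step (3), the BLAST comparison against GenBank, is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of all substrings of length k in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: Dict[str, str], k: int = 8,
                 min_shared: int = 5) -> List[List[str]]:
    """Step 2 (assumed rule): single-linkage clustering, linking two ESTs
    when they share at least min_shared k-mers."""
    ids = list(ests)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {i: kmers(ests[i], k) for i in ids}
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if len(profiles[a] & profiles[b]) >= min_shared:
                parent[find(b)] = find(a)

    clusters: Dict[str, List[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def first_orf_start(seq: str, min_codons: int = 10) -> Optional[int]:
    """Step 4 (assumed heuristic): index of the first ATG that is followed
    by at least min_codons in-frame codons without a stop, else None."""
    seq = seq.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        run = 0
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                break
            run += 1
        if run >= min_codons:
            return start
    return None


def pick_representatives(ests: Dict[str, str],
                         clusters: List[List[str]]) -> List[str]:
    """Keep one clone per cluster whose 5' EST appears to contain a start
    codon, preferring the clone with the most sequence upstream of the ATG."""
    picks = []
    for members in clusters:
        candidates = [m for m in members
                      if first_orf_start(ests[m]) is not None]
        if candidates:
            picks.append(max(candidates,
                             key=lambda m: first_orf_start(ests[m])))
    return picks


def plate_layout(clone_ids: List[str], rows: str = "ABCDEFGH",
                 cols: int = 12) -> Dict[str, Tuple[int, str]]:
    """Step 5: assign clones to (plate number, well) positions, filling
    each plate row by row. The defaults describe a 96-well plate."""
    per_plate = len(rows) * cols
    layout = {}
    for n, clone in enumerate(clone_ids):
        plate, idx = divmod(n, per_plate)
        layout[clone] = (plate + 1, f"{rows[idx // cols]}{idx % cols + 1}")
    return layout
```

Given a dictionary mapping clone identifiers to 5' EST sequences, the calls would chain as `cluster_ests`, then `pick_representatives`, then `plate_layout`. A 384-well layout corresponds to `rows="ABCDEFGHIJKLMNOP", cols=24`. A practical pipeline would cluster on sequence alignments rather than exact k-mer matches and would use a trained classifier rather than a single reading-frame test.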
