9th Annual Workshop, October 28-31, 1999
Co-sponsored by the U.S. Department of Energy
Search services to improve the identification of expressed sequences and their functions
J. Bouck1, M. McLeod1, T. McNeill2, G. Weinstock1,3, R. A. Gibbs1, and K. C. Worley1
1Department of Molecular and Human Genetics, Baylor College of Medicine
2Department of Biochemistry, University of Houston
3Department of Microbiology and Molecular Genetics, University of Texas - Houston Medical School, Houston, Texas, USA
We present two of our approaches to identifying and characterizing transcribed sequences. Publicly available transcript data are often hard to access because of the form in which they are stored. One such data set is the inconsistently annotated collection of cDNA sequences in the GenBank database. The Human Transcript Database isolates these sequences and improves access to them. Assigning a putative function to an expressed sequence is not always straightforward, whatever degree of similarity a sequence similarity search reports. Our second tool, BEAUTY, combines functional annotation with sequence similarity results to aid functional assessment.
Improving access to cDNA sequences by isolating existing transcribed sequences from GenBank. Full-length mRNA and cDNA sequences are available in the GenBank NR DNA database, but they can be hard to find there among the large volume of other sequences. The Human Transcript Database (HTDB) is a curated collection of expressed sequences isolated from GenBank. It is a valuable resource for studying human genes and for targeting cDNA sequencing projects. The collection can be searched by keyword or by sequence similarity from the web site at http://hgsc.bcm.tmc.edu/HTDB.
Identifying the function of expressed sequences is the focus of the BEAUTY programs. BEAUTY presents information from the Annotated Domains database, a collection of protein sequence annotations from Prosite, Blocks, Prints, Entrez, and Pfam, in combination with reports of local BLAST similarities. This combination provides a better assessment of functional domain conservation than BLAST searches alone. BEAUTY searches of protein databases are available for both protein and DNA queries from the BCM Search Launcher at http://searchlauncher.bcm.tmc.edu/.
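The idea of reading a local similarity together with domain annotation can be illustrated with a minimal Python sketch. It is not the BEAUTY implementation: the `Domain` and `Hit` records, the `annotate_hit` function, the `min_fraction` threshold, and the example coordinates are all hypothetical, chosen only to show how the region of a database sequence covered by a local alignment might be compared with annotated domain positions on that sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical annotated domain on a database protein (1-based, inclusive)."""
    source: str  # annotation source, e.g. "Pfam" or "Prosite"
    name: str
    start: int
    end: int


@dataclass(frozen=True)
class Hit:
    """Hypothetical local alignment: the subject region covered by the match."""
    subject_id: str
    start: int
    end: int


def covered_fraction(hit: Hit, domain: Domain) -> float:
    """Fraction of the domain's length that lies inside the aligned region."""
    overlap = min(hit.end, domain.end) - max(hit.start, domain.start) + 1
    return max(overlap, 0) / (domain.end - domain.start + 1)


def annotate_hit(hit, annotations, min_fraction=0.5):
    """Return (domain, fraction) pairs for domains largely covered by the hit.

    `annotations` maps a subject identifier to its list of Domain records.
    `min_fraction` is an arbitrary illustrative cutoff, not a published value.
    """
    results = []
    for domain in annotations.get(hit.subject_id, []):
        fraction = covered_fraction(hit, domain)
        if fraction >= min_fraction:
            results.append((domain, fraction))
    return results


# Made-up example data: one subject protein with three annotated domains.
annotations = {
    "P1": [
        Domain("Pfam", "kinase", 20, 59),
        Domain("Prosite", "motif", 55, 74),
        Domain("Prints", "distal", 100, 120),
    ]
}
hit = Hit("P1", 10, 60)
conserved = annotate_hit(hit, annotations)
```

In this made-up case only the "kinase" domain is reported, because the alignment spans all of it, while the other two domains are covered partially or not at all. A similarity that misses every annotated domain would carry less weight for functional assignment than one that spans a domain completely, which is the kind of distinction a similarity score alone does not show.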