9th Annual Workshop, October 28-31, 1999
Co-sponsored by the U.S. Department of Energy
Structural and evolutionary analysis of eukaryotic mRNA untranslated regions
Pesole G.1,4, Liuni S.2,4, Larizza A.3, Makalowski W.5 and Saccone C.3,4
1 Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, Milano, Italy
2 Centro di Studio sui Mitocondri e Metabolismo Energetico, C.N.R., Bari, Italy
3 Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Bari, Italy
4 Area di Ricerca del CNR, Bari, Italy
5 National Center for Biotechnology Information, NLM-NIH, Bethesda, Maryland, USA
The 5' and 3' untranslated regions (UTRs) of eukaryotic mRNAs may play a crucial role in the regulation of gene expression by controlling mRNA localization, stability and translation efficiency. In order to study the general structural and compositional features of these sequences, we have previously developed UTRdb, a specialized database of 5' and 3' UTR sequences of eukaryotic mRNAs purged of redundancy (Pesole et al., 1999).
UTRdb (release 10.0) contains 75,448 entries (26,145,985 nucleotides), which are also annotated for the presence of functional sequence patterns whose biological activity has been experimentally demonstrated. All these patterns have been collected in the UTRsite database, where each entry corresponds to a specific functional pattern and reports its consensus structure together with a short description of its biological activity and the relevant bibliography. Furthermore, UTRdb entries have been annotated for the presence of repeated elements present in the Repbase database (Jurka, 1998). A total of 5,818 functional elements and 54,975 repetitive elements are annotated in UTRdb. All Web resources for the retrieval and analysis of UTR sequences are available at the UTR home page we recently implemented (Pesole and Liuni, 1999). UTRdb entries can be retrieved through the SRS system, where crosslinks to UTRsite as well as to the primary nucleotide or amino acid databases are also established. Through the Web facility UTRscan, any input sequence can be searched for the presence of functional patterns annotated in UTRsite, while UTRfasta assesses sequence similarity between a query sequence and UTRdb entries.
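The kind of search UTRscan performs, scanning an input sequence for annotated patterns, can be illustrated with a minimal Python sketch. This is not the UTRscan implementation: the function name `scan_utr`, the dictionary `EXAMPLE_PATTERNS` and the regular-expression encoding of the pattern are assumptions made for illustration only. The example pattern is the canonical polyadenylation signal hexamer (AAUAAA, with the common AUUAAA variant), written in DNA letters.

```python
import re


def scan_utr(sequence, patterns):
    """Return (name, start, end, matched_text) for every pattern hit.

    Hypothetical helper: `patterns` maps a pattern name to a regular
    expression written in the DNA alphabet. Coordinates are 0-based,
    end-exclusive, and only non-overlapping matches are reported.
    """
    # Accept RNA or DNA input by normalising U to T.
    seq = sequence.upper().replace("U", "T")
    hits = []
    for name, regex in patterns.items():
        for match in re.finditer(regex, seq):
            hits.append((name, match.start(), match.end(), match.group()))
    # Report hits in order of position along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


# Illustrative pattern only; not copied from a UTRsite entry.
EXAMPLE_PATTERNS = {"polyA_signal": r"A[AT]TAAA"}

hits = scan_utr("GGCAUUAAAGCCAATAAAGG", EXAMPLE_PATTERNS)
for name, start, end, text in hits:
    print(name, start, end, text)
```

Real functional patterns often combine sequence and secondary-structure constraints, which a plain regular expression cannot express; the sketch covers the purely sequence-based case.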
The analysis of complete UTR sequences contained in this database showed that 5'-UTR sequences, on average 187 nucleotides long, were 1.2 to 4.3 times shorter than the corresponding 3'-UTR sequences in the various taxonomic groups considered. As far as compositional properties are concerned, 5'-UTR sequences were on average GC-richer than 3'-UTR sequences in all cases, and a significant correlation was found between the GC content of 5'- and 3'-UTR sequences and the GC content of the third silent codon positions of the corresponding protein coding genes (Pesole et al., 1997). Some structural features of 5'-UTRs known to affect mRNA translation efficiency were investigated, such as the presence of upstream ORFs and the context of the initiator ATG. In order to assess the level of functional constraint on UTR sequences, we have studied their evolutionary dynamics, also in comparison with the corresponding coding regions. With suitable evolutionary models we have calculated the nucleotide substitution rates of 5'-UTR, 3'-UTR, synonymous and nonsynonymous positions by comparing complete human, murid (rat and mouse) and artiodactyl mRNAs, for which a suitable number of orthologous sequences was available.
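The compositional and structural features mentioned above can be computed with very little code. The following Python sketch is an illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, `gc3_content` uses every third codon position rather than only the silent ones, and an upstream ORF is taken to be an ATG followed by an in-frame stop codon lying entirely within the 5'-UTR.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Simplification: all third positions are used, not only silent ones.
    """
    return gc_content(cds.upper()[2::3])


def upstream_orfs(utr5):
    """Return (start, end) of each ATG...stop ORF contained in a 5'-UTR.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    seq = utr5.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


print(gc_content("ATGC"))
print(gc3_content("ATGGCCTAA"))
print(upstream_orfs("CCATGAAATAGCC"))
```

An ATG with no in-frame stop inside the 5'-UTR is skipped here; such an ATG would extend into the coding region and needs the full mRNA to be evaluated.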
This work was partially financed by EC grant ERB-BIO4-CT96-0030.