9th Annual Workshop, October 28-31, 1999
Co-sponsored by the U.S. Department of Energy
Promoter prediction in large genomic sequences: compromise between sensitivity and specificity
Scherf, M., Klingenhoff, A., and Werner, T.
Institute for Mammalian Genetics, GSF-National Research Center for Environment and Health, Neuherberg, Germany
Large anonymous DNA sequences produced by the current genome sequencing projects often become available well before functional analyses and cDNA projects can catch up. This necessitates an initial analysis based solely on computer prediction of genes and other functional units within the sequences. The identification of coding exons and that of regulatory sequences (e.g. promoters) face very similar problems: false positive predictions, which limit specificity, and false negative predictions, which limit sensitivity. We will introduce a new method for the detection of polymerase II core promoters, developed in our group, which is very specific but of limited sensitivity. The advantages and disadvantages of several promoter prediction systems will be compared on several examples. During this study it became clear that strategies combining existing knowledge and experimental results with in silico predictions are the most promising for promoter annotation of anonymous sequences. This finding parallels results already known from exon prediction. There also appears to be an inevitable trade-off between the specificity and the sensitivity of in silico predictions, which cannot be completely overcome by combining different methods. It also became clear that each method has a defined range of sequence lengths over which it is most effective.
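
To make the trade-off between sensitivity and specificity concrete, the following minimal Python sketch scores two hypothetical sets of predicted promoter positions against a hypothetical annotation. It is an illustration only, not the method described in this abstract. The function name, the 100 bp tolerance window, and all coordinates are assumptions. Specificity is taken here as the fraction of predictions that are correct, TP / (TP + FP), a convention often used in gene prediction; the abstract itself does not state which definition is used.

```python
def evaluate_predictions(predicted, annotated, tolerance=100):
    """Return (sensitivity, specificity) for predicted promoter positions.

    Assumptions (not from the abstract): positions are single coordinates
    in bp, a prediction counts as correct if it lies within `tolerance` bp
    of an annotated site, and specificity means TP / (TP + FP).
    """
    # Predictions that fall within the tolerance window of any annotated site.
    true_pos = sum(
        any(abs(p - a) <= tolerance for a in annotated) for p in predicted
    )
    # Annotated sites recovered by at least one prediction.
    found = sum(
        any(abs(p - a) <= tolerance for p in predicted) for a in annotated
    )
    sensitivity = found / len(annotated) if annotated else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical annotation and two hypothetical predictors.
annotated_sites = [1000, 5000, 9000]
strict_predictions = [1020]                              # few calls, all correct
permissive_predictions = [980, 3000, 5050, 7000, 8000]   # more calls, more errors

strict_scores = evaluate_predictions(strict_predictions, annotated_sites)
permissive_scores = evaluate_predictions(permissive_predictions, annotated_sites)
print("strict:     sensitivity=%.2f specificity=%.2f" % strict_scores)
print("permissive: sensitivity=%.2f specificity=%.2f" % permissive_scores)
```

In this toy setting the strict predictor makes only correct calls but misses two of the three annotated sites, while the permissive predictor recovers more sites at the cost of several false positives.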