Expression Profiler: An Integrated Tool for Gene Expression and Sequence Analysis

Jaak Vilo, Alvis Brazma
European Bioinformatics Institute EBI
EMBL Outstation
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
telephone: +44 1223 494633
email: vilo@ebi.ac.uk
prestype: Poster
presenter: Jaak Vilo

Jaak Vilo, Alvis Brazma, Alan Robinson
European Bioinformatics Institute EBI

Suites for analysing the vast amounts of microarray gene expression data are becoming a common part of public microarray expression database initiatives. Integrating these databases with analysis and visualization tools will become a necessity for all major public databases.

We are developing a set of Internet tools, collectively called Expression Profiler (see http://www.ebi.ac.uk/microarray/), that will allow users to browse and query microarray data stored in the ArrayExpress microarray database at EBI as well as in other databases on the web [1]. The main challenge is to integrate different types of data and to present them in a form that lets biologists analyse the data and study the complex relationships within them. Expression Profiler consists of four major components.

EPCLUST, the expression profile clustering and analysis tool, allows users to perform cluster analysis and visualization of expression data. The main analysis methods are hierarchical and K-means clustering, with a choice of distance measures and clustering parameters (hierarchical clustering method, starting points for K-means, etc.). Visualization of expression profiles is based on methods developed by Mike Eisen [2].
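As an illustration of the kind of analysis described above, the following is a minimal sketch of average-linkage hierarchical clustering with a correlation-based distance. It is not the EPCLUST implementation; the function names, the gene names, and the expression values are all hypothetical and chosen only to make the example self-contained.

```python
import math
from itertools import combinations


def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # flat profile: treat as uncorrelated (an assumption)
    return 1.0 - cov / (norm_a * norm_b)


def average_linkage(profiles, n_clusters, distance=correlation_distance):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(distance(profiles[g], profiles[h]) for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical expression profiles (one value per experimental condition).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

Passing a different `distance` function (for example Euclidean distance) changes which profiles are considered similar, which is why the choice of distance measure is exposed as a parameter here.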

Cluster analysis of gene expression data is only the first step of many. GENOMES, the tool for retrieving gene annotations and sequences (e.g. upstream regulatory sequences), addresses the need for species-specific databases to handle queries for whole sets of genes simultaneously, since users would like to have "executive summaries" of the clusters they have discovered in the expression data.
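One simple reading of an "executive summary" for a cluster is a count of how often each annotation term occurs among its genes. The sketch below assumes annotations are available as a mapping from gene identifier to a list of terms; the data, the identifiers, and the function name are hypothetical, and this is not how GENOMES itself is implemented.

```python
from collections import Counter

# Hypothetical annotation table: gene identifier -> annotation terms.
annotations = {
    "gene1": ["glycolysis", "cytoplasm"],
    "gene2": ["glycolysis"],
    "gene3": ["ribosome biogenesis", "nucleolus"],
    "gene4": ["glycolysis", "cytoplasm"],
}


def summarize_cluster(gene_ids, annotations):
    """Count annotation terms over a set of genes in a single query.

    Returns the terms ordered by frequency and the genes with no annotation.
    """
    counts = Counter()
    missing = []
    for gene in gene_ids:
        terms = annotations.get(gene)
        if terms is None:
            missing.append(gene)
        else:
            counts.update(set(terms))  # count each term once per gene
    return counts.most_common(), missing


summary, missing = summarize_cluster(["gene1", "gene2", "gene4", "gene9"], annotations)
print(summary)
print(missing)
```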

Because of the large number of on-line tools and analysis methods, we have developed a "middle layer", the URLMAP tool, which sends data about cluster contents to various other databases, for example to ask "in which pathways do the genes from my cluster participate?". Ideally, each on-line database should provide a mechanism for pre-filling its tools with the contents of a cluster.
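The idea of such a middle layer can be sketched as a table that maps each target database to a query URL format, so that one cluster can be turned into a link for any of them. The endpoints, parameter names, and separators below are invented placeholders (example.org), not the actual URLMAP configuration, and the gene identifiers are only illustrative.

```python
from urllib.parse import urlencode

# Hypothetical targets: base URL, query parameter name, identifier separator.
URL_TEMPLATES = {
    "pathways": ("https://pathways.example.org/search", "genes", " "),
    "annotations": ("https://annotations.example.org/lookup", "ids", ","),
}


def build_query_url(target, gene_ids):
    """Build a URL that pre-fills a target tool with the genes of a cluster."""
    base, param, separator = URL_TEMPLATES[target]
    # urlencode escapes the joined identifier list for use in a query string.
    return base + "?" + urlencode({param: separator.join(gene_ids)})


cluster = ["YAL001C", "YBR020W", "YDR009W"]
pathway_url = build_query_url("pathways", cluster)
annotation_url = build_query_url("annotations", cluster)
print(pathway_url)
print(annotation_url)
```

Keeping the per-database details in a table means that supporting another database only requires a new entry, not new code.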

The discovery of putative transcription factor binding sites [3] is one application of microarray expression data. We have developed a pattern discovery tool, SPEXS, that performs a rapid exhaustive search for a priori unknown, statistically significant sequence patterns of unrestricted length. Statistical significance is determined for the set of sequences in each cluster with respect to a set of background sequences, which allows the detection of subtle regulatory signals specific to each cluster in comparison with the background distribution.
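To make the cluster-versus-background comparison concrete, here is a deliberately simplified sketch. It is not the SPEXS algorithm: it enumerates only exact substrings within a bounded length range (SPEXS searches patterns of unrestricted length), and it scores each pattern with a binomial tail probability, which is an assumption made for this example. The sequences are toy data, and the background is assumed to include the cluster sequences.

```python
from math import comb


def sequences_containing(pattern, sequences):
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for seq in sequences if pattern in seq)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enumerate_patterns(sequences, min_len, max_len):
    """All exact substrings of the given lengths found in the sequences."""
    patterns = set()
    for seq in sequences:
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                patterns.add(seq[start:start + length])
    return patterns


def score_patterns(cluster, background, min_len=4, max_len=6):
    """Rank cluster patterns by how unexpected they are given the background."""
    results = []
    for pattern in enumerate_patterns(cluster, min_len, max_len):
        k = sequences_containing(pattern, cluster)
        p = sequences_containing(pattern, background) / len(background)
        results.append((binomial_tail(k, len(cluster), p), pattern, k))
    return sorted(results)  # smallest probability first


# Toy upstream sequences; the cluster shares the substring ACGTG.
cluster = ["TTACGTGAAT", "GGACGTGCCA", "CAACGTGTTG"]
others = ["TTTTTTTTTT", "GGGGCCCCAA", "ATATATATAT", "CCAGGTTCCA",
          "GATTACAGAT", "TGCATGCATG", "AACCGGTTAA"]
background = cluster + others

results = score_patterns(cluster, background)
for probability, pattern, count in results[:3]:
    print(pattern, count, probability)
```

In this toy data the best-scoring patterns are the substrings shared by all three cluster sequences and absent from the rest of the background, which is the kind of cluster-specific signal the comparison is meant to expose.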

These four tools currently form the core of the Expression Profiler analysis suite. They can be combined to facilitate the automatic discovery of potential regulatory signals in genomes, as described in [3].

Integrating data from high-throughput microarray experiments with other types of data, for example genomic sequences, protein-protein interactions, metabolic pathways and signaling data, will open new avenues for genome-scale analysis. We are exploring the development of new techniques as well as technology transfer from the marketing and telecommunications domains, e.g. the application of visualisation, data mining and statistical analysis methods. Technologies such as CORBA and XML provide ways to interact with other tools and databases over the Internet. This will allow external users to access the data stored in EBI databases and to integrate the analysis tools developed at EBI with their own tools.

References

[1] A. Brazma, A. Robinson, G. Cameron and M. Ashburner. One stop shop for microarray data. Nature 2000, vol. 403, pp. 699-700.

[2] M. Eisen, P. T. Spellman, D. Botstein and P. O. Brown. Cluster Analysis and Display of Genome-wide Expression Patterns. PNAS 1998, vol. 95, pp. 14863-14867.

[3] J. Vilo, A. Brazma, I. Jonassen, A. Robinson and E. Ukkonen. Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression Data. ISMB-2000, August 2000, AAAI Press, pp. 384-394.


