TRANSCRIPTOME 2002: From Functional Genomics to Systems Biology
March 10-13, 2002
Seattle, Washington, USA

I.M.A.G.E. Data Mining Tools:  Introducing the IQ

Peg Folta, Lawrence Livermore National Laboratory, CA

The I.M.A.G.E. Consortium has maintained the largest cDNA clone collection in the world since its creation in 1993.  U.S. and European distributors provide the current collection of over 5.5 million clones, from six species, to users worldwide.  The I.M.A.G.E. collection has been the basis of several key genomic projects, such as the current NIH Mammalian Gene Collection (MGC), the NIH Cancer Genome Anatomy Project (C-GAP), and the Merck Gene Index projects.  Of the human ESTs in GenBank, 64% have been obtained from I.M.A.G.E. clones.  Given the collection's impact on the industry, one of our major focuses has been to quantify and track the make-up of the collection and to provide public access to all data through intelligent data mining techniques.

IMAGEne is a mature clustering software product that provides Java-based query and display of each of its gene clusters.  Clustering is based primarily on sequence overlaps and clone membership.  The number of gene clusters with full-length clone representatives has risen significantly over the past year, as a direct result of the MGC project.  The number of clusters without an I.M.A.G.E. clone has decreased, due to improved library creation techniques.  Detailed statistics on the current build will be presented, as will trends in the collection over the last few years.

I.M.A.G.E. tracks, in an Oracle database, all information associated with the clones within the collection, from the originating cDNA library data through the resulting EST and full insert sequences.  A new intelligent query tool, IQ, has been developed and released to the public for mining this data.  The IQ tool allows the user to specify the attributes used to define the query, as well as the content, format, and destination of the results.

Problem clones identified by I.M.A.G.E. Consortium members, the distributors, and users are also tracked in the database.  These database tables and the associated web-based forms have been enhanced this year for ease of entry and query.  Reported problems are now used directly in IMAGEne clustering and have affected the singleton analysis results.

This work was partially funded by the NIH and was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract no. W-7405-Eng-48.

