Human Genome Project Information Archive
1990–2003

Archive Site Provided for Historical Purposes


Sponsored by the U.S. Department of Energy Human Genome Program

Human Genome News Archive Edition
Vol.9, No.3   July 1998

Data Surge Challenges Informatics Developers

The explosive growth of sequence and biological information poses pressing challenges for data acquisition, representation, access, and analysis. Some highlights from informatics sessions at the Santa Fe workshop follow.

bioWidgets: Adaptable, Reusable Modules for Viewing Data
Software analysis applications are often tailored to the resources available at a particular site. The bioWidgets toolkit from Chris Overton's team [University of Pennsylvania (Penn)] instead takes a component-based approach: adaptable, reusable software modules that can be incorporated easily into a variety of applications and that promote interaction among those applications. Jonathan Crabtree described the team's efforts to develop and deploy graphical user interfaces for visualizing molecular, cellular, and genomic information. The current implementation includes widgets that display sequences, maps, BLAST results, chromosomes, and sequence alignments. The group also is developing interfaces for data stored in such distributed, heterogeneous databases as the Genome Database, Genome Sequence DataBase, Entrez, and ACeDB and is forming a consortium of bioWidget developers and users to create standards. All bioWidgets are implemented in Java for Web distribution.
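
The component idea can be illustrated with a minimal sketch in which two independent viewers share a selection through a common hub. This is written in Python for brevity and is not the bioWidgets API (the toolkit itself is implemented in Java); the names SelectionBus, SequenceView, and MapView are hypothetical and exist only for this example.

```python
class SelectionBus:
    """Minimal publish/subscribe hub that lets independent widgets share a selection."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def publish(self, start, end):
        # Notify every registered widget of the newly selected interval.
        for callback in self._listeners:
            callback(start, end)


class SequenceView:
    """Hypothetical widget that highlights the selected stretch of a sequence."""

    def __init__(self, sequence, bus):
        self.sequence = sequence
        self.highlighted = ""
        bus.subscribe(self.on_select)

    def on_select(self, start, end):
        self.highlighted = self.sequence[start:end]


class MapView:
    """Hypothetical widget that records the selected region on a map."""

    def __init__(self, bus):
        self.region = None
        bus.subscribe(self.on_select)

    def on_select(self, start, end):
        self.region = (start, end)


bus = SelectionBus()
seq_view = SequenceView("ACGTACGTAC", bus)
map_view = MapView(bus)
bus.publish(2, 6)  # a selection made in any one component reaches all of them
print(seq_view.highlighted, map_view.region)
```

Neither viewer knows about the other; each depends only on the shared hub, which is what would let such modules be reused in different applications.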

Querying Across Databases with BioKleisli
Sue Davidson (Penn) described a new suite of tools that permits researchers to pose complex questions over the distributed, heterogeneous sources housing most genome-related data. Answering the query, "Find human sequence entries on human chromosome 22 overlapping q12," for example, would currently require access to three separate databases. The new system, which performs integration "on the fly" while allowing simultaneous structural transformations of the source data, is based on the powerful Kleisli integration system developed at Penn. Together with the high-level Collection Programming Language (CPL), bioKleisli can be used to integrate data through dynamic user-defined views or to create specialized data warehouses allowing fast access (http://www.pcbi.upenn.edu).
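
As a rough illustration of what answering the chromosome 22 query involves, the following Python sketch joins three toy in-memory sources at query time. It is not CPL and does not use the Kleisli system; the source layouts, coordinates, and identifiers are all invented for this example, and real databases would be remote and differently structured.

```python
# Toy stand-ins for three separate databases (every record here is invented).

# Source 1: cytogenetic band coordinates, keyed by (chromosome, band).
BANDS = {("22", "q12"): (100, 200), ("22", "q13"): (200, 300)}

# Source 2: mapped loci with positions on a chromosome.
LOCI = [
    {"locus": "LOCUS_A", "chrom": "22", "start": 90, "end": 120},
    {"locus": "LOCUS_B", "chrom": "22", "start": 250, "end": 260},
    {"locus": "LOCUS_C", "chrom": "21", "start": 110, "end": 130},
]

# Source 3: sequence entries that cross-reference loci.
SEQUENCES = [
    {"entry": "SEQ_1", "organism": "human", "locus": "LOCUS_A"},
    {"entry": "SEQ_2", "organism": "mouse", "locus": "LOCUS_A"},
    {"entry": "SEQ_3", "organism": "human", "locus": "LOCUS_B"},
]


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def human_entries_overlapping(chrom, band):
    """Join the three sources at query time instead of from a prebuilt warehouse."""
    band_start, band_end = BANDS[(chrom, band)]
    loci_in_band = {
        locus["locus"]
        for locus in LOCI
        if locus["chrom"] == chrom
        and overlaps(locus["start"], locus["end"], band_start, band_end)
    }
    return [
        seq["entry"]
        for seq in SEQUENCES
        if seq["organism"] == "human" and seq["locus"] in loci_in_band
    ]


print(human_entries_overlapping("22", "q12"))
```

The three lookups correspond to the three databases mentioned above; a data warehouse would instead precompute and store the joined result for fast access.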

Improved BCM Search Launcher
Kim Worley [Baylor College of Medicine (BCM)] reported on the enhanced sequence-analysis search services provided by the BCM Search Launcher. Search Launcher is an easy-to-use interface that organizes Web sequence-analysis servers according to function and provides a single point of entry for related searches. It adds hypertext links for easy access to Medline abstracts, related sequences, and other information. A BLAST Enhanced Alignment Utility (BEAUTY) tool makes it easier to identify weak but functionally significant matches in BLAST protein database searches. Recent enhancements make BEAUTY searches available for DNA queries (BEAUTY-X) and for gapped alignment searches (using WU-BLAST2). For users who need to perform a particular search on a number of sequences at once, the Batch Client provides access to all searches available from the BCM Search Launcher Web pages through a convenient drag-and-drop (Macintosh) or command-line (UNIX, PC) interface. Future developments are focusing on the analysis of large-scale sequences to support the efforts of the Genome Annotation Consortium (see sidebar above).
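
The batch pattern described above, running one search over many sequences, can be sketched in Python as follows. This is not the Batch Client and does not contact the Search Launcher; parse_fasta and run_batch are hypothetical helpers, and the "search" is a caller-supplied function (here simply len) standing in for a real request to a server.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from multi-record FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def run_batch(fasta_text, search):
    """Apply one search callable to every sequence and collect results by header."""
    return {header: search(seq) for header, seq in parse_fasta(fasta_text)}


# Placeholder search: report sequence length instead of querying a server.
example = ">seq1\nACGT\nAC\n>seq2\nGGG\n"
print(run_batch(example, len))
```

Passing the search as a function keeps the batching logic independent of which analysis is being run.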

WIT/WIT2: Reconstructing Metabolism
Analysis of the increasing number of fully or partially sequenced small genomes can serve as the foundation from which to look at more complex genomes. Evgeni Selkov and Ross Overbeek (both at Argonne National Laboratory) discussed the reconstruction of accurate metabolism models for 29 of these small organisms. Using sequence data supplemented with biochemical and phenotypic data, the group has made reconstructions (some based on still-incomplete sequence data) available via the WIT/WIT2 system. WIT2 is a UNIX-based system in two parts: a Web-based, data-access system and a set of batch tools offering extensible data-query access (http://wit.mcs.anl.gov/WIT2/wit.html).

WIT/WIT2 reconstructions are based on the metabolic pathway (MPW) collection, which includes over 2800 diagrams covering primary and secondary metabolism, membrane transport, signal-transduction pathways, intracellular traffic, transcription, and translation. Selkov observed that identifying universal metabolic aspects and gene families will lead to integrated understanding of metabolic evolution and to technologies for developing higher-level functional models. In the current public release of MPW (http://wit.mcs.anl.gov/MPW/), the coding, based on the pathways' logical structure, is represented by objects commonly used in electronic circuit design. Such design facilitates diagram drawing and editing and enables automation of basic simulation operations.
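
To suggest why a circuit-like representation makes basic simulation easy to automate, the sketch below models each pathway step as a gate-like object with inputs and outputs and propagates availability until nothing changes. It is an assumption-laden toy in Python, not the MPW encoding (which is not detailed here); the Reaction class, the step names, and the metabolites A through E are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A pathway step treated like a logic gate: it fires once all inputs are present."""
    name: str
    inputs: frozenset
    outputs: frozenset


def reachable(reactions, available):
    """Return every compound obtainable from the starting set, iterating to a fixed point."""
    available = set(available)
    changed = True
    while changed:
        changed = False
        for reaction in reactions:
            if reaction.inputs <= available and not reaction.outputs <= available:
                available |= reaction.outputs
                changed = True
    return available


# Invented three-step pathway: A -> B, then B + C -> D, then D -> E.
PATHWAY = [
    Reaction("step1", frozenset({"A"}), frozenset({"B"})),
    Reaction("step2", frozenset({"B", "C"}), frozenset({"D"})),
    Reaction("step3", frozenset({"D"}), frozenset({"E"})),
]

print(sorted(reachable(PATHWAY, {"A"})))       # step2 cannot fire without C
print(sorted(reachable(PATHWAY, {"A", "C"})))  # all three steps fire
```

Because each step declares its inputs and outputs explicitly, the same structure could drive both diagram layout and simple what-if checks such as the one shown.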


The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n3).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort that ran from 1990 to 2003. Its primary goals were to discover the complete set of human genes, make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.