JITF Discusses Map and Sequence Databases

The second Joint DOE-NIH Informatics Task Force (JITF) meeting, November 30-December 1, 1990, addressed the Human Genome Project's need for public databases containing mapping and sequencing data. Discussions were organized around presentations on existing map and sequence databases by representatives from data-housing facilities. Several presentations focused on the evolution, current status, and future development plans of data repositories.

Other speakers were Scott Tingey (Du Pont) and Brian Hauge (Massachusetts General Hospital). Tingey discussed the Molecular Breeding Program (plants) and offered some solutions to the problems of storing vast quantities of laboratory data in accessible forms. Hauge described the progress and database requirements of the Arabidopsis genome mapping program.

Guidelines Established

The task force concurred on the following guidelines for establishing genome data resources:

  1. Mapping databases are most naturally organized as organism-specific consensus map databases containing all genetic and physical mapping data that are significantly useful to the biomedical community.
  2. Centralized consensus databases should provide direct or indirect access to the supporting data.
  3. Central databases and project-supporting databases should be implemented using software and hardware systems that adhere to industry standards. Currently, these are the commercial relational database management systems using client-server architecture; they run on POSIX-compliant computers connected to the research Internet and are capable of supporting communication with the Transmission Control Protocol/Internet Protocol (TCP/IP) suite.
  4. Public-use databases must provide a stable, documented Application Program Interface, so that third parties may develop interface software to the data. Public-use databases should use a standard system for representing typographic information (e.g., italics and superscripts) where it has important scientific meaning. Standard Generalized Markup Language is one such standard.
  5. Public-use databases must be designed to support differential data accessibility among authorized users.
  6. Data suppliers should be encouraged to estimate confidence limits of data or consensus elements, and these limits should be represented in the database.
  7. The databases should maintain a history of database changes, such as an audit trail or set of editorial citations.
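
Guidelines 6 and 7 can be illustrated with a small relational sketch. The example below is only one possible reading of those two guidelines, not a design the task force proposed: the table names, column names, and helper functions are all hypothetical, and SQLite (an embedded engine) stands in for the commercial client-server systems named in guideline 3 purely so that the example is self-contained. Each map element stores its position together with lower and upper confidence limits, and a trigger records every position change in an audit table.

```python
import sqlite3

# Hypothetical schema: every name here is an assumption made for illustration.
SCHEMA = """
CREATE TABLE map_element (
    id        INTEGER PRIMARY KEY,
    organism  TEXT NOT NULL,
    locus     TEXT NOT NULL,
    position  REAL NOT NULL,
    conf_low  REAL NOT NULL,   -- lower confidence limit (guideline 6)
    conf_high REAL NOT NULL,   -- upper confidence limit (guideline 6)
    CHECK (conf_low <= position AND position <= conf_high)
);
CREATE TABLE audit_log (       -- history of changes (guideline 7)
    id           INTEGER PRIMARY KEY,
    element_id   INTEGER NOT NULL REFERENCES map_element(id),
    old_position REAL,
    new_position REAL,
    changed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER log_position_change
AFTER UPDATE OF position ON map_element
BEGIN
    INSERT INTO audit_log (element_id, old_position, new_position)
    VALUES (OLD.id, OLD.position, NEW.position);
END;
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_element(conn, organism, locus, position, conf_low, conf_high):
    """Insert a map element with its confidence limits; return its id."""
    cur = conn.execute(
        "INSERT INTO map_element (organism, locus, position, conf_low, conf_high) "
        "VALUES (?, ?, ?, ?, ?)",
        (organism, locus, position, conf_low, conf_high),
    )
    return cur.lastrowid


def revise_position(conn, element_id, position, conf_low, conf_high):
    """Update a position and its limits; the trigger logs the change."""
    conn.execute(
        "UPDATE map_element SET position = ?, conf_low = ?, conf_high = ? "
        "WHERE id = ?",
        (position, conf_low, conf_high, element_id),
    )


def history(conn, element_id):
    """Return (old_position, new_position) pairs in the order they were logged."""
    return conn.execute(
        "SELECT old_position, new_position FROM audit_log "
        "WHERE element_id = ? ORDER BY id",
        (element_id,),
    ).fetchall()
```

In this sketch, the CHECK constraint rejects any row whose position falls outside its stated limits, and a caller can reconstruct the editorial history of an element by reading `audit_log`. A production system would also need the differential access control described in guideline 5, which is not shown here.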

JITF Working Groups

The four JITF working groups made brief reports on their work. The action items that follow were included in the reports or resulted from them.

Data Requirements Working Group

The group will concentrate on mapping data; Lipman recommended establishing connections with model organism mapping data projects. Branscomb was appointed liaison to the Human Genome Organisation committee on physical mapping data.

Connectivity and Infrastructure Working Group

The group reported that its aim was to foster capability rather than to mandate actions; it recognized that the Internet TCP/IP protocol suite is the U.S. connectivity standard. The working group recommended that all genome centers and genome data resources be Internet accessible and that the funding agencies provide connection guidance and support. The group pointed out that network resource availability would create a second cycle of demand from individual researchers; NIH and DOE should expect this demand to increase and be prepared for it.

Training Working Group

Although DOE, NIH, and the National Science Foundation (NSF) have separate genome and computation fellowships, the working group stressed that the Human Genome Project has an opportunity to make a real impact on interdisciplinary computation and biology training by designing a fellowship that would be available at a number of levels: predoctoral, postdoctoral, and mid-career. Another short-term goal of the working group is the development of a summer course in genome informatics for investigators whose primary training is in biology.

Long-Term Needs Working Group

The need for analytical tools and genome informatics training was discussed. The group noted that NSF has taken the lead in biocomputing training. Frank Olken (Lawrence Berkeley Laboratory) pointed out that the Human Genome Project should support basic research in database theory, because advances in database theory and practice are necessary to achieve project goals. Lipman suggested that the cost for such research would be less than that for one genome center, and that Human Genome Project administrators need to be aware of current research and to encourage computer scientists to meet the project's database requirements.

Prior to its next meeting (tentatively scheduled for March 14-15), JITF plans to organize a workshop on laboratory support databases and associated software to develop a requirements specification for a general laboratory support tool.

Reported by David Benton, NIH NCHGR, and Robert Robbins, NSF/DOE

David Benton is Assistant to the Director for Scientific Data Management at NCHGR.

Robert Robbins is Program Director for Database Activities in the Biological, Behavioral, and Social Sciences at NSF; he is assisting DOE in genome informatics and computational activities through the courtesy of NSF.

Database Presentations

  • Genome Data Base
    Welch Medical Library
    Johns Hopkins University
    Peter Pearson and Richard Lucier
  • Lawrence Livermore National Laboratory
    Human Genome Center
    Elbert Branscomb
  • Los Alamos National Laboratory (LANL)
    Center for Human Genome Studies
    James Fickett
  • GenInfo Backbone Sequence Database
    National Center for Biotechnology Information
    David Lipman
  • "Electronic Data Publishing" Model
    Paul Gilna and Michael Cinkosky
  • Priority Area Research Project on Genome Informatics
    Japanese Human Genome Project
    Minoru Kanehisa
    (Kyoto University)
  • European Molecular Biology Laboratory
    Data Library
    Graham Cameron

