Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, Nov. 1994; 6(4):16
As part of the annual Cold Spring Harbor Laboratory (CSHL) Genome Mapping and Sequencing meeting in May, David States (Washington University, St. Louis) convened a workshop on modular software development and data interchange in genome research.
Molecular biologists routinely exchange protocols and reagents, but difficulties in technology exchange have led to duplication of effort in genome informatics. The informatics workshop focused on ways to improve the modularity of software supporting genome mapping and sequencing and to enhance reliable data communication among centers and other researchers.
Workshop speakers were Nat Goodman (Whitehead Institute for Biomedical Research); Jean Thierry-Mieg (CNRS, Montpellier, France); James Ostell (National Center for Biotechnology Information); Tom Slezak (Lawrence Livermore National Laboratory); Ken Fasman [Genome Data Base (GDB)]; and States, who served as moderator.
In presenting a modular view of software requirements for large-scale sequencing laboratories, States defined independent roles for sequence base calling, shotgun map assembly, consensus sequence generation, multiple sequence alignment, editing, and data storage. He urged that developers concentrate on specific tasks rather than build an entire system to test a novel idea that impacts only one activity.
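The modular view States described can be illustrated with a minimal sketch in which each role is an independent, replaceable stage. Everything here is a hypothetical stand-in chosen for illustration: the stage names, the dictionary passed between stages, and the trivial "base caller" and majority-vote consensus are assumptions and do not represent software discussed at the workshop.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical convention: every stage takes a dict of named data
# and returns a new dict with its own output added.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def call_bases(data: Dict[str, object]) -> Dict[str, object]:
    # Placeholder base caller: treats each raw "trace" string as a read
    # and simply upper-cases it.
    return {**data, "reads": [t.upper() for t in data["traces"]]}


def build_consensus(data: Dict[str, object]) -> Dict[str, object]:
    # Placeholder consensus: majority vote per column over
    # equal-length, already-aligned reads.
    reads: List[str] = data["reads"]
    consensus = "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )
    return {**data, "consensus": consensus}


def run_pipeline(stages: List[Stage], data: Dict[str, object]) -> Dict[str, object]:
    # Apply each stage in order; stages know nothing about each other.
    for stage in stages:
        data = stage(data)
    return data


result = run_pipeline(
    [call_bases, build_consensus],
    {"traces": ["acgt", "acgt", "aggt"]},
)
print(result["consensus"])
```

Because the stages share only the agreed dictionary keys, a developer testing a new base-calling idea could swap in a different `call_bases` function without rebuilding the rest of the system.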
Goodman outlined database technology applicable to genome sequencing and pointed out that a critical issue in modular software development is agreement on information content at data interfaces. Once content agreement has been achieved, technical issues of data interchange can be addressed. In particular, Ostell and Thierry-Mieg concurred that translators between ASN.1 and .ace data formats could be written easily and that work in this area should be encouraged.
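Goodman's point about agreeing on content before tackling interchange can be sketched as follows. This is a toy example under stated assumptions: the `SequenceRecord` fields are hypothetical, and the two serializations (JSON and a simple tag-value text) are stand-ins chosen for brevity. Neither is real ASN.1 nor real .ace syntax.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SequenceRecord:
    # The agreed-upon information content (hypothetical fields).
    name: str
    residues: str
    source_lab: str


def to_json(rec: SequenceRecord) -> str:
    # Stand-in for one site's structured format.
    return json.dumps(asdict(rec), sort_keys=True)


def from_json(text: str) -> SequenceRecord:
    return SequenceRecord(**json.loads(text))


def to_tagged_text(rec: SequenceRecord) -> str:
    # Stand-in for another site's tag-value text format.
    return f"Sequence : {rec.name}\nDNA {rec.residues}\nLab {rec.source_lab}\n"


def from_tagged_text(text: str) -> SequenceRecord:
    lines = text.strip().splitlines()
    name = lines[0].split(" : ", 1)[1]
    # Remaining lines are "TAG value" pairs.
    fields = dict(line.split(" ", 1) for line in lines[1:])
    return SequenceRecord(name, fields["DNA"], fields["Lab"])


def tagged_text_to_json(text: str) -> str:
    # A translator is just parse-then-serialize through the shared model.
    return to_json(from_tagged_text(text))


rec = SequenceRecord("clone1", "ACGT", "ExampleLab")
print(tagged_text_to_json(to_tagged_text(rec)))
```

Once both sides map onto the same record, each translator reduces to a parse followed by a serialize, which is one way to read the speakers' view that such translators could be written easily.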
Slezak pointed out that centers' different biological approaches have led to a variety of map representations. Fasman added that implementation of various anonymous ftp servers, Gopher sites, and World Wide Web pages has resulted in very heterogeneous collections of information resources. In addition, sites may update their own servers on variable and unreliable schedules and have different data formats, definitions, and semantics, all of which add to difficulties in using and maintaining up-to-date data collections. Fasman reminded the group that centralized data repositories such as Genome Data Base were developed to address these issues and need community support to implement solutions.
States and Goodman are planning a similar workshop for the next CSHL mapping and sequencing meeting. Ideas and suggestions should be sent to States (314/362-2135, Fax: -0234, Internet: firstname.lastname@example.org) or Goodman (617/252-1904, Fax: -1902, Internet: email@example.com).
A mailing list and discussion group has been set up on WWW (URL: http://www.broad.mit.edu/informatics/sharing_archive/). To join the discussion group, send e-mail to firstname.lastname@example.org; to send mail to the group, use email@example.com.
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v6n4).
The Human Genome Project (HGP) was an international 13-year effort that ran from 1990 to 2003. Its primary goals were to discover the complete set of human genes and make them accessible for further biological study, and to determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.