Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, November 1993: 5(4)
The DOE Office of Health and Environmental Research has announced that management of sequence data at Los Alamos National Laboratory (LANL) is now operating independently with an expanded mission and a new name: Genome Sequence DataBase (GSDB). GSDB will function both as a research resource for the specific needs of the Human Genome Project and as a service facility.
For more than 10 years, LANL maintained GenBank, an electronic database that serves as the national repository for all nucleotide sequence information, through subcontracts and interagency agreements with the National Institute of General Medical Sciences (NIGMS) and the National Center for Biotechnology Information (NCBI), both units of NIH.
Now, operating as GSDB, LANL researchers will continue to accept new direct data submissions and provide update and annotation services for sequences in their care. They will also extend their work in developing new computer tools to improve the value of genetic sequence databases to the international research community. The name GenBank will now be used exclusively by NCBI to describe the nucleotide sequence database services that NCBI will continue to provide to the scientific community.
GSDB Research and Development Activities
GSDB Service Facility Data-Management Activities
GSDB Relationship to Other Databases
GSDB is committed to productive and complementary interactions with other sequence databases such as those at the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and NCBI. DOE has been engaged in discussions regarding future relationships between GSDB and other databases to ensure that GSDB activities complement those at other sites to produce improved services for the user community. Data submitted to GSDB are being made available to other sequence databases as soon as processing is complete. Also, data submitted to other databases are incorporated into GSDB as the data become available.
Interoperable Information Resources
For more than a year, DOE has been carrying out an extensive review of its informatics activities in support of genome and structural biology research. Advice and comments from reviewers and the community have emphasized the need for improved, integrated information resources [see HGN 5(3), 1-4 (September 1993)]. The review report stated that interoperability among crucial databases is essential and noted that current databases are unable to answer simple queries requiring integration of map, sequence, and other biological data.
Following advice from the report and other sources, DOE determined that it must develop and support an integrated information infrastructure for genome and structural biology research. DOE also resolved that major database elements in the integrated infrastructure should emphasize direct access through networked application programming interfaces and allow direct online data submission, annotation, and curation by the research community.
The database component should be both a research project and a production service supporting ongoing biological research. The research project would develop better data models and direct online tools both for data submission and curation and for federated data access.
In the short term, nucleotide data-resource development supported by DOE will take advantage of the specific expertise, facilities, and capacity developed at LANL during its long tenure as a leading U.S. site for nucleotide database development. Over time, DOE nucleotide data resources will undoubtedly evolve in accordance with the developing integrated infrastructure (a 'center without walls') and will be subjected to extensive peer review and competitive evaluation.
Historical Role of Los Alamos
In the 1970s, Walter Goad established the Los Alamos Sequence Database, a pioneering effort at LANL that in 1982 evolved into the GenBank project. LANL continued to expand and build the database in collaboration with the firm Bolt, Beranek, and Newman under funding provided by NIGMS and other federal agencies. In 1987, LANL remained the site of database design and maintenance, working with IntelliGenetics.
In 1992, NIH transferred its management control for the GenBank project from NIGMS to NCBI at the National Library of Medicine. At that time, DOE and NCBI entered into an Inter-Agency Agreement (IAA) so that LANL could provide assistance in processing direct submissions for NCBI. The IAA noted, 'For nine years, LANL has been responsible for the design and management of gene sequence data as part of the GenBank project. . . . In the most recent re-competition, all three proposals which were in the competitive range included LANL as a subcontract for the direct data submission component of the project. Thus, LANL was recognized not only for its past experience in establishing the procedures for collecting and managing biological data, but for its innovative approaches in handling data prior to or independent of the publication process.'
Now, NCBI has developed its own capacity for processing direct submissions, freeing LANL to develop new approaches, tools, and services targeted specifically for the genome community.
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v5n4).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.