Future Plans for Databases

Integrating sequence data with mapping, structural, and other biological information will require the development of "virtual databases," with components originating at multiple sites. Some informatics scientists envision creating such a virtual system on top of interlocking community databases that form a loosely coupled information infrastructure. Other research efforts are aimed at developing a local collection of primary databases that act as a single virtual database. Everyone agrees that tools should allow users with only minimal computer knowledge to access many resources and that linkages among primary databases should provide a "one-stop-shopping" capability that eliminates the need for separate queries to each database.
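The "one-stop-shopping" idea can be illustrated with a minimal Python sketch in which a single query is sent to several independently maintained sources and the answers are gathered in one place. Everything here is hypothetical: the `VirtualDatabase` class, the `make_source` helper, and the source names are invented for illustration and do not describe any system named in this article.

```python
from typing import Callable, Dict, List, Tuple

# A source is modeled as a function from a query term to a list of hits.
# Real sources would be remote services; these are assumptions for the sketch.
Source = Callable[[str], List[str]]


class VirtualDatabase:
    """Presents several independent sources behind one query call."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, search: Source) -> None:
        """Add a component database under a label chosen by the caller."""
        self._sources[name] = search

    def query(self, term: str) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
        """Send one term to every source; return (hits, errors) keyed by source."""
        hits: Dict[str, List[str]] = {}
        errors: Dict[str, str] = {}
        for name, search in self._sources.items():
            try:
                hits[name] = search(term)
            except Exception as exc:
                # One unreachable site should not block answers from the others.
                errors[name] = str(exc)
        return hits, errors


def make_source(records: List[str]) -> Source:
    """Build a toy in-memory source that does case-insensitive substring search."""
    def search(term: str) -> List[str]:
        return [r for r in records if term.lower() in r.lower()]
    return search
```

Under these assumptions the user issues one `query` call instead of a separate query to each database, and a failure at one site is reported alongside the results from the rest.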

An example of such a virtual database may be found at the WWW site maintained by the Baylor Human Genome Center. Choosing the "Biologist's Control Panel" from Baylor's home page produces a list of more than 20 molecular biology databases. Selecting "BLAST search of GenBank" connects users to NCBI's service in Bethesda, Maryland. Similar hyperlinks are made to GDB, SWISS-PROT, and many other sites.

NCBI's future plans focus on improving GenBank data quality and continuing to provide easy-to-use yet powerful methods for data access. To accomplish these goals, a new GenBank fellowship program has recruited five molecular biologists, who are working with informatics specialists on specific tasks.

GSDB is focusing on online database access by offsite users and direct client-server updates, especially for genome centers. GSDB and EBI also favor establishing connectivity for a federation of interoperable molecular biology databases communicating across computer networks. Because this plan may not require the same level of data standardization as monolithic approaches, it would allow greater autonomy for participating databases. To facilitate such communication, software packages such as EMBL-Search and SRS use cross references built into EBI-distributed databases to allow retrieval of related data.
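How cross references let software retrieve related data from separate databases can be sketched in a few lines of Python. This is an illustration only, not how EMBL-Search or SRS is implemented: the database names, entry identifiers, and the `xrefs` field are all invented, and the databases are plain in-memory dictionaries rather than networked services.

```python
from collections import deque

# Hypothetical stand-ins for three autonomous databases. Each entry carries
# cross references as (database name, entry ID) pairs; all values are invented.
DATABASES = {
    "sequence_db": {
        "SEQ001": {
            "description": "example nucleotide entry",
            "xrefs": [("protein_db", "PROT001"), ("map_db", "MAP001")],
        },
    },
    "protein_db": {
        "PROT001": {
            "description": "example protein entry",
            "xrefs": [("sequence_db", "SEQ001")],
        },
    },
    "map_db": {
        "MAP001": {"description": "example map locus", "xrefs": []},
    },
}


def related_entries(db_name, entry_id, databases=DATABASES):
    """Follow cross references breadth-first from one entry.

    Returns a list of (database, entry ID, description) tuples, starting
    with the entry itself, with each entry visited at most once.
    """
    start = (db_name, entry_id)
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        db, eid = queue.popleft()
        record = databases.get(db, {}).get(eid)
        if record is None:
            continue  # dangling cross reference: the target entry is missing
        found.append((db, eid, record["description"]))
        for ref in record["xrefs"]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return found
```

Because each database only needs to expose its entries and their outgoing references, none has to adopt the others' internal schema, which is the kind of autonomy the federated approach is meant to preserve.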

GSDB anticipates that, although databases now process batch submissions rapidly, these methods will require too much manual effort to meet the expected future data flow. Capitalizing on the rapid expansion of network availability, GSDB systems are being redesigned to be interactive and network based. To keep pace with raw sequence data and the rapidly expanding understanding of sequence function, maintenance of database entries will no longer be limited to the original authors; through support for third-party annotations, it will be open to anyone interested. Much future database work may be performed online by the scientific community using client-server tools.

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort that ran from 1990 to 2003. Its primary goals were to discover the complete set of human genes and make them accessible for further biological study, and to determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.