Dulles Hilton Hotel
April 2-3, 1998
On April 2 and 3, DOE's OBER and NIH's NHGRI convened a workshop to identify informatics needs and goals that could be part of the next genome five-year plan and that would begin to craft a vision for genome informatics over the next five years and beyond. In attendance were 46 invited informatics and genomics experts, and 6 DOE, 8 NHGRI, 2 NIGMS and 1 NSF staffers. The meeting was held at the Dulles Hilton in Herndon, VA.
Priorities:
The breakout groups had been asked to address four sets of issues, and their conclusions on these and some other issues are summarized:
Queries: Users want to be able to ask everything conceivable about sequences, genes, markers, regions, relationships, maps, proteins, functions, interactions, regulatory pathways, variation, phenotypes, and inter-species comparisons. They also want to know how the data were derived, under what experimental conditions, and by whom, and to have access to the raw data (ABI traces, gel lanes, etc.), the methods used to process the raw data into database entries (e.g. sequence), and the QA/QC measures: in short, everything. It should be possible to answer all queries that could be supported by the data.
The need for all the underlying data arises especially for individual phenotypic data. Given the expense of phenotyping, it is important to be able to go back and check whether a particular SNP is really there. The ABI traces are not needed for the reference sequence since questionable regions can be sequenced again.
Tools: DNA sequencing has a bottleneck at finishing; tools to speed up this process are a critical need. Also needed are production tools, research tools (for analysis, for visualization, etc.), access tools (for visualizing data objects, for extracting objects from different databases, etc.), annotation tools, data capture tools, functional genomics tools, and data mining tools. Further needs include development and hardening of tools to promote easier dissemination, finishing, and exporting; QA/QC of the different tools; tools that are interoperable; map integration tools; and outreach tools. A web site that collects and annotates these tools would be very useful.
Standards: There was strong support for intelligent standards that the various constituencies of the genome project (academic, government, and industry) could join in defining and implementing. These include a variety of controlled vocabularies for the objects that would be entered into appropriate databases. Today, industry standards are very distinct from the few that exist in the HGP (e.g. Phred/phrap for sequence QA/QC). A current standards group, the Object Management Group (OMG), is composed mostly of industry representatives, but should also involve academic and government representatives. Explicit object definitions and access methods are desperately needed. Component-oriented software standards (e.g. CORBA) would promote systems integration, interoperability, flexibility, and responsiveness to change. It was recognized that a balance must be struck between having standards and allowing change and flexibility.
Annotation: Automated annotation analyses should be done using clearly defined standard operating procedures, consistent application, and sufficient documentation. Automated annotation is a good place for biologists to start for more detailed understanding of particular chromosome regions. Human participation in the annotation process is still important, however, for getting the most out of genomic information.
Quality checks: There were suggestions that the databases be subject to regular checks of quality. Users are frustrated by incorrect data and the unwillingness or inability of database providers to correct these mistakes. Official editors who curate information could resolve errors and improve the data quality. The success of the quality assessment exercise for sequence centers provided a model for the usefulness of database quality assessments.
Training/Environment issues: NSF S&T centers are models for needed genome informatics centers. Three to five such centers were proposed, where there would be a critical mass to allow interactions among various disciplines and training of students.
The workshop closed with some policy recommendations:
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.