Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, November 1990; 2(4)
The Genome Sequencing Conference II, cochaired by J. Craig Venter (NIH National Institute of Neurological Disorders and Stroke) and Walter Gilbert (Harvard University), was held September 30-October 3 at Hilton Head, South Carolina. Sponsors of the conference included federal agencies and commercial organizations.
The first full day of the meeting, October 1, marked the official implementation of the DOE-NIH 5-year plan for the national Human Genome Project. The meeting, attended by some 200 people, provided an excellent forum for exchanging ideas among researchers and developers of hardware and software.
James Watson [Director, NIH National Center for Human Genome Research (NCHGR)] emphasized that mapping and sequencing model organisms is necessary now to prepare for human genome sequencing, even though the same technologies may not be used to sequence the human genome. He urged his audience to "go home and publish sequence" and said that simply endorsing the genome project is not enough. Watson also stated that because of important commercial and research applications, new sequence data should be released as soon as it is ascertained to be correct.
David Galas [DOE Associate Director, Office of Health and Environmental Research (OHER)] underscored the consensus that automating present sequencing technology to its projectable limits would be sufficient to accomplish the short-term sequencing goal of 3 billion base pairs. He said that novel technologies, described in some high-risk proposals, will be needed to reach more advanced, longer-term goals.
Galas also announced that the DOE physical mapping effort will include establishment of a master set of cDNAs for deriving sequence tagged sites (STSs). This is important because the STSs will be placed on sites actively being transcribed in the genome.
Venter described his work in obtaining STSs from 30,000 human brain cDNAs. These "expressed sequence tags" (ESTs), along with their first 400 clones, will be available from a new category in GenBank® and from the American Type Culture Collection (ATCC), respectively.
A number of groups are working on projects that will optimize the automation and scaleup of laboratory apparatus and data analysis capability, such as the following:
Progress reported by several laboratories in completing cosmid and larger-size sequencing projects included:
There was general agreement that sequencing reactions and gel scanning have been successfully automated and that progress is being made in automating the front-end steps of mapping, cloning, and template preparation; contig assembly using computer algorithms is now the rate-limiting step. Participants also discussed strategies for gap closure and for resolving ambiguities caused by repetitions or polymorphisms.
Attendees agreed that software development is needed for transferring data between units of laboratory equipment, for inputting data, and for storing and analyzing massive amounts of raw sequence data. One speaker noted that sequence-analysis software packages sometimes require different formats of GenBank®, with each format using over 100 Mbytes of disk space. Some participants at the meeting called for cooperation among software developers, because as GenBank® increases in size, the demand for user disk space will become particularly acute.
Several speakers discussed sources of error, their propagation during assembly, and their effects on sequence analysis. Some investigators working with prokaryotes projected error rates of only 0.00001%, but the general consensus was that error rates of 0.01 to 0.001% would be acceptable and achievable at a reasonable cost for a first pass through the whole genome. As technology advances, regions of interest could be resequenced with a higher degree of accuracy at reduced costs.
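To put these percentages in perspective, the short sketch below converts each error rate into an expected number of erroneous bases across the 3-billion-base-pair goal cited earlier in this report. It is only a back-of-the-envelope illustration: it assumes errors are spread evenly over the genome, and the names `GENOME_SIZE_BP`, `RATES`, and `expected_errors` are hypothetical, chosen for this example rather than taken from the conference.

```python
# Back-of-the-envelope estimate; assumes errors are evenly distributed.
GENOME_SIZE_BP = 3_000_000_000  # 3 billion base pairs, the goal cited in this report

# Each percent error rate expressed as "one error per N bases"
# (for example, 0.01% = 1 error in 10,000 bases).
RATES = {
    "0.01%": 10_000,
    "0.001%": 100_000,
    "0.00001%": 10_000_000,
}


def expected_errors(bases_per_error, genome_size_bp=GENOME_SIZE_BP):
    """Return the expected number of erroneous bases in the genome."""
    return genome_size_bp // bases_per_error


for label, bases_per_error in RATES.items():
    count = expected_errors(bases_per_error)
    print(f"{label} error rate: about {count:,} erroneous bases")
```

Under these assumptions, the first-pass range discussed at the meeting corresponds to roughly 30,000 to 300,000 erroneous bases genome-wide, while the rate projected by the prokaryote groups would correspond to about 300.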
Venter, reading a statement prepared by Applied Biosystems, Inc. (ABI), announced that an agreement had been reached between the company and the scientific community that relies heavily on ABI sequencing hardware and software. ABI will enter into written licensing agreements with individual laboratories to provide access to data file formats to those wishing to develop the sequencing software for their research. If such software is useful to the scientific community and is distributed, the researchers will include text in the software to indicate that the ABI proprietary file format is for research purposes only and not to be used in any commercial product.
Two informative evening discussions were held simultaneously, one on sequencing instrumentation and the other on software requirements of large-scale sequencing projects. Informal presentations at the instrumentation workshop included:
Software workshop participants discussed software problems and development of some general solutions that could be implemented at many different project sites. During the discussions, there was general agreement that the evolving databases should be designed to increase the portability of data and to provide additional data fields (for specific sequence data) that will allow for:
Participants predicted the development of global databases that can be queried by scientists searching for answers to basic biological questions (e.g., understanding eukaryotic gene expression, developmental biology, disease etiologies, and protein function). Representatives from the computing community requested that detailed descriptions of the needs of molecular geneticists be specified; when such specifications stabilize, further software development can commence.
Another area for cooperation between the scientific communities concerns the development of sequence assembly programs, some of which are being written by academic computer scientists who may not have resources for software customer support. One solution suggested was to turn the programs over to the commercial sector for further development, documentation, distribution, and customer support.
Jane Peterson (NIH NCHGR) reported that the September 29 meeting of the DOE-NIH Working Group on Sequencing focused on reducing costs and developing models for cost assessment in scaleup projects.
A more detailed report on the conference, containing lists of speakers and their topics, can be obtained from HGMIS. (See the related article on The Genome Project and the Pharmaceutical Industry, a satellite meeting to Genome Sequencing Conference II.)
September 22-25, 1991 Hyatt Regency, Hilton Head, South Carolina
Cochairs: Craig Venter, Leroy Hood
Reported by Kathleen H. Mavournin
and Betty K. Mansfield
HGMIS, Oak Ridge National Laboratory
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v2n4).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.