Alta, Utah (December 1-2, 1997): Elbert Branscomb, Mario Capecchi, Marvin Frazier, Harold Garner, Raymond Gesteland (Chairman), Richard Gibbs, Phil Green, Trevor Hawkins, Mike Knotek, Miriam Meisler, Ari Patrinos, Jane Peterson (NIH staff), Lloyd Smith, Monte Westerfield
Gaithersburg, Maryland (March 12-13, 1998): Elbert Branscomb, Ken Buetow, George Church, Dan Drell (DOE), Elise Feingold (NIH), Marvin Frazier (DOE), Rainer Fuchs, Harold Garner, Raymond Gesteland (Chairman), Betty Graham (NIH), Trevor Hawkins, Elke Jordan (NIH), Miriam Meisler, Ari Patrinos (DOE), Lloyd Smith, Randy Smith
The DOE Genome Project is in a transitional phase, evolving to balance the demands of large-scale sequencing and technology development, while at the same time setting the stage for genomic analysis of gene function. Discussion of progress and goals emphasized the following points:
The DOE microbial genome project has made a very substantial contribution to our understanding of the diversity of microbial life and the complexities of evolution. A large fraction of the genes (30-50%) found in any newly sequenced microbial genome have no known relatives. Sequencing of additional genomes should be a high priority, both to understand our biological world and to enlarge the repertoire of genes that may be of practical importance.
Joint Genome Institute
The DOE genome project has made a major commitment to support a large-scale DNA sequencing facility - the Joint Genome Institute (JGI) under the direction of Dr. Elbert Branscomb. Resources and scientific talent from genome efforts at three national laboratories have been pooled and brought to bear. Very challenging goals for ramping up production have been set, and every effort must be made to ensure the JGI's success.
Currently LBL, LANL and LLNL are pursuing sequencing within their own structures to meet the production goal of 20 megabases of finished sequence for fiscal 1998. So far, production is on track to meet this goal. This is being accomplished even under the pressure of formulating a united scheme for factory production and planning the new facility. These pressures will escalate in the coming months with the stress of the move to the new facility and the need to double the sequence output to 40 Mb in fiscal 1999. This latter goal will be very challenging because of the many distractions. Some patience should be shown so that the factory can get up and running effectively and have a real shot at meeting future sequencing goals.
JGI needs to show that it can sustain production of high-quality, contiguous sequence. This is imperative even if it comes at the expense of throughput and cost during the crucial first and second years. However, every effort within reason must be made to keep to the proposed aggressive ramp-up. The proposed goals are:
In light of the major investment of the DOE genome project in the JGI, and the importance of success of this venture, the first priority in funding genome research must go to strategic ventures that will help ensure the success of the JGI. The necessity of "getting on with sequencing" requires commitment of substantial funds to production at the JGI, which will limit the ability of the DOE genome project to support technology development.
Yet current technologies must be augmented by improvements in automation, in sequencing chemistries and in computer tools for assembling and interpreting DNA sequences in order to improve both efficiency and cost.
As the JASON report on the genome project points out, this is just the time when technology development is very important for the genome project, and technology is DOE's forte. Both the JASONs and the BERAC genome subcommittee are in accord that, despite the immediate need for production sequencing, funding must be ensured both for short-term developments that enhance current production and for longer-term technologies that will provide the key tools for the future.
Current technology for genomic DNA sequencing has about equal cost contributions from labor and reagents (capital investment in equipment is quickly amortized) with an overall cost of about $0.50/base pair, although the many vagaries of calculating costs make this number quite soft and subject to individual lab interpretation. However, for the JGI (and the genome project in general) to meet its goals, the cost must come down by a factor of 3 to 4, without compromising sequence accuracy. To have any effect during the major sequence accumulation phase of the project (until 2005), only incremental improvements to current technology are likely to pay off. Incremental improvements to the sequencing process itself, including alternative chemistries and longer read lengths, are resulting in overall cost improvements. Automation of sample preparation and handling continues to hold the promise of cutting labor costs and improving reliability of the process. (See addendum "A" for a more complete list of needed improvements.)
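The cost figures above can be restated as simple arithmetic. The following is a minimal sketch in Python, using only the approximate numbers quoted in the text ($0.50 per base pair, a 3- to 4-fold reduction, and a roughly equal split between labor and reagents). The function and variable names are illustrative assumptions, and because the report itself describes the $0.50 figure as soft, the results should be read as rough targets only.

```python
# Illustrative arithmetic only: the report describes the $0.50 figure as "quite soft".
CURRENT_COST_PER_BP = 0.50   # dollars per base pair (approximate figure from the text)
REDUCTION_FACTORS = (3, 4)   # required cost reduction range (from the text)


def target_cost_per_bp(current_cost, factor):
    """Return the cost per base pair after reducing current_cost by the given factor."""
    return current_cost / factor


def equal_split(current_cost):
    """Split a cost equally between labor and reagents (the text says 'about equal')."""
    half = current_cost / 2
    return {"labor": half, "reagents": half}


# Target cost per base pair for each reduction factor (hypothetical helper names).
targets = {f: target_cost_per_bp(CURRENT_COST_PER_BP, f) for f in REDUCTION_FACTORS}

if __name__ == "__main__":
    for factor, cost in targets.items():
        print(f"{factor}x reduction: about ${cost:.3f} per base pair")
    print("Current split per base pair:", equal_split(CURRENT_COST_PER_BP))
```

Under these assumptions the target falls between roughly $0.125 and $0.17 per base pair, which means neither labor nor reagents alone (about $0.25 each) can absorb the whole reduction; both must come down.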
However, the development of improved technologies alone is not sufficient. Promising technologies often languish because of the difficulties of moving them into production streams. Disruption of the production effort and dependence on a new, untested technology make the risks of implementation too high. Thus a major challenge is to find ways to support "hardening" of incremental technologies so that they can be moved into production with minimal risk. A targeted funding method is needed to solve this problem; otherwise the investment in many incremental technologies will have been wasted. Perhaps new cooperative agreements can be the tool. However, it is crucial that appropriate measures be in place to ensure that the value of incremental technologies is assessed during this hardening phase. This will require monitoring usefulness and establishing milestones for performance.
There is a crucial need for development of new sequencing technologies that will be the tools for the future. The appetite for sequencing will only increase, but the costs of current methods, even with incremental improvements, will greatly limit sequencing capacity. While some new approaches are in the wings (see addendum "A" for a list), they are not likely to contribute to the large-scale sequence accumulation needed by 2005 to meet the primary goals of the genome project. This reality should not discourage investment in longer-term development of new sequencing technology. However, support should be predicated on new technologies being able to reduce the cost of sequencing by 20- to 100-fold. Anything less will be too late with too little. In addition, support for long-term technologies cannot be at the expense of JGI's success.
It is important to lay the groundwork now for this next stage of the genome project. Genome sequences are only starting points - the human sequence is a tool for identifying the information for each of the 100,000 genes, with the ultimate goal of determining the function of each gene. Defining the very complex network of interactions of gene products will be the heart of biomedical research for many decades.
Genome sequence is also a tool that permits examination of human variation, with direct applicability to understanding individual susceptibility to disease and to environmental insults such as exposure to low radiation doses. With the reference sequence in hand, genes that play a role in susceptibilities will be identified, leading to an understanding of differential susceptibility in the population. Using the mouse as a model organism is particularly powerful.
There are many aspects of the application of genome technology to DOE missions. Analysis should be expanded on a number of these fronts if the resources can be found. This is the payoff - the harvest of the genome project.
However, given the stringent demands of the production-sequencing phase of the Genome Project, presently available resources are far too constrained to do justice to the scope and importance of developing tools of this nature.
Development of informatics tools continues to be a high priority. However, there is still the nagging concern that tools doing equivalent jobs are developed independently in many different labs. While it may be true that each large-scale sequencing center will need to develop informatics tools to support its own technologies, sharing of solutions needs to be encouraged. The development of databases and their tools needs to be driven by the user community. It is anticipated that the joint NIH-DOE workshop to consider appropriate informatics goals will provide the needed guidance.
Input will come from the joint NIH-DOE ERPEG.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.