Discussion of Human Genome Mapping/Sequencing Strategy to Meet Five-Year Goals October 20, 1998
NHGRI staff met with several investigators on Tuesday, October 20, 1998 to discuss mapping strategies for identifying clones for sequencing.
Present at the meeting were Bob Waterston, David Cox, Elbert Branscomb, Eric Lander, Phil Green, and Richard Gibbs. NHGRI staff present were Francis Collins, Adam Felsenfeld, Elke Jordan, Jane Peterson, and Mark Guyer. NCBI staff present were Greg Schuler and Jim Ostell. David Bentley, Jane Rogers, and Alan Coulson of the Sanger Centre joined in the first half of the discussion by videoconference.
Until now, strategies for selecting clones for genomic DNA sequencing have largely been directed toward generating a sequence-ready map, or a minimal tiling path of BAC clones. Recently, however, questions have been raised about whether this approach can meet the throughput demands of the new timetable for completing the human genomic DNA sequence. A "sequence-driven" alternative, in which clone mapping is done after sequencing, has been offered. Prior to the meeting, Phil Green summarized the pros and cons of the "map first, sequence later" and "sequence first, map later" alternatives.
The main objections to the sequence-driven approach are:
The main objections to the regional map-driven (STS + fingerprint) approach are:
There are some additional objections that apply to both strategies (e.g. potential sensitivity to large scale repeats and to biases in clone coverage).
Prior to the meeting, a third, hybrid, strategy emerged. This would encompass both map-driven (regional) and sequence-driven (random) approaches; centers could choose to pursue one, or the other, or both.
The random approach would start by end-sequencing randomly chosen BACs. If the end sequence hit a clone or contig that was mapped to a chromosomal region already being actively sequenced, no further work on that BAC would be done by the "end sequencer." The end sequence data would be submitted to a central server (see below), so that the regional center responsible for that region could use the BAC, if desired. However, if the end sequences were informative but did NOT hit a mapped contig, that BAC would then be lightly sequenced (0.5X coverage) by the "end sequencer." Those data would again be analyzed using the central server, and a decision as to whether to sequence the BAC further would be made on the basis of minimum overlap and/or gene content. If the decision were made to complete the shotgun sequencing of the BAC (9-10X coverage), it would also be RH mapped. Once whole genome fingerprint-derived contigs become available (from the effort now under way at Wash. U.), BACs would preferentially be selected from them to help reduce the incidence of defective BACs.
The participants considered that such a hybrid strategy would best meet the needs and interests of the sequencing groups. The feasibility of the approach would depend on the development of a central server, which would maintain a list of BACs for which any useful information was available.
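The decision path for a randomly chosen BAC described above can be sketched in code. This is only an illustrative sketch, not a procedure adopted at the meeting: the function names, the `Action` labels, and the `max_overlap` threshold are hypothetical, and the summary does not say what happens to a BAC whose end sequences are uninformative, so that case is marked as unspecified.

```python
from enum import Enum


class Action(Enum):
    HAND_OFF = "submit end sequences to the server; leave the BAC to the regional center"
    LIGHT_SHOTGUN = "sequence lightly (0.5X) and submit the data to the server"
    FULL_SHOTGUN = "complete shotgun sequencing (9-10X) and RH map the BAC"
    STOP = "do not sequence this BAC further"
    UNSPECIFIED = "case not covered by the meeting summary"


def triage_after_end_sequencing(informative: bool, hits_active_region: bool) -> Action:
    """First decision, made once both BAC ends have been sequenced."""
    if hits_active_region:
        # The end sequence hits a clone or contig mapped to a region that is
        # already being actively sequenced: the end sequencer stops here.
        return Action.HAND_OFF
    if informative:
        # Informative ends that do not hit a mapped contig: light shotgun.
        return Action.LIGHT_SHOTGUN
    # Uninformative end sequences are not addressed in the summary.
    return Action.UNSPECIFIED


def triage_after_light_shotgun(
    overlap_fraction: float,
    gene_rich: bool,
    max_overlap: float = 0.2,  # hypothetical threshold; the summary gives none
) -> Action:
    """Second decision, made after the 0.5X data are analyzed on the server.

    The summary says only that the decision rests on "minimum overlap and/or
    gene content"; treating either criterion as sufficient is an assumption.
    """
    if overlap_fraction <= max_overlap or gene_rich:
        return Action.FULL_SHOTGUN
    return Action.STOP


if __name__ == "__main__":
    first = triage_after_end_sequencing(informative=True, hits_active_region=False)
    print(first.name)
    if first is Action.LIGHT_SHOTGUN:
        print(triage_after_light_shotgun(overlap_fraction=0.05, gene_rich=False).name)
```

Splitting the path into two functions mirrors the two points at which the central server would be consulted: once after end sequencing and once after the 0.5X pass.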
Important information to have in the database (to be contributed by all sequencers) would include end sequences, chromosome location (or "unmapped"), fingerprint data, extent of overlap with other BACs, and the data from the 0.5X sequencing. It was also agreed that this should be a publicly accessible resource.
Later in the discussion, this server was described as version 2.0 of the Human Genome Sequencing Index. It was envisioned as playing the role of the site at which "claims" for sequencing responsibility would be established. The attendees suggested that the criterion for establishing priority for sequencing a region would be real data from that region, e.g. submission of identifying information about a clone or contig to the central server. Such information could include clone name or actual sequence data. The NCBI representatives agreed to look into the feasibility of establishing this server in the near future.
Other centralized resources/services were discussed. In addition to the BAC end sequences, BAC fingerprints, and high resolution RH maps currently being produced, a resource set of chromosome-specific STS hits on the BAC library was considered to be of value, with the understanding that this would be a public mapping resource and not a means of establishing sequencing claims.
As for sequencing responsibilities, the sequence-driven approach was recognized as being most useful during the early phases of genomic sequencing. As sequence data accumulate, closing the gaps between the growing contigs will require even the non-regional groups to take on regional responsibility. At the present time, however, assuming responsibility for regions (which at this point are more likely to be blocks of 10Mb or so, rather than entire chromosomes) will be based on a group's having actually produced and submitted to the server sufficient data to establish a valid claim. Once a group (map-driven or sequence-driven) starts the "heavy" shotgun sequencing of a clone, it is committing to finishing that clone, even if the clone falls in a region that is being sequenced by another group; such a situation could arise if a random clone selected for sequencing is in a "claimed" region but has no overlap with anything already in the server. It was also recommended that such claims be valid only for a reasonable period of time, to avoid a situation in which one group's failure to complete a small region delays closure/completion of a larger region. A similar approach, with a similar degree of coordination among participants, was used successfully in the yeast sequencing effort.
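The claim rules above (a claim rests on submitted data and lapses after a reasonable period) could be modeled along the following lines. This is a sketch under stated assumptions, not a description of the Human Genome Sequencing Index: the `Claim` fields, the one-year lifetime, and the earliest-submission tie-break are all hypothetical, since the summary fixes none of them.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Claim:
    group: str          # sequencing group making the claim
    chromosome: str
    start_mb: float     # claimed block, in megabases (blocks of ~10 Mb)
    end_mb: float
    submitted: date     # date the supporting data reached the server
    evidence: str = "clone name or sequence data"  # placeholder text


# Hypothetical lifetime; the summary says only "a reasonable period of time".
DEFAULT_LIFETIME = timedelta(days=365)


def is_valid(claim: Claim, today: date, lifetime: timedelta = DEFAULT_LIFETIME) -> bool:
    """A claim counts only if it is backed by data and has not lapsed."""
    return bool(claim.evidence) and today - claim.submitted <= lifetime


def holder(
    claims: Iterable[Claim],
    chromosome: str,
    position_mb: float,
    today: date,
    lifetime: timedelta = DEFAULT_LIFETIME,
) -> Optional[str]:
    """Return the group holding a valid claim on this position, if any.

    Assumption: among overlapping valid claims, the earliest submission wins.
    """
    valid = [
        c for c in claims
        if c.chromosome == chromosome
        and c.start_mb <= position_mb < c.end_mb
        and is_valid(c, today, lifetime)
    ]
    if not valid:
        return None
    return min(valid, key=lambda c: c.submitted).group


if __name__ == "__main__":
    claims = [
        Claim("Group A", "7", 10.0, 20.0, date(1998, 11, 1)),
        Claim("Group B", "7", 15.0, 25.0, date(1998, 12, 1)),
    ]
    print(holder(claims, "7", 17.0, today=date(1999, 1, 1)))
    print(holder(claims, "7", 17.0, today=date(1999, 11, 15)))
```

In the second lookup the earlier claim has lapsed, so the position passes to the group whose claim is still current; that is the delay-avoidance behavior the recommendation is aiming for.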
Finally, the new goal of a (better than 90%) complete working draft was discussed in the context of a hybrid plan in which most of the sequence generated would come from either very light (0.5X) or complete shotgun sequencing. The importance of having a complete working draft sequence, as defined in the new five-year plan, by 2001 was reaffirmed. However, some participants suggested that expanding coverage from 0.5X to 3X could be done rather quickly and without a great deal of effort. Concentrating sequencing effort on light shotgun and finished sequencing over the next two years could therefore be an acceptable strategy, with the decision on how far to expand the 0.5X coverage to be made toward the end of the year 2000, when sequencing capability at that time is evaluated.
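For a rough sense of what the coverage figures mean, the idealized Lander-Waterman (Poisson) model of random shotgun sequencing gives the expected fraction of bases covered by at least one read as 1 - e^(-c) at c-fold coverage. The meeting summary does not cite this model; it is used here as an assumption, and it ignores cloning bias, repeats, and read-length effects. The function name below is illustrative.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read at c-fold coverage.

    Uses the idealized Poisson approximation 1 - exp(-c); real projects fall
    short of this because of coverage bias and repeats.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for c in (0.5, 3.0, 10.0):
        print(f"{c:>4}X -> {expected_fraction_covered(c):.1%}")
```

Under these assumptions, 0.5X coverage touches roughly 39% of bases and 3X roughly 95%, which is consistent with 3X being discussed as the level relevant to a better-than-90% working draft.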
Last modified: Wednesday, October 22, 2003