BAC End Sequencing - The Caltech Contribution
Ung-Jin Kim, Hiroaki Shizuya, and Mel Simon
The following is a progress report covering the last period of work on the BAC end sequencing (BES) project. The group at Caltech has undertaken three tasks: 1) supplying arrays of BAC clones to the Hood Lab and to TIGR for high throughput end sequencing; 2) end sequencing, arraying, and checking BACs in the chromosome 16 region to evaluate the efficacy of end sequencing and the choice of minimal paths for high throughput sequencing of multi-megabase regions of the genome; and 3) developing a model system in which BACs corresponding to chromosome 22 are end sequenced and fingerprinted, allowing a quantitative analysis of the utility of BAC end sequencing in providing markers and BAC substrates.
With respect to the first task, supplying BACs, we generated arrays comprising approximately 50,000 BAC clones, with quality control measures to identify empty wells and deficient clones, in order to provide an appropriate set of substrates for large-scale BAC end sequencing. This library was shipped to Dr. Hood, whose group extracted DNA and sequenced approximately 20,000 BAC ends from this material.
With regard to our second objective, we identified, together with TIGR, a 20-megabase region that is slated for high throughput sequencing at TIGR. Using a variety of probes provided by the Los Alamos National Laboratory, we selected over 2,000 BACs that map to this region of chromosome 16. We extracted DNA from these BACs using the Autogen robot and determined end sequences for most of the clones. These end sequences have generally yielded 300-500 base pairs of usable sequence. By comparing them to BACs for which complete sequences had been obtained, we were able to position many of the end sequences and thus map the precise position of the corresponding BACs. This allowed us to close gaps in the map of available BACs and to demonstrate that the technique can be used to select the BACs with minimal overlap. Furthermore, using end sequences, we generated end clones, which were used to rescreen the library and find other adjacent BAC clones. Thus far we have screened a 12X human library (~250,000 clones) for BACs that map to the chromosome 16 region. It is clear from the end sequences, from the matches, and from the ability to saturate the region with relatively randomly distributed BACs that this approach, i.e. sequence first, then map, will be extremely useful in high throughput sequence determination of specific contiguous regions of the human genome.
Our third goal was to use a whole chromosome, chromosome 22, as a quantitative demonstration of the utility and economy of the BAC end sequencing method. Together with TIGR, we determined the end sequences of approximately 700 BACs, and we are in the process of completing the BAC coverage of chromosome 22. We expect to cover this region with 3,000-4,000 BACs and to complete their end sequences. Chromosome 22 is now being sequenced by a number of groups, and we can use their data to position the BAC ends and obtain a quantitative estimate of how well this approach provides substrates with minimal overlap. In addition, we developed a method that uses the ABI sequencer to fingerprint clones, and our initial work has demonstrated its feasibility. We expect that we can now demonstrate its application by completing a deep, fingerprinted, overlapping BAC map of the 40 megabases of chromosome 22. This map, together with the end sequences and comparisons with already sequenced regions, will allow us to quantify the relative costs and savings of this approach.
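
The idea of choosing BACs with minimal overlap once their ends have been positioned on a reference sequence can be illustrated with a short sketch. This is not the procedure used in the project; it is a standard greedy interval-cover selection, and the `Bac` record, the `minimal_tiling_path` function, the clone names, and all coordinates are hypothetical values chosen for illustration. It assumes both ends of each clone have already been placed on a common coordinate system.

```python
from typing import List, NamedTuple


class Bac(NamedTuple):
    """Hypothetical record: a clone whose two end sequences have been positioned."""
    name: str
    start: int  # reference position of the left end, in base pairs
    end: int    # reference position of the right end, in base pairs


def minimal_tiling_path(bacs: List[Bac], region_start: int, region_end: int) -> List[Bac]:
    """Greedy interval cover: among clones that start at or before the
    position covered so far, always pick the one that extends farthest."""
    ordered = sorted(bacs, key=lambda b: b.start)
    path: List[Bac] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Examine every clone that overlaps or abuts the covered stretch.
        while i < len(ordered) and ordered[i].start <= covered:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best.end
    return path


# Illustrative coordinates only (not data from the project).
example = [
    Bac("A", 0, 150_000),
    Bac("B", 20_000, 170_000),
    Bac("C", 140_000, 300_000),
    Bac("D", 160_000, 310_000),
    Bac("E", 290_000, 450_000),
]
print([b.name for b in minimal_tiling_path(example, 0, 450_000)])
# ['A', 'C', 'E']
```

In this toy example, clones B and D are redundant, so only three of the five clones would need to be sequenced in full. A `ValueError` signals a gap, which corresponds to a hole in the map that would require rescreening the library.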
We are continuing to develop new BAC libraries using DNA that has been obtained with the approval of the Caltech IRB and with appropriate consent, confidentiality, and anonymity. These new libraries will be available within the next few months, and all of our work will then shift to them. We believe that we can demonstrate clearly the enormous utility of end sequencing BACs from the new libraries as an adjunct to, and as a preliminary approach to, high throughput sequencing.
Last modified: Wednesday, October 22, 2003