Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome Quarterly, Summer 1989; 1(2)
Anthony V. Carrano, Biomedical Sciences Division, Lawrence
Livermore National Laboratory
Just as a map of the earth's surface details geographical landmarks and distances, a map of DNA provides similar information about the human genome. For the DNA map, however, the landmarks are not cities, but genes or restriction enzyme recognition sites, and the distances between landmarks are not in miles, but in numbers of base pairs (bp). DNA maps can have either a genetic or a physical basis and offer various degrees of resolution. Generally, a strategy for physical mapping is chosen that is consistent with the interests of the scientists and with the laboratory's general programmatic effort.
There are two primary types of physical maps: a macrorestriction map and an ordered-clone map. The highest resolution physical map is, of course, the DNA sequence itself, which is the ultimate goal of the human genome effort. Given the present state of sequencing technologies, it is neither economically feasible nor technologically appropriate to begin a large-scale project now to sequence the entire genome of man. Rather, the construction of physical maps, as a first priority, will facilitate future sequencing efforts. Knowledge of the physical map and its correlation to the genetic map will guide the scientific community in assigning priorities to regions of the genome to be sequenced. A physical map of DNA clones can provide the raw material for sequencing.
A macrorestriction map is a linearly ordered set of large fragments of DNA representing a chromosome region or, potentially, an entire human chromosome. The fragments are derived by cutting very high molecular weight DNA with restriction enzymes whose recognition sites are of low occurrence in the genome. Since the dinucleotide sequence CpG is estimated to be underrepresented in the human genome by about fivefold, restriction enzymes that contain such dinucleotides as part of their recognition site will cut the DNA infrequently. Typical of the restriction enzymes in this category are Not I (GCGGCCGC) and Mlu I (ACGCGT), which theoretically cut, on the average, every million or 300,000 bp, respectively.
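The reasoning behind these cutting frequencies can be sketched with a back-of-the-envelope calculation. The model below is an assumption, not the method behind the article's figures: it treats all four bases as equally frequent (one site per 4^n bp for an n-bp site) and multiplies the spacing by the fivefold CpG depletion once for each CpG in the site. The function names are hypothetical. Under these assumptions the result for Not I is on the order of a million bp, while the result for Mlu I comes out lower than the 300,000 bp quoted above, which presumably rests on a fuller estimate.

```python
def count_cpg(site: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in a site."""
    site = site.upper()
    return sum(1 for i in range(len(site) - 1) if site[i:i + 2] == "CG")


def expected_spacing_bp(site: str, cpg_depletion: float = 5.0) -> float:
    """Rough expected distance in bp between occurrences of a recognition site.

    Assumptions (simplifications, not taken from the article):
    - all four bases are equally frequent, giving one site per 4**n bp;
    - each CpG in the site makes it `cpg_depletion` times rarer.
    """
    return (4 ** len(site)) * (cpg_depletion ** count_cpg(site))


for name, site in [("Not I", "GCGGCCGC"), ("Mlu I", "ACGCGT")]:
    print(f"{name}: {count_cpg(site)} CpG, "
          f"about {expected_spacing_bp(site):,.0f} bp between sites")
```

Both recognition sites contain two CpG dinucleotides, so the depletion factor enters twice in each estimate.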
To obtain such large fragments of DNA, shearing must be minimized during the DNA isolation process. First, the cells are embedded in agar blocks, and then the DNA fragments are electrophoresed out of the agar blocks and separated using an alternating-field electrophoresis system. Depending on the conditions of electrophoresis, fragments ranging in size from about 50 kilobases (Kbp) to 10 megabases (Mbp) can be seen either by staining the gel with ethidium bromide or by performing Southern blots and identifying specific fragments by hybridization to radiolabeled probes.
Even with the rare-cutting restriction enzymes, the human genome contains too many fragments to separate on a single gel lane. Thus for most mapping studies, hybrid cells containing a single human chromosome are used, and the human chromosome-specific fragments are identified by Southern hybridization using total human repetitive DNA, human Alu-sequence probes, or human unique sequence probes.
A macrorestriction map, developed by using rare-cutting enzymes, provides information at a level of organization between the intact chromosome and cloned fragments of DNA. It serves as a global map of fragments spanning large regions of DNA. More detailed maps will then be related to the global macrorestriction map.
Obtaining relative order of the restriction fragments is also possible. For example, techniques have been developed to construct probes that contain rare-cutting restriction sites and their flanking DNA. These are called linking probes because they uniquely identify adjacent restriction fragments. If DNA is digested with Not I and hybridized to a Not I linking probe, the two adjacent fragments will be identified on the gel. A collection of such linking probes would then allow one to order the Not I restriction fragments for the chromosome or region of interest.
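The ordering step can be sketched in a few lines. This is only an illustration under stated assumptions: each linking probe is represented as a pair of fragment names (the names are hypothetical), every pair is taken to be correct, and the pairs are assumed to form one unbranched chain.

```python
from collections import defaultdict


def order_fragments(links):
    """Chain adjacent-fragment pairs into a single linear order.

    `links` holds one (fragment_a, fragment_b) pair per linking probe.
    Raises ValueError if the pairs do not form one unbranched chain.
    """
    neighbors = defaultdict(list)
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # The two terminal fragments are the only ones with a single neighbor.
    ends = sorted(f for f, n in neighbors.items() if len(n) == 1)
    if len(ends) != 2:
        raise ValueError("links do not form a single linear chain")

    order, previous, current = [ends[0]], None, ends[0]
    while True:
        following = [f for f in neighbors[current] if f != previous]
        if not following:
            break
        previous, current = current, following[0]
        order.append(current)

    if len(order) != len(neighbors):
        raise ValueError("some fragments are not connected to the chain")
    return order


# Three hypothetical Not I linking probes joining four fragments.
probes = [("N3", "N1"), ("N2", "N4"), ("N1", "N2")]
print(order_fragments(probes))
```

The orientation of the result is arbitrary: the sketch simply starts from the terminal fragment whose name sorts first.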
Another interesting application exploiting the rare-cutting restriction sites is to identify DNA polymorphisms. These sites are rich in CpG dinucleotides, which are also targets for methylation, and many of the rare-cutters are methylation sensitive and will not cleave a methylated site. In comparing two sources of DNA, differences in restriction fragment patterns might, therefore, identify patterns of DNA methylation. Finally, it has been shown that Not I sites will often cluster in islands that are located adjacent to gene sequences. This information can be used to signal gene locations in a segment of DNA.
The Human Genome Program in the Department of Energy (DOE) Office of Health and Environmental Research supports a number of projects that develop or make use of macrorestriction maps of DNA. To highlight the various strategies and techniques, a few of these research activities are delineated here.
At Columbia University, Cassandra Smith and Charles Cantor have been constructing maps of the entire human chromosome 21 and the Huntington's region on the short arm of chromosome 4. Using somatic cell hybrids containing human chromosome 21 and several rare-cutting restriction enzymes, as well as human repetitive, unique sequence, and linking probes, Smith and Cantor estimate that about 40 Mbp of the chromosome can be recognized by the rare-cutter fragments on a gel. The researchers are in the process of determining the order of the fragments.
Thomas Caskey and his colleagues at the Baylor College of Medicine have established a macrorestriction map spanning a 7-Mbp region on the human X chromosome containing the G6PD gene cluster.
A group under the direction of Michael McClelland at the University of Chicago is exploiting the methylation sensitivity of some of the rare-cutting enzymes. By using methylation-sensitive restriction enzymes together with modification methyltransferases, McClelland's group is able to control partial digest reactions. Moreover, by first methylating adenine with a methyltransferase and then using methylation-dependent rare-cutters, very large DNA fragments (3-5 Mbp) are produced in some genomes. In conjunction with Carol Westbrook, also at Chicago, the researchers plan to apply these techniques to create a macrorestriction map for a region of chromosome 5.
While the rare-cutter restriction map is extremely useful, it does not provide cloned DNA for further analysis. Complementary methods which make use of cloned DNA are necessary.
An ordered clone map is a collection of cloned DNA fragments arranged in the same linear position that they would have along the native chromosome. The clones are generally derived from primary arrayed libraries, and each clone is maintained either in the well of a microtiter tray or in an individual tube. The DNA may be cloned in any one of the vector systems [e.g., yeast artificial chromosomes (YACs), cosmids, phage, or even plasmids]. The larger the cloned insert, the lower the map resolution, but fewer clones must be analyzed to construct the map. Three of the DOE-supported efforts to create clone maps rely primarily on cosmids, which contain about 40 Kbp of insert. However, large human DNA inserts of more than 100 Kbp are being cloned in YAC cloning vectors now being developed at Los Alamos National Laboratory (LANL), Lawrence Berkeley Laboratory (LBL), and Lawrence Livermore National Laboratory (LLNL), in collaboration with university-based investigators.
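The trade-off between insert size and clone number is simple arithmetic, sketched below. The 60-Mbp region size and the function name are hypothetical values chosen for illustration; the 40-Kbp and 100-Kbp insert sizes are the ones mentioned above.

```python
import math


def clones_needed(region_bp: int, insert_bp: int, redundancy: float = 1.0) -> int:
    """Number of clones whose inserts total `redundancy` times the region length."""
    return math.ceil(region_bp * redundancy / insert_bp)


REGION_BP = 60_000_000  # hypothetical region size, for illustration only

for vector, insert_bp in [("cosmid", 40_000), ("YAC", 100_000)]:
    print(f"{vector}: {clones_needed(REGION_BP, insert_bp)} clones for onefold coverage")
```

Onefold coverage here means only that the summed insert length equals the region length. Because randomly chosen clones overlap and leave gaps, it does not mean the region is fully covered.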
Clones from multiple libraries, and possibly from multiple vector systems, are believed to be necessary to complete a map of the entire chromosome. At least two factors contribute to this
Approaches for ordering cloned DNA fragments are based either on fingerprinting the cloned inserts or on searching for homology between the clones based upon DNA hybridization. The two methods are discussed briefly below.
Clone fingerprinting methods attempt to identify a unique signature or fingerprint for each insert. Fingerprints are generally produced by digesting randomly selected clones from a primary arrayed library with one or more restriction enzymes. After the restriction fragments are separated on a gel (either polyacrylamide or agarose), their fragment lengths are determined. Generating a complete restriction map of the clone is not necessary, but a sufficient number of fragments which uniquely identify that clone must be created. The fragment lengths from each clone are then compared to every other clone, and a statistical analysis is applied to determine if any clones possess a significant number of fragments of the same length. Inserts that have some portion of their fingerprint in common are then said to overlap. To a first approximation, the more similar the fingerprint, the greater the overlap. Thus, the ability to detect overlap between clones by this method depends on such
If other information can be added to fragment length, such as DNA sequence information for one or more fragments in the fingerprint, the ability to discriminate true from false overlap increases.
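The pairwise comparison of fingerprints can be sketched as follows. This is a minimal illustration, not the analysis used by any of the groups described here: the 2% length tolerance, the scoring rule, the function names, and the fragment lengths are all assumptions, and the sketch returns a raw similarity score rather than a statistical test of significance.

```python
def shared_fragments(fp_a, fp_b, tolerance=0.02):
    """Count fragments of clone A that match a not-yet-used fragment of clone B.

    Two lengths match when they differ by at most `tolerance`, expressed as a
    fraction of the longer fragment (an assumed stand-in for gel sizing error).
    """
    remaining = sorted(fp_b)
    shared = 0
    for length in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(length - other) <= tolerance * max(length, other):
                shared += 1
                del remaining[i]  # each fragment of B may be matched only once
                break
    return shared


def overlap_score(fp_a, fp_b, tolerance=0.02):
    """Fraction of the smaller fingerprint that is shared (0.0 to 1.0)."""
    smaller = min(len(fp_a), len(fp_b))
    if smaller == 0:
        return 0.0
    return shared_fragments(fp_a, fp_b, tolerance) / smaller


# Hypothetical fragment lengths in bp for three clones.
clone_a = [1200, 2500, 3100, 4800, 7000]
clone_b = [650, 2510, 3100, 4790, 9000]
clone_c = [500, 900, 1500]

print(overlap_score(clone_a, clone_b))
print(overlap_score(clone_a, clone_c))
```

A real analysis would go on to ask how often a score this high arises by chance between unrelated clones, which is where the statistical analysis mentioned above comes in.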
Random clone fingerprinting techniques generally require the analysis of a fivefold to tenfold clone redundancy in order to achieve about a 70-80% coverage of the chromosome or region of interest. These techniques can be laborious and computationally intensive. On the other hand, because the tasks are repetitive, they are amenable to automation, which decreases the likelihood of human error and optimizes both throughput and reproducibility. Also, because fingerprinting methods alone are unlikely to yield a complete map, they must be complemented with other procedures (e.g., macrorestriction mapping or hybridization techniques) to achieve map closure.
Both LLNL and LANL have adopted clone fingerprinting methods as a first step toward the construction of ordered clone maps. At Livermore, a method has been developed to label the ends of all restriction fragments from a single cosmid with a fluorescent dye. The fluorophor-labeled fragments are separated in a high-resolution denaturing polyacrylamide gel using a modified, commercially available, automated DNA sequencer. Afterwards, fragments in the gel are detected as they migrate past a laser beam. In the present configuration, four fluorochromes can be used; thus, up to three cosmids plus a labeled size standard can be analyzed per gel lane (i.e., 96 cosmids per single gel run). Data collection is automatic, and analysis of the fingerprints is performed off-line with minimal operator intervention. Using this technique, the LLNL group has assembled a set of overlapping cosmids that span a 600-Kbp region of chromosome 14, analyzed over 1500 cosmids (approximately onefold cosmid coverage) from chromosome 19, and assembled over 100 contigs (sets of overlapping clones).
The group at LANL has developed another approach to clone ordering. Their approach exploits certain repetitive DNA sequences and uses them as nucleation sites for mapping random clones. In this approach, individual clones are fingerprinted by a combination of restriction enzyme digestion and DNA hybridization. Restriction fragments from each clone are separated on agarose gels; their lengths are determined by comparison to size standards. Additional information is obtained for each clone by hybridizing the fragments on the gel with a repetitive DNA probe, which identifies a sequence that is present, on average, in 30-50% of the cosmids. The pattern of repetitive DNA hybridization, along with restriction mapping information, is acquired by image capture and analyzed by conventional algorithms. The result is a highly informative fingerprint of each clone. Using this approach, the Los Alamos group has processed about 2000 cosmids (approximately onefold coverage) and has assembled them into over 100 contigs.
Constructing overlapping sets of clones by detecting DNA sequence homology is a powerful and rapid strategy for clone mapping. Recently, the feasibility of this strategy has been successfully tested on a small region of human chromosome 11 by Glen Evans at the Salk Institute. This approach makes use of cosmid vectors that have bacteriophage T3 and T7 promoters flanking the cloned insert. The promoters are used to produce an RNA transcript that can be used as a hybridization probe. The probes from the ends of the cosmid insert are then hybridized to an array of cosmids representative of a chromosome region. A cosmid in the array that hybridizes to the probe should share sequence homology with the cosmid from which the probe was derived and, hence, should overlap it. The hybridization must be conducted at adequate stringency, and repeated sequences that would potentially cause false positive hybridizations must be blocked by prehybridizing the probe transcript with pooled, repetitive DNA.
In practice, a cosmid library is constructed for a chromosome region of interest and organized in replicate rectangular filter arrays so that each cosmid has a fixed coordinate. DNA from the cosmid clones in one row is pooled, transcripts are prepared from the pooled mix, and the resulting probes are hybridized to one of the filters. Similarly, pooled probes from one column are hybridized to the replicate filter. Positive hybridization signals are obtained with all cosmids from the row and column from which the probes were derived. Suppose, for example, that the probes come from row B and column 2 and that another cosmid, at array location D3, also hybridizes to both the row and column probes. The only cosmid common to the row and column pools is the one at their intersection (i.e., B2). Thus the cosmid at B2 must overlap with the cosmid at D3. This technique, which has the sensitivity to detect very small regions of overlap, has been applied by Evans to a 32 × 36 array with about 1000 cosmids from chromosome 11q13-qter. After 68 (row plus column) hybridizations, he was able to establish over 300 contigs ranging in size from 2 to 27 cosmids. Whether this method can be scaled to deal with larger genomic regions remains to be determined.
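The row-and-column inference can be sketched as follows, using the B2 and D3 example on a small array. The data structures and function names are hypothetical, and the sketch assumes clean signals with no false positives. It reports any cosmid outside the probed row and column that lights up with both pools.

```python
from itertools import product


def hybridizations_needed(rows: int, cols: int) -> int:
    """One pooled hybridization per row plus one per column."""
    return rows + cols


def infer_overlaps(row_hits, col_hits):
    """Infer overlapping cosmid pairs from pooled-probe hybridizations.

    row_hits[r] is the set of (row, col) coordinates that were positive with
    the probe pool made from row r; col_hits[c] is the same for column c.
    Returns a set of (probe_source, positive_cosmid) coordinate pairs.
    """
    overlaps = set()
    for (r, r_pos), (c, c_pos) in product(row_hits.items(), col_hits.items()):
        for hit in r_pos & c_pos:
            # Cosmids in the probed row or column are positive by
            # construction, so they carry no information here.
            if hit[0] != r and hit[1] != c:
                overlaps.add(((r, c), hit))
    return overlaps


# A 4 x 4 array with rows A-D and columns 1-4, probed with row B and column 2.
row_hits = {"B": {("B", 1), ("B", 2), ("B", 3), ("B", 4), ("D", 3)}}
col_hits = {2: {("A", 2), ("B", 2), ("C", 2), ("D", 2), ("D", 3)}}

print(infer_overlaps(row_hits, col_hits))
print(hybridizations_needed(32, 36))
```

Two limits of this sketch are worth noting. An overlap between two cosmids in the same row or column cannot be resolved this way. A cosmid that overlaps one clone in the probed row and a different clone in the probed column would also be wrongly assigned to the intersection.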
In this overview, some of the physical mapping strategies currently being supported by the DOE Human Genome Program have been summarized. The laboratory techniques used to implement these strategies are state of the art and continually being improved. There is no one optimal strategy for physical mapping. In fact, new methods probably will be developed that replace or refine all of the present approaches. As those who are involved in physical mapping quickly realize, none of these techniques alone will produce a complete map. The final map likely will be derived from a combination of the strategies discussed above, as well as others yet to come.
The genetic map is an important and complementary tool in human genome research. It can be used to locate the relative positions of genes, to assist in the study of the heritability of genetic diseases, and, ultimately, to validate the physical map. A genetic map of DNA provides information on the relative location of genes or gene markers (e.g., restriction fragment length polymorphisms, RFLPs). The distances on the genetic map are measured in centimorgans (cM), a measure of the recombination frequency between loci. The greater the distance between two loci, the greater the probability a meiotic recombination event will occur. While the genetic map does provide relative order for the loci on a chromosome, the frequency of recombination is not an accurate measure of physical distance. It is now well documented that some regions of DNA recombine much more frequently than others; therefore, the recombination frequency will either over- or underestimate the true distance in base pairs. The resolution of the genetic map is dependent upon the number of meiotic recombination events observed between loci as detected by RFLPs in two- or three-generation families with a large number of sibs.
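The relation between observed recombination and map distance can be sketched numerically. The counts below are hypothetical. The conversion uses Haldane's map function, which is not named in this article and is introduced here as an assumption; it presumes that crossovers occur independently. For small recombination fractions, 1% recombination corresponds to roughly 1 cM.

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Observed fraction of informative meioses that show recombination."""
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    return recombinants / meioses


def haldane_cm(r: float) -> float:
    """Map distance in cM from recombination fraction r (0 <= r < 0.5).

    Haldane's map function, an assumption introduced for this sketch:
    distance = -50 * ln(1 - 2r) centimorgans.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical data: 5 recombinants among 100 informative meioses.
r = recombination_fraction(5, 100)
print(f"r = {r:.2f}, about {haldane_cm(r):.1f} cM")
```

With only 100 meioses, the smallest nonzero recombination fraction that can be observed is 1 in 100, which is why map resolution depends on the number of meioses scored.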
The electronic form of the newsletter may be cited in the
Human Genome Program, U.S. Department of Energy, Human Genome News (v1n2).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.