Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, May 1993; 5(1)
The DOE-sponsored Second International Workshop on the Identification of Transcribed Sequences was held in San Francisco on November 7-8, 1992. The purpose of the workshop, which was attended by 44 scientists from 7 countries, was to discuss and evaluate techniques for developing a complete transcriptional map of the human genome.
Such a map requires the positions, sequences, and expression patterns of all genes. This goal is being approached from two directions, each with strengths and weaknesses. One method identifies transcribed sequences in the genomic DNA of a given region; the other systematically sequences and maps cDNAs. The cDNA approach yields sequence information rapidly, but mapping each cDNA is a technical challenge. In the genomic approach, the map locations of the sequences are known at the outset, and the challenge is to identify exons. Constructing a transcriptional map efficiently will require a diverse array of techniques.
Charles Auffray (Genethon, France), James Sikela (University of Colorado), and Mihael Polymeropoulos [NIH National Institute of Mental Health (NIMH)] presented results from large-scale partial cDNA sequencing. Several methods are being used to integrate these sequences with physical or genetic maps. Minoru Ko (Wayne State University) used polymorphic sequences in the 3' untranslated regions of mouse cDNAs as genetic markers in interspecific mouse crosses. Polymeropoulos used polymerase chain reaction (PCR) primers with a panel of hamster-human cell hybrids to assign human cDNAs to chromosomes. Sikela reported using fluorescence in situ hybridization to map some cDNA sequences.
The unequal representation of clones in cDNA libraries poses another problem for the cDNA sequencing approach. Because the 10,000 to 20,000 genes expressed in a given tissue are present at very different abundances, about 1 million clones would have to be sequenced to approach complete representation of that tissue. An important goal is therefore a library in which abundant and rare species are equally represented (i.e., a "normalized" library). Bento Soares (Columbia University) has produced normalized cDNA libraries by reannealing cDNA to a high Cot value and selectively cloning the fraction that remains single stranded. Cheng Chi Lee [Baylor College of Medicine (BCM)] proposed normalizing arrayed cDNA libraries by rescreening them with pools of already-sequenced clones to eliminate redundant clones.
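The cost of unequal representation can be illustrated with a short calculation. The Python sketch below assumes each sequenced clone is drawn independently at random, so a gene with sampling probability p is missed by n clones with probability (1 - p)^n. The abundance classes in the example (gene counts and relative copies per cell) are hypothetical values chosen for illustration, not data reported at the workshop, and the helper names `library`, `expected_genes_seen`, and `clones_needed` are invented for this example.

```python
"""Expected gene coverage when sequencing random clones from a cDNA library.

Illustration only: the abundance classes in the __main__ block are
hypothetical and are not data from the workshop.
"""


def library(classes: list[tuple[int, float]]) -> list[float]:
    """Per-gene sampling probabilities from (number_of_genes, copies_per_cell) classes."""
    weights = [copies for n_genes, copies in classes for _ in range(n_genes)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_genes_seen(probs: list[float], n_clones: int) -> float:
    """Expected number of distinct genes hit after sampling n_clones at random.

    Gene i is missed by every clone with probability (1 - p_i) ** n_clones,
    so the expectation is the sum of 1 - (1 - p_i) ** n_clones over all genes.
    """
    return sum(1.0 - (1.0 - p) ** n_clones for p in probs)


def clones_needed(probs: list[float], target_fraction: float) -> int:
    """Smallest clone count whose expected coverage reaches target_fraction of genes."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    goal = target_fraction * len(probs)
    # Coverage grows monotonically with n: double an upper bound, then bisect.
    lo, hi = 0, 1
    while expected_genes_seen(probs, hi) < goal:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_genes_seen(probs, mid) >= goal:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Hypothetical classes: (number of genes, relative copies per cell).
    skewed = library([(10, 5000), (500, 300), (11000, 15)])
    normalized = [1.0 / len(skewed)] * len(skewed)  # every gene equally likely
    for name, probs in (("skewed", skewed), ("normalized", normalized)):
        print(f"{name}: ~{clones_needed(probs, 0.95):,} clones for 95% expected coverage")
```

Comparing the two counts shows how a skewed abundance distribution raises the number of clones needed relative to an idealized normalized library; adding rarer classes to the hypothetical skewed library raises its figure further, because the rarest genes dominate the clone count.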
Another approach to defining a minimal set of cDNA clones is to identify each clone by its hybridization "fingerprint" against a panel of oligonucleotides. Radoje Drmanac (Argonne National Laboratory) and Sebastian Meier-Ewert (Imperial Cancer Research Fund) reported progress in rapidly screening cDNA libraries with short oligonucleotide probes.
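The idea of choosing a minimal clone set by fingerprint can be sketched in a few lines. This is a simplified model, not the method of either group: "hybridization" is treated as an exact match of a short probe to either strand of a clone, whereas real hybridization signals are noisy and a real probe panel would be far larger. The 4-mer probes, clone names, and sequences below are hypothetical, as are the helper names `fingerprint` and `pick_representatives`.

```python
"""Hybridization fingerprints for picking a non-redundant set of cDNA clones.

Simplified sketch: hybridization is modeled as an exact match of a short
probe to either strand of a clone. Probes and clones are hypothetical.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fingerprint(seq: str, probes: list[str]) -> tuple[int, ...]:
    """1 for each probe that matches either strand of seq, else 0."""
    # 'N' separates the strands so no probe (A/C/G/T only) can match across the join.
    both_strands = seq + "N" + revcomp(seq)
    return tuple(int(probe in both_strands) for probe in probes)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of probes on which two fingerprints disagree."""
    return sum(x != y for x, y in zip(a, b))


def pick_representatives(clones: dict[str, str], probes: list[str],
                         max_diff: int = 0) -> list[str]:
    """Greedily keep one clone per group of near-identical fingerprints.

    A clone is kept only if its fingerprint differs from every clone already
    kept by more than max_diff probes; otherwise it is treated as redundant.
    """
    kept: list[tuple[str, tuple[int, ...]]] = []
    for name, seq in clones.items():
        fp = fingerprint(seq, probes)
        if all(hamming(fp, other) > max_diff for _, other in kept):
            kept.append((name, fp))
    return [name for name, _ in kept]


if __name__ == "__main__":
    probes = ["ACGT", "GGCC", "TTAA", "CATG"]  # hypothetical 4-mer panel
    clones = {
        "c1": "AAACGTGGCCAA",
        "c2": "TTACGTGGCCTT",  # same fingerprint as c1, so treated as redundant
        "c3": "GGTTAACATGGG",
    }
    print(pick_representatives(clones, probes))  # ['c1', 'c3']
```

Raising `max_diff` above zero lets clones whose fingerprints differ by a few probes fall into the same group, which is one simple way to tolerate mismatched or failed hybridizations.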
cDNA libraries from rare cell types are necessary for identifying the genes expressed in these cells. Barbara Knowles (Wistar Institute) has produced cDNA libraries from specific stages of early mouse embryos. James Eberwine (University of Pennsylvania) has developed techniques for producing cDNA libraries from single cells. His method relies on amplification by prokaryotic RNA polymerases rather than on PCR amplification, which can severely skew representation in heterogeneous mixtures.
While mapping and normalization are challenges in the cDNA sequencing approach, the genomic approach requires identification of transcribed sequences among the surrounding untranscribed sequences. Genomic DNA for this strategy is made available by advances in physical mapping, especially large arrayed cosmid and lambda libraries from specific chromosomal regions and yeast artificial chromosome (YAC) clones. Ute Hochgeschwender (NIMH), Anne Marie Poustka (German Cancer Research Center), and Geoffrey Falk (Scripps Research Institute) reported using labeled cDNA for probing large arrayed genomic libraries to identify transcribed sequences. This technique is suited for very large regions covered by arrayed libraries; about one-half the genes in a given tissue are expressed at levels high enough to be detected with these probes. Furthermore, expression patterns can be determined by repeated cDNA probe hybridizations of different tissues at different developmental stages.
Richard Mural [Oak Ridge National Laboratory (ORNL)] and Gordon Hutchinson (Canadian Genetic Disease Network) have developed software that can identify coding exons from genomic sequences with over 80% success and a low false-positive rate. An additional feature of the ORNL GRAIL program is a gene-assembly program (GAP) that produces complete gene sequences from the GRAIL output, resulting in an even lower false-positive rate. Susan Berget (BCM) discussed her work on splice-site selection, which is based on defining the nucleic acid "signals" associated with splicing. David Searls (University of Pennsylvania) presented a linguistic approach to sequence analysis based on conceptual similarities between meaningful language and expressed sequences.
Exons can also be identified by procedures in which a genomic fragment is cloned into an intron of a mammalian expression vector. A fragment containing an exon will be included in the mRNA expressed from the vector. Alan Buckler [Massachusetts General Hospital (MGH)] reported improvements to the splicing vector pSPL1. Paul Nisson (Gibco BRL/Life Technologies, Inc.) discussed a potential problem in transforming pools of genomic fragments cloned in splicing vectors: unequal exon-amplification rates in RT/PCR reactions can lead to the loss of all but the most efficiently amplified exons. This problem is also encountered in amplifying cDNA libraries. Geoffrey Duyk (Harvard Medical School) and Nicole Datson (Leiden University, Netherlands) reported on modifications of exon-trapping systems. While most exon-identification protocols are designed to detect internal exons, David Krizman (BCM) described a new system for detecting 3' terminal exons in which polymorphic sequences in the 3' untranslated regions may facilitate exon mapping.
David Kurnit (University of Michigan) described his system for identifying exons cloned in a plasmid vector by their ability to recombine homologously with cDNA clones in lambda. The system detects the exon, isolates the cDNA, and (when tested with multiple cDNA libraries) supplies information on expression patterns.
Another general approach to identifying transcribed sequences from genomic clones involves hybridizing cDNA to genomic DNA and eluting the specifically bound "selected" cDNA. Michael Lovett (University of Texas), Poustka, Danilo Tagle (University of Michigan), and Sherman Weissman (Yale University) presented results using variations of this procedure with pools of cosmid clones and with whole YAC DNA. To map these genes within the YAC, Ruchira DasGupta (Albert Einstein College of Medicine) is using cDNAs isolated by this method and cloned in a yeast vector to truncate the YAC at the region of homology.
The usefulness of the different approaches can best be assessed by applying them to large regions. Gail Bruns (Children's Hospital, Boston) reported searching through the 15-Mb WAGR region for conserved sequences associated with HTF (Hpa II tiny fragment) islands. Bernhard Weber (University of British Columbia) used a cDNA selection strategy and the single-strand-conformation polymorphism technique to isolate new genes from the Huntington's disease region and to search them for mutations.
Giorgio Bernardi (Institut Jacques Monod, France) and Katheleen Gardiner (Eleanor Roosevelt Institute) discussed long-range differences in genome structure. Although gene density is highest in the most GC-rich regions (notably, most telomeric regions), one-third of genes lie in AT-rich regions. Gene content as measured by CpG-island density does not appear to vary predictably with GC content. Participants raised the question of how well these techniques apply to gene-poor regions, and Mural and Hutchinson requested sequences of verified AT-rich exons for training neural networks.
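GC content and CpG-island status are simple sequence statistics, and a short sketch shows how they are usually computed. The thresholds below are the widely used Gardiner-Garden and Frommer (1987) criteria (200-bp window, GC fraction above 0.5, CpG observed/expected ratio above 0.6); they are an assumption here, not necessarily the measure used by the speakers. The window step size, the test sequence, and the function names are hypothetical.

```python
"""GC content and CpG observed/expected ratio in sliding windows.

Thresholds default to the Gardiner-Garden and Frommer (1987) CpG-island
criteria; they are an assumption, not necessarily those used at the workshop.
"""


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G content.

    Expected CpGs = (C count * G count) / length, so o/e = CpG * length / (C * G).
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def island_windows(seq: str, window: int = 200, step: int = 50,
                   min_gc: float = 0.5, min_oe: float = 0.6) -> list[int]:
    """Start positions of windows that pass both CpG-island thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_fraction(w) > min_gc and cpg_obs_exp(w) > min_oe:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 400 bp of AT repeat followed by 200 bp of CG repeat.
    seq = "AT" * 200 + "CG" * 100
    print(island_windows(seq))  # [350, 400]
```

Counting island-like windows per unit length over regions of differing overall GC fraction gives a crude version of the CpG-island density versus GC content comparison discussed above.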
Participants also discussed functional analysis of the products encoded by novel genes. Roger Brent (MGH) described a yeast interaction trap system for identifying protein regions that interact in vivo. Miles Brennan (NIMH) reported on the specific integration of a mammalian cDNA at a homologous gene in yeast, a system that may allow direct selection and functional analyses of such cDNAs. Weissman noted the need for more functional assays.
The Third International Transcribed Sequences Workshop is planned for October 2-4, 1993 in New Orleans. Contact: Ute Hochgeschwender; Unit on Genomics, NIMH; Bldg 10, Room 4N 320; Bethesda, MD 20892 (301/402-1769, Fax: -2140).
Reported by Miles B. Brennan (National Institute of Mental Health) and Katheleen Gardiner (Eleanor Roosevelt Institute)
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v5n1).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.