Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, January 1992; 3(5)
Twenty-four investigators met on October 4-5, 1991, at the NIH National Institute of Mental Health (NIMH) in Bethesda, Maryland, for the First International Workshop on the Identification of Transcribed Sequences, sponsored by the DOE Human Genome Program. The purpose of the workshop was to exchange information on the systematic identification of transcribed sequences and the construction of transcriptional maps for large chromosomal regions.
Investigators discussed the broad areas of (1) identification of expressed sequences from genomic clones and (2) cDNA library analysis. The group also considered strategies for the most reliable and exhaustive search for gene sequences from any chromosomal region.
J. Gregor Sutcliffe (Scripps Research Institute) defined three data requirements for including a gene on a transcriptional map: sequence, physical and genetic map location, and pattern of expression. Katheleen Gardiner (Eleanor Roosevelt Institute) described the striking variation in gene and potential CpG island density found on human chromosome 21 and suggested that different approaches may be required for gene-rich and gene-poor regions.
Several presentations were devoted to the analysis of genomic sequence information. Andrzej Konopka (National Cancer Institute) discussed detecting coding sequences from the statistical characteristics (such as complexity) of textual elements, using protein-coding sequences as an example.
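As a rough illustration of what a complexity statistic on sequence text can look like, the sketch below computes the Shannon entropy of the k-mer distribution in sliding windows. This is not Konopka's method, which the report does not specify; the function names, the choice of entropy as the complexity measure, and the window, step, and k values are all assumptions made for the example.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer distribution in seq.

    Hypothetical complexity measure chosen for illustration only.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sliding_entropy(seq, window=60, step=30, k=3):
    """Return (window_start, entropy) pairs along the sequence."""
    return [
        (start, kmer_entropy(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A homopolymer run scores zero, and a window in which all k-mers are equally frequent scores the maximum for that k. Whether such a score separates coding from noncoding regions would have to be tested on real data.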
Steen Knudsen (Boston University) is investigating a neural network approach to predicting splice sites and open reading frames. Richard Mural (Oak Ridge National Laboratory) reported considerable success with a neural network/rule-based inference system. This program identifies about 90% of protein-coding exons 100 or more bases long and has predicted 14 exons in a region near the Huntington's disease locus that have been experimentally confirmed.
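For readers unfamiliar with open reading frames, the sketch below shows the simplest possible scan for them: it looks for ATG-to-stop stretches in the three forward frames. It is far cruder than the neural network and rule-based systems described above and does not handle splice sites or the reverse strand. The function name, the `min_len` default (chosen to echo the 100-base figure in the text), and the coordinate convention are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=100):
    """Find ATG..stop open reading frames on the forward strand.

    Returns a list of (start, end, frame) tuples, where end is exclusive
    and includes the stop codon. Only ORFs of at least min_len bases are kept.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the current open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs
```

Because human exons are interrupted by introns, a plain ORF scan of genomic DNA misses most of the gene structure, which is one reason the speakers turned to trained classifiers.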
Four speakers addressed three approaches to direct isolation of coding sequences from genomic clones. Geoffrey Duyk (Harvard Medical School) discussed current and proposed modifications of the "exon trapping" procedure; Alan Buckler (Massachusetts Institute of Technology) described experience with the related "exon amplification" system. Nine of ten putative exons obtained with the latter system subsequently identified clones in a cDNA library.
Susan Berget (Baylor College of Medicine) presented a scheme for trapping 3' exons; such an approach would have the advantage of isolating larger exons (typically, around 600 nucleotides, compared with 100 to 200 nucleotides for internal exons).
These three approaches still require library screening to obtain a complete cDNA. Alternatively, David Kurnit (Howard Hughes Medical Institute and University of Michigan Medical School) described his recombination-based assay system for isolating cDNAs encoded in yeast artificial chromosome (YAC) clones. A lambda cDNA library is replicated in the presence of a plasmid library made from YAC DNA. The progeny phage are then grown in an Escherichia coli host that requires plasmid sequences; only those phage that have integrated a plasmid are viable. This system has been used to obtain many cDNAs from several human chromosome 21 YACs.
Two approaches for identifying human cDNAs from somatic cell hybrids were discussed. David Nelson (Baylor College of Medicine) has used Alu-specific polymerase chain reaction (PCR) primers to amplify human hnRNA from the Xq28 region. Michael Siciliano (M. D. Anderson Cancer Center) used splice donor site-specific PCR primers to construct libraries of amplified material and screened the libraries with human repeat sequences. Both methods are rapid and successful, although false positives continue to present some difficulties.
Bento Soares (Columbia University) discussed the use of lafmid vectors in cDNA library construction. Such libraries can be normalized efficiently and used in subtractive hybridizations.
Fa-Ten Kao (Eleanor Roosevelt Institute) described the identification of seven chromosome 21 cDNAs, obtained by screening cDNA libraries with 200 microdissection genomic clones. Large pools of clones facilitated the screening, as did the use of normalized cDNA libraries [Sherman Weissman (Yale University School of Medicine)].
Direct cDNA screening of large (>10,000) arrayed genomic libraries has been used to identify genomic clones containing transcribed sequences. Ute Hochgeschwender (NIMH) presented results for mouse chromosome 16, reporting improved sensitivity by using cDNA probes with (1) decreased complexity or (2) enrichment for low-abundance transcripts. A second application, presented by Annemarie Poustka (German Cancer Research Center, Heidelberg), used pig cDNA probes to screen a human Xq28-specific genomic library. This is an alternative solution to the problem of repetitive sequences and identifies conserved, transcribed sequences.
Hans Lehrach (Imperial Cancer Research Fund, London) discussed using arrayed cDNA and region-specific genomic libraries to map particular cDNAs to genomic clones. Possibilities include the use of clone pools and cDNAs from various tissues and oligonucleotide fingerprinting of both cDNA and genomic libraries.
Hybrid-selection schemes to isolate cDNA clones from YACs and from pools of cosmids were described by Mike Lovett (Genelabs, Inc.) and Weissman. In these methods, cDNAs are annealed to immobilized clones of genomic DNA, and the annealed fraction is recovered, amplified, and cloned. Impressive enrichments of >1000-fold were reported for specific cDNAs.
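The fold enrichment quoted for such selections is the ratio of a cDNA's frequency after selection to its frequency in the starting library. The sketch below shows that arithmetic only; the function name and the clone counts in the usage note are hypothetical and are not results from the workshop.

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Ratio of target-clone frequency after selection to frequency before.

    Computed as (target_after / total_after) / (target_before / total_before),
    rearranged into a single division to limit floating-point error.
    """
    if min(target_before, total_before, total_after) <= 0:
        raise ValueError("counts before selection and total_after must be positive")
    return (target_after * total_before) / (total_after * target_before)
```

For example, a cDNA present at 1 clone in 100,000 before selection and 2 clones in 100 afterward would be enriched 2000-fold under this definition.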
MaryKay McCormick (Los Alamos National Laboratory) outlined an alternative strategy that would use homologous recombination and fragmentation to locate the gene position within a YAC. In this approach, cDNAs of interest are cloned into an appropriate vector and transformed into a yeast clone containing a YAC. Truncation of the original YAC will occur where it contains sequences homologous to the cDNA.
James Sikela (University of Colorado Health Sciences Center) and Mark Adams [National Institute of Neurological Disorders and Stroke (NINDS)] reported on projects to sequence 100 to 200 nucleotides from the 3' (and possibly 5') ends of a large number of random human brain cDNAs. The usefulness of this approach will depend partly on generating enough sequence to permit protein motif identification and partly on the ability to map the genomic sequences accurately. Some regional clone localization by fluorescence in situ hybridization has been proposed (Adams).
Chris Fields (NINDS) discussed database formats for the storage of cDNA sequence information.
Participants agreed that conventional techniques, including cDNA library screening with YAC clones and searches of CpG islands and conserved sequences, can be informative but are not likely to be comprehensive. At this time, no single technique is completely satisfactory; an exhaustive gene search will require several complementary methodologies. Many of the techniques discussed are still very new and have not been applied extensively.
The group recommended that another workshop be held in 1992, when experience in different laboratories will allow more critical technique evaluation. Future considerations also will include how best to approach the thorny problem of determining expression patterns, both in developmental timing and tissue specificity.
Reported by Katheleen Gardiner, Eleanor Roosevelt Institute
and Miles Brennan, National Institute of Mental Health
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v3n5).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.