Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, January-March 1996; 7(5)
The Fifth International Workshop on the Identification of Transcribed Sequences was held November 5-8, 1995, on Ile Les Embiez, an island off the coast of France. The meeting was sponsored by the Association Française contre les Myopathies, Centre National de la Recherche Scientifique (CNRS), Institut National de la Santé et de la Recherche Médicale (INSERM), La Région Provence-Côte-d'Azur-Corse, and DOE. More than 50 speakers discussed topics including the generation of regional and chromosomal transcriptional maps, functional analysis of gene expression, techniques for isolating and analyzing genes, the use of model organisms, and informatics.
This year's workshop showed clearly that the robust techniques of cDNA selection, exon trapping, and software trapping of genomic sequences are being applied rigorously and successfully to numerous large regions of the human genome. Transcriptional mapping has located large numbers of novel genes whose functions must now be determined. This new challenge promises to be particularly formidable, perhaps even more difficult than transcriptional mapping has been. However, the workshop demonstrated exciting progress and important ideas in several relevant areas. These included methods for determining the differential and tissue-specific expression of large numbers of genes, the use of model organisms from yeast to mouse for functional and mutational analysis, and expanded (and generally more accessible) informatics tools.
Selected Presentation Summaries
Transcriptional Mapping. Considerable progress is being made in transcriptional mapping of entire chromosomes and of selected chromosomal regions. Exon trapping, cDNA selection, and genomic sequencing followed by GRAIL analysis remain the most popular approaches, but improvements to these techniques and some alternative approaches were also discussed.
Particularly detailed discussions were presented for chromosomes 19 and 21, where emerging patterns in gene families and overall gene distribution can be related to genome organizational features. Chromosome 19 appears to be very GC rich and gene rich [Anthony Carrano, Lawrence Livermore National Laboratory (LLNL)]. Chromosome 21 shows regional variation in gene density based on the results of exon trapping and cDNA selection (Marie-Laure Yaspo, Max Planck Institute for Molecular Genetics, Germany; Katheleen Gardiner, Eleanor Roosevelt Institute).
Significant numbers of transcribed sequences have also been isolated from chromosome 7 and correlated with physical maps [Stephen Scherer, The Hospital for Sick Children (THSC), Toronto; Eric Green, NIH National Center for Human Genome Research].
An alternative approach to whole-chromosome mapping used reciprocal screening of arrayed cosmid and cDNA libraries (Cheng Chi Lee, Baylor College of Medicine) and yielded close to 100 new genes for each chromosome surveyed (human chromosomes 7 and X).
Results of regional mapping efforts also show variability in gene density based on the number of genes obtained. The Giemsa band 3p12-13 is CpG-island poor and has proved difficult to map transcriptionally [Vasi Sundarasen, Medical Research Council (MRC), Cambridge]. On the other hand, the reverse band 3q21 is GC rich and rather gene rich, although low levels of expression have hampered detailed analysis of cDNAs obtained through cDNA selection (Alla Rynditch, Institute of Molecular Biology and Genetics, Ukraine). Similarly, mapping within chromosome 14q32.1 yielded few genes, but several local clusters of genes were defined within 14q24 (Anne-Francoise Roux, THSC).
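The contrast above between CpG-island-poor and GC-rich bands rests on two simple sequence statistics: GC fraction and the CpG observed/expected ratio. The sketch below computes both for a DNA fragment and applies the widely used CpG-island thresholds (length of at least 200 bp, GC above 50%, observed/expected above 0.6). These thresholds and the function names are assumptions for illustration; they are not the criteria reported by the presenters, and real island finders scan windows along genomic DNA rather than scoring one fragment.

```python
def gc_and_cpg_obs_exp(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence.

    CpG obs/exp = (number of CG dinucleotides * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def meets_cpg_island_criteria(seq: str, min_len: int = 200,
                              min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the common length/GC/obs-exp thresholds (assumed defaults) to one fragment.

    A genome-scale CpG-island finder would slide a window along the sequence;
    this function only scores the fragment it is given.
    """
    gc, obs_exp = gc_and_cpg_obs_exp(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    island_like = "CG" * 100   # 200 bp, GC-rich and CpG-dense
    at_rich = "AT" * 100       # 200 bp with no C or G
    for name, s in [("island_like", island_like), ("at_rich", at_rich)]:
        print(name, gc_and_cpg_obs_exp(s), meets_cpg_island_criteria(s))
```

Counting genes by scanning for CpG islands is only a proxy: as the 3q21 result shows, a GC-rich, gene-rich band can still be hard to analyze when its transcripts are expressed at low levels.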
Transcriptional mapping efforts on the X chromosome have yielded 15 new genes within Xqcen-q21, including one for Menkes disease and one for an alpha thalassemia (Michel Fontes, INSERM). Jozef Gecz (Women's and Children's Hospital, North Adelaide, Australia) reported on two genes isolated from Xq28, near FRAXE. The group identified one gene in this region and a second, potential gene in a large (>100-kb) exon of the first gene, presumably transcribed in the opposite direction. A combination of exon trapping and cDNA selection has yielded more than 12 genes within another 300-kb region of Xq28 (Nina Heiss, German Cancer Research Center, Heidelberg).
Other Transcriptional Analyses. Gene-specific transcriptional analyses included use of exon trapping and cDNA selection to isolate the gene for motor-endplate disease in mouse (David Kohrman, University of Michigan Medical School). Sample sequencing of nested deletions was used to define gene orientation and repeat content in the HLA C region (B. Rajendra Krishnan, Washington University School of Medicine). Analysis of a cytokine receptor family in 21q22.1 demonstrated complex patterns of alternative processing of some members and gave clues to the evolutionary origin of the 300-kb cluster (Georges Lutfalla, Institut de Génétique Moléculaire, Montpellier, France).
Marcia Budarf (Children's Hospital of Philadelphia) compared two sequence-based methods for gene discovery: searches of the rapidly growing EST resources and computational prediction of coding regions with GRAIL. In the intensively studied DiGeorge and velocardiofacial syndrome critical region (DGCR) of human chromosome 22q11.2, 40% of the 12 previously identified genes were represented in the EST database, and GRAIL predicted exons for 75% of the genes known to be in this region. As expected, GRAIL did less well on intronless genes and small transcripts, whereas EST-based methods were less sensitive for genes with low or restricted expression.
Improvements to Exon Trapping and cDNA Selection. A long-standing limitation of exon trapping has been that each experiment yields only a single, usually small, exon. Esther van de Vosse (Leiden University, Netherlands) discussed new cosmid-based exon-trapping vectors that allow a number of exons to be trapped simultaneously. This system should greatly facilitate scanning large genomic DNA regions for the presence of genes.
Anthony Brooks (MRC, Edinburgh) discussed improvements and enhancements to coincident sequence-cloning methods that improve the method's efficiency and reduce background problems. These procedures isolate different products from those obtained by direct cDNA selection, thus providing a useful complementary method.
Expression Patterns. David Beier (Harvard Medical School) discussed ongoing work to map more than 1000 mouse ESTs using single-strand conformational polymorphism. Of more than 600 sequences tested, 89% were polymorphic and thus could be mapped by this method. Mapping these ESTs will help to integrate genetic and physical maps of the mouse genome.
Several groups are exploring procedures for isolating tissue-specific genes and determining broad patterns of differential expression by constructing and screening tissue-specific libraries. Tissues included 10.5-day mouse embryo (Stephen Kingsmore, University of Florida), human heart and testes (Chris Lau, University of California, San Francisco), and fetal kidney (Cecile Jeanpierre, INSERM, Paris). A second approach uses colony screening of arrayed cDNA libraries with RNA from different tissues, cell lines, or induction states. System calibration and sensitivity definition have been carried out (Karine Bernard, CIML, Marseille), and gene expression data for muscle (Genevieve Peitu, CNRS, Villejuif) and thymus (Dominique Rocha, CIML, Marseille) have been examined.
Functional Analysis. A new and very useful session on functional analysis of gene expression considered a potpourri of approaches and results. Alexandre Reymond (Massachusetts General Hospital) discussed the use of interaction-mating technologies to make testable "guesses" about the function of unknown proteins and to examine protein-protein interactions.
James Eberwine (University of Pennsylvania Medical School) presented elegant work on the subcellular localization of mRNA in neurons. Different parts of neurons harbor different populations of mRNAs. One interesting observation was that alternatively spliced forms of the same mRNA could be found in the same cell. Synaptic activity was also shown to locally influence the translation of various mRNAs.
Three presentations discussed the use of differential display. Sherman Weissman (Yale University School of Medicine) showed data on changes in expression for a large number of transcripts from activated Jurkat cells. Michael McClelland (Sidney Kimmel Cancer Center, La Jolla, California) discussed how changes in the "fingerprint" generated by differential display can be used to monitor changes in expression of a large number of genes following various treatments (with drugs or hormones, for example). J. Gregor Sutcliffe (Scripps Research Institute, La Jolla, California) described a method for uniquely tagging mRNA molecules to allow comparisons of different tissues or of the same tissue or cell line following different treatments.
Particularly important for cDNA library construction and RT-PCR analyses was a presentation by Wai-Choi Leung (Tulane University School of Medicine) on methods for studying the 3-D structure ("architecture") of mRNA molecules. His results showed that "rarity" of transcripts may sometimes be more an artifact of reverse-transcriptase inhibition by secondary mRNA structure than a true reflection of mRNA abundance.
Model Organisms. Understanding human gene function requires the development of surrogate genetic systems in model organisms. Appreciation is growing not only for the power of model organism systems in studying gene function but also for the role of comparative genomics in understanding the broader biological implications of data generated by the Human Genome Project. A number of workshop speakers addressed the role of model organisms in elucidating gene function.
Petra Ross-Macdonald (Yale University) described the generation of yeast strains harboring lacZ fusions that can be used to study intracellular localization and patterns of expression. In some cases, these experiments can suggest functions for previously uncharacterized genes and may give clues to the role of uncharacterized human genes with homologs in yeast.
Donna Albertson (University of California, San Francisco) discussed a method for visualizing the pattern of mRNA expression in whole animals. The method uses high-resolution FISH to study the expression pattern of uncharacterized genes in the nematode Caenorhabditis elegans. This technique has been applied to 30 predicted genes of unknown function, only 4 of which failed to give a hybridization signal.
Melody Clark (Addenbrooke's Hospital, Cambridge, U.K.) reviewed the advantages of genomic sequencing of the pufferfish (Fugu) for gene identification. Although the Fugu genome is about one-eighth the size of the human genome, it is thought to contain essentially the same complement of genes. Therefore, gene density is high in Fugu, introns are small, and genes are easier to identify.
Determining gene function, even in such experimentally tractable model organisms as the mouse, is not a trivial undertaking. Miles Brennan (NIH National Institute of Mental Health) described using Cre/lox, the site-specific recombination system of bacteriophage P1, to generate transgenic mice with precisely engineered deletions or duplications. This allows the dosage of multiple genes to be manipulated simultaneously and will be critical for understanding complex multigene disorders. Richard Woychik [Oak Ridge National Laboratory (ORNL)] expanded on the use of engineered deletions in transgenic mice by explaining a strategy for making point mutations in genes made hemizygous by the deletion. Such procedures hold promise for establishing models of various human diseases in mice.
Informatics. Two workshop sessions dealt with informatics and the role of computational science in gene discovery and analysis. Richard Mural (ORNL) discussed new enhancements to the GRAIL system, including improved sensitivity for identifying protein-coding exons in genomic DNA with high A+T content. He also described tools such as BatchGRAIL, designed to facilitate the analysis of large numbers of single-pass sequences like those generated in cDNA- or cosmid-skimming projects. Used judiciously, "software trapping" can be an important method for gene identification. Jean-Michel Claverie (CNRS, Marseille) stressed the importance of filtering out repetitive and other low-entropy sequences before using database-searching methods to identify relationships between sequences.
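One simple way to see why low-entropy filtering matters: a stretch such as a poly(A) run or a simple repeat has a skewed base composition, so it scores low on Shannon entropy and tends to produce spurious database matches. The sketch below masks every base that falls in a low-entropy window by replacing it with N. It is a generic windowed-entropy filter written for illustration, not a reconstruction of the method Claverie presented; the function names, the 12-base window, and the 1.5-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the base composition of seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_entropy(seq: str, window: int = 12, threshold: float = 1.5) -> str:
    """Replace every base lying in any low-entropy window with 'N'.

    window and threshold are illustrative values (assumptions), not published
    defaults. A sequence shorter than the window is returned unmasked.
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        if shannon_entropy(chunk) < threshold:
            # Mask the whole window; overlapping windows extend the mask.
            for i in range(start, start + window):
                masked[i] = "N"
    return "".join(masked)


if __name__ == "__main__":
    # A poly(A) stretch flanked by compositionally balanced sequence.
    query = "ACGT" * 3 + "A" * 24 + "ACGT" * 3
    print(mask_low_entropy(query))
```

Because every window that overlaps the poly(A) run is scored, the mask spreads a few bases into the flanks; a masked query can then be passed to a similarity search without the repeat dominating the hits.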
Philippe Bucher (ISREC, Lausanne) described a new approach that combines computational and experimental methods to identify sequences bound by regulatory DNA-binding proteins. The approach, which has been applied to several eukaryotic transcription factors, predicts protein-binding sites more reliably than methods based on conventional multiple alignments. Kerstin Quandt (Institut fuer Saeugetiergenetik, Germany) discussed a new software package that finds correlations within promoter regions and between promoter elements and ORFs. The software, XFACtoR, provides a graphical user interface that lets the user view higher-order structures of promoter regions. James Fickett (Los Alamos National Laboratory) discussed a method for assigning function to newly identified genes by examining their regulatory context. Preliminary studies have focused on cataloguing the features characteristic of skeletal muscle-specific enhancers and promoters. Being able to determine the time and place of a gene's expression would be a useful step toward understanding its function.
Laurent Duret (Geneva University Hospital) presented data on the importance of 3' UTRs in the post-transcriptional processing of mRNAs. He indicated that 3' UTRs are more conserved than 5' UTRs and that, among vertebrates, 30% of genes have large (100- to 400-bp) portions of their UTRs conserved across 300 million years of evolution. These regions may play a role in translational control and mRNA stability.
Martin Ringwald (Jackson Laboratory) described the Gene Expression Database, a relational database for gene expression during mouse development. The database will include a 3-D atlas of mouse development being assembled in Edinburgh. Data are currently being entered from the existing literature and by electronic submission.
Chris Fields (National Center for Genome Resources, Santa Fe, New Mexico) discussed modifications to the Genome Sequence Data Base schema that will facilitate the inclusion of gene-expression data. This restructuring will allow for direct querying of expression data and provide links to external expression databases.
The Sixth International Workshop on the Identification of Transcribed Sequences is planned for October 3-5, 1996, in Edinburgh, Scotland.
[Richard Mural, Oak Ridge National Laboratory, and Katheleen Gardiner, Eleanor Roosevelt Institute]
A report on the third cDNA/EST mapping workshop that preceded this meeting appears in HUGO's Genome Digest [3(1),11-13 (January, 1996)]. Contact: Editorial office (+44-171/935-8085, Fax -8341)
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v7n5).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.