Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, October-December 1996; 8:(2)
The Sixth International Workshop on the Identification of Transcribed Sequences was held October 2-5, 1996, in Edinburgh, Scotland. The meeting attracted 46 speakers and 20 posters covering topics that included the generation of regional and chromosomal transcriptional maps, functional analysis of gene expression, techniques for isolating and analyzing genes, use of model organisms, and informatics. The workshop was supported by the Cancer Research Campaign, European Commission, HUGO Europe, Lothian and Edinburgh Enterprise, Wellcome Trust, and DOE. Selected presentations are summarized below. (More details about the meeting are located at http://www.ornl.gov/meetings.)
Bioinformatics, Computational Biology
A number of speakers addressed bioinformatics and computational biology needs for transcriptional analysis. Characterization of potential regulatory elements in genomic DNA remains a difficult task.
Laurent Duret (Geneva University Hospital) described the use of large-scale comparative analysis of metazoan noncoding sequences to identify such elements. His study has found hundreds of long, highly conserved regions (HCRs) in noncoding parts of genes. HCRs retain at least 70% identity in sequences of 50 to 200 bases in DNA of species that diverged 300 million to 550 million years ago. Some of these sequences may play roles in gene regulation and mRNA localization. A database with more than 300 such sequences is available.
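The identity criterion quoted above (at least 70% identity over 50 to 200 bases) can be illustrated with a minimal sketch. This is not the method used in the study; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and the function names, the default window of 50, and the default threshold are chosen here purely for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which both sequences carry the same base.

    Assumes a and b are pre-aligned (equal length, '-' for gaps).
    Gap columns never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def conserved_windows(a: str, b: str, window: int = 50,
                      min_identity: float = 0.70) -> list[int]:
    """Return start offsets of every window whose identity meets the threshold."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(a) - window + 1):
        if percent_identity(a[start:start + window],
                            b[start:start + window]) >= min_identity:
            hits.append(start)
    return hits
```

Overlapping hits could be merged into longer conserved regions; that step is left out to keep the sketch short.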
James W. Fickett (SmithKline Beecham) reported progress in recognizing transcriptional regulatory regions from their context within a DNA sequence. In particular, he has devised a system that can discriminate among myotubulin-specific regulatory regions, other regulatory regions, and nonregulatory regions. This is an important step toward being able to infer possible functions of a newly discovered gene from its DNA sequence.
Thomas Werner (GMBH Institut für Säugetiergenetik) presented an approach to identifying transcriptional control regions. Using two types of retroviral control regions (LTRs) as models, he showed that this robust technique found all known LTRs in the Primate division of GenBank (95 Mb) and identified five previously unknown LTRs. The false-positive rate was reported to be quite low.
Richard Mural (Oak Ridge National Laboratory) commented on the challenge of automated annotation of DNA sequences. As the analysis of genomes moves into large-scale sequencing, identification and annotation of biologically relevant features in the sequence become increasingly complex and important. Annotation must be updated continually, particularly in light of the rapid rate of new data acquisition. Ideas were discussed for new systems to provide a user-defined view of a DNA sequence as well as data-mining tools for complex querying of multiple data resources.
Among mammals, the mouse is clearly the model organism of choice for "surrogate" human genetics, and information resources for mouse genetics and developmental biology are critical. Martin Ringwald (Jackson Laboratory) reported work on the Gene Expression Database. This database not only contains information on the expression of various mouse genes but also is being linked to a mouse-anatomy database that will allow the user to follow gene expression through the course of development.
The accumulating data from a number of other model organisms are providing new insights into genome structure and function. With nearly half of its 100-Mb genome sequenced, the nematode Caenorhabditis elegans is becoming increasingly important for gene discovery. Steven Jones (Sanger Centre) presented some results of gene prediction in the C. elegans genomic sequencing project. The project has identified about 9700 proteins, 46% of which clearly are related to proteins already in public sequence databases.
Because of its small genome size and the compact nature of its genes, the puffer fish Fugu rubripes is another important model organism. Greg Elgar (HGMP Resource Centre) described the Fugu Landmark Mapping Project, which aims to sample sequence 1000 Fugu cosmids to provide resources for a number of different applications, including gene identification. Some physical linkage data also are expected to come out of this project because of the likelihood of finding more than one Fugu gene per cosmid clone. Nearly 200 Fugu cosmids have been scanned.
Learning the patterns of gene expression is a necessary first step to understanding gene function and interaction. C. elegans is particularly well suited to studying gene-expression patterns because the animal develops rapidly and the fates of all its cells have been mapped. Donna Albertson (Lawrence Berkeley National Laboratory) reviewed a preliminary study in which the expression pattern of nearly 200 C. elegans genes was examined using FISH on whole animals. Petra Ross-Macdonald (Yale University) described an approach for yeast that determines when a gene is expressed during the yeast life cycle, subcellular localization of the gene product, and the phenotypic effect of disrupting the gene. This technique is helping to determine functions of large numbers of yeast genes that have been identified by sequencing but have no relatives with known function in current databases.
One hope of comparative genomics is to use information from well-characterized model systems to provide candidates for genes implicated in human diseases. One such application was presented by Giuseppe Borsani (Téléthon Institute of Genetics and Medicine), who found 66 human ESTs with significant homology to known Drosophila genes. All these genes, which are well characterized in Drosophila, are candidates for genes involved in human pathology. For example, an EST that was homologous to a gene causing retinal degeneration in the fruit fly was mapped to a human genome region near genes for three different types of human retinopathology.
Gene Identification and Mapping
A number of speakers presented data that begin to elucidate genome organization and function. Stephen Scherer (University of Toronto) described progress in gene identification on human chromosome 7q. Around 2500 genes are expected to be found on the long arm of chromosome 7. Three strategies for isolating and mapping these genes were discussed: (1) initial assignment of all known chromosome 7 genes and ESTs from the public domain to the map, (2) genomic DNA sequencing of selected chromosomal regions to identify genes, and (3) direct cDNA selection on chromosome-specific cosmids. The current 7q map contains over 1600 DNA markers, including 170 known genes, 200 ESTs, and more than 500 selected cDNA fragments.
Mammalian genomes are a mosaic of regions (isochores) of varying base composition. Katheleen Gardiner (Eleanor Roosevelt Institute) showed data on the isochore structure of human chromosome 21 and the nature of the boundaries between different isochores. Sequences at a number of these boundaries are homologous (>80% identity) to the pseudo-autosomal boundary of the sex chromosomes' short arms (as described for chromosome 6 isochore boundaries, Fukagawa et al.). One interesting feature of these sequences is that some appear to be transcribed.
CpG islands are short (1-kb) regions of genomic DNA with a high GC content and reduced methylation of C residues. About 60% of genes have these islands at their 5' ends, making CpG islands useful markers for transcriptional units. Sally Cross (Edinburgh University) discussed the construction of whole-genome CpG island libraries from human, mouse, and chicken. These libraries should be a valuable resource for isolating the 5' ends of a large number of genes, regardless of their level of expression.
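The "high GC content" property can be made concrete with a short sketch. The functions below compute GC content and the observed/expected CpG ratio for a sequence. The ratio and the default cutoffs (GC content above 0.5, observed/expected CpG above 0.6) are a widely used heuristic that was not part of the presentation, and all function names are hypothetical. Methylation status cannot be read from the sequence alone, so it is not modeled.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG dinucleotides divided by the count expected from base composition.

    Expected count is taken as (#C * #G) / length.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Heuristic test; the thresholds are assumptions, not taken from the article."""
    return gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_obs_exp
```

In practice such a test would be applied to windows of a few hundred bases sliding along genomic sequence rather than to a whole sequence at once.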
Complexities of deducing mRNA structure from genomic sequences were described by Sherman Weissman (Yale University). Comparing full-length cDNAs to genomic sequences reveals a number of limitations in current methods using genomic sequence to predict the structure of mRNAs and proteins. One problem involves large introns that contain other transcribed sequences. Weissman also described a gene, B144, which has a 700-base mRNA that exists in at least 30 alternatively spliced forms.
Quantitative PCR is becoming an important technique for studying gene expression. Michael McClelland (Sidney Kimmel Cancer Center) addressed a broad range of issues connected to the effective use of quantitative PCR, particularly as it applies to differential display. These issues include relative quantitation by low-stringency PCR, the Cot effect, and problems of target vs standard titration. The Cot effect is particularly interesting because it demonstrates that low-abundance and high-abundance products accumulate at different rates. Very abundant products are formed more slowly than expected because product reannealing competes with priming. The need to control these various parameters was stressed in this presentation.
J.G. Sutcliffe (Scripps Research Institute) reported a form of differential display called TOGA (Total Gene Expression Analysis). TOGA uniquely identifies nearly every mRNA from an organism and, because it does not require prior characterization of the mRNA, also detects mRNAs not previously described. This automated PCR-based technique can detect messages of less than 0.001% prevalence, thus providing a powerful means for comparing mRNA expression profiles.
Wai-Choi Leung (Tulane University School of Medicine) described architectural elements of mRNA molecules. Energy maps can be constructed that describe the location, size, and energy density of closed regions of mRNA molecules. Closed regions reflect the secondary structure of mRNA that may be related to a number of processes, including RNA translocation, nuclear export, transcription termination, and translational control.
M. Bento Soares (Columbia University) discussed strategies for constructing cDNA libraries for both gene discovery and characterization. To clone genes represented by low-abundance transcripts, subtractive hybridization strategies are being developed to eliminate pools of sequenced cDNAs. In addition, techniques are being optimized to produce libraries enriched for full-length cDNAs. These libraries will be very useful for increasing gene representation and therefore the utility of dbEST.
Bernhard Korn (German Cancer Research Center) reported progress in constructing and gridding a full-length cDNA library from human fetal brain. His institution's current library has 120,000 clones with an average insert size of 1.8 kb. Some problems inherent in making such libraries were discussed.
The Y chromosome presents a number of unique problems to both genetic mapping and gene identification. Yun-Fai Chris Lau (University of California, San Francisco) presented two approaches to identifying Y chromosome-specific genes by en masse terminal exon trapping. Analysis of these methods showed that such an approach is feasible and that more than 50% of exon clones were derived either from known Y genes or from potential functional sequences.
Richard Mural, Oak Ridge National Laboratory (firstname.lastname@example.org) and Katheleen Gardiner, Eleanor Roosevelt Institute (email@example.com)
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v8n2).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.