S. Nalabolu(1), N. Baskaran(1), Y.C. Liu(1), Y. Prashar(2), K. Prakash(2), and S. Weissman(1).
(1)Yale University School of Medicine, Dept. of Genetics, New Haven, CT, USA; (2)Gene Logic Inc., Columbia, MD, USA
Comparison of full length cDNAs and genomic sequences in the human MHC shows both the value of having genomic sequences and certain limitations in present methods for using genomic sequences to predict the structure of mRNAs and proteins. Problems that arise in identifying and characterizing mRNAs based on genomic sequences include the expected examples of missing small exons and misinterpreting intronless genes. In addition, several curious situations have been noted including the presence of large introns containing multiple internal transcribed sequences, the presence of a gene, B144, that encodes an approximately 700 base pair mRNA which occurs in as many as thirty alternatively spliced forms, the use of a non-canonical 5' splice site, and the detection at the protein level of a product whose apparent molecular mass is about twice that predicted from the nucleotide sequence.
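As an illustration of what a check for a non-canonical 5' splice site can look like when a cDNA has been aligned to genomic sequence, here is a minimal sketch. It assumes the usual convention that introns begin with GT (5' donor) and end with AG (3' acceptor). The function name, the coordinate convention, and the toy sequences are hypothetical and are not taken from the work described above, which does not state which non-canonical site was observed.

```python
def classify_intron(genomic, intron_start, intron_end):
    """Report the donor and acceptor dinucleotides of one intron.

    genomic      -- genomic DNA sequence (sense strand) as a string
    intron_start -- 0-based index of the first intron base (assumed convention)
    intron_end   -- 0-based index one past the last intron base
    """
    intron = genomic[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold both splice-site dinucleotides")
    donor = intron[:2]       # first two intron bases (5' splice site)
    acceptor = intron[-2:]   # last two intron bases (3' splice site)
    return {
        "donor": donor,
        "acceptor": acceptor,
        "canonical_donor": donor == "GT",
        "canonical_acceptor": acceptor == "AG",
    }


# Toy example: exon, 13-base intron, exon (sequences are made up).
canonical = "ATGGCC" + "GTAAGTTTTTCAG" + "GGATAA"
variant = "ATGGCC" + "GCAAGTTTTTCAG" + "GGATAA"
print(classify_intron(canonical, 6, 19))
print(classify_intron(variant, 6, 19))
```

A gene-prediction program that only accepts GT donors would miss the second intron, which is one way a genomic-sequence-based prediction can disagree with a full-length cDNA.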
In other work, we continue to adapt modified PCR procedures for the production of nearly full length cDNAs from very small amounts of starting material, and to apply 3' end restriction fragment display methods to preformed cDNA libraries. The combination of these approaches makes it possible to compare patterns of gene expression from very small numbers of cells such as highly purified murine hematopoietic stem cells.
Haines L, McBride D and Clinton M
Roslin Institute, Edinburgh, Scotland
In mammals, sex depends upon the inheritance of a heteromorphic pair of sex chromosomes, X and Y, with the dominant Y chromosome in males leading to the formation of testes. However, this mechanism of sex determination cannot operate in all vertebrates; for example, in most avian species the female is the heterogametic sex (ZW) and the male the homogametic sex (ZZ). At present, it is unclear whether ovary development is the result of a dominant W chromosome or whether the testes develop under a Z chromosome dosage mechanism. The gonadal primordia, the genital ridges, appear in the chick embryo at day 4 of development, and superficial morphological differences are apparent between males and females by day 8. Sex determination is thought to occur around day 6 of development.
In an attempt to identify genes involved in avian gonadal development, we have used differential display to compare the patterns of gene expression in the developing ovary and testis. Genital ridges/gonads were excised from male and female chick embryos between days 4 and 9 of development, and RNA extracted from these tissues was used to perform differential display analyses. Banding patterns obtained from male and female samples were compared, and bands exhibiting sexually dimorphic patterns or intensity changes during development were identified. Gel fragments corresponding to these bands were excised, and the DNA was eluted, re-amplified and cloned. Here we report the results obtained from five such clones. To confirm the expression patterns demonstrated by differential display analysis, Northern hybridization was performed using these clones as probes. For two of these clones, no signal was detectable by Northern hybridization, while the remaining three clones demonstrated patterns of expression which replicated the patterns seen by differential display. Of the five transcripts, three represent novel sequences, one represents a transcript with a well-documented developmentally regulated pattern of expression, and one represents the chick homologue of a known signal transduction molecule considered to be involved in cell morphology changes.
Miles B. Brennan(1), Brian Sauer(2), and Ute
(1)Eleanor Roosevelt Institute, Denver, CO, USA; (2)Laboratory of Biochemistry and Metabolism, NIDDK, Bethesda, MD, USA; (3)Unit on Molecular Genetics, NIMH, Bethesda, MD, USA
The systematic identification of transcribed sequences has yielded many intriguing novel genes; their analysis presents a new challenge. One approach to the functional analysis of novel genes is the introduction of knockout (null) alleles into the mouse germline and the testing of the resulting mutants for phenotypes. While it is impractical to knock out each gene individually, engineering large segmental deletions spanning defined chromosomal regions would allow the phenotypic testing of null alleles in a large number of genes simultaneously. We are using the Cre/lox recombination system to introduce such defined deletions into the mouse germline.
We introduced lox sites at both the HPRT locus (X chromosome) and the POMC (proopiomelanocortin) locus (chromosome 12) by homologous recombination in embryonic stem (ES) cells. ES cell lines with each disruption have successfully colonized the mouse germline. We are now introducing new lox sites at sequences on chromosome 12 and on the X chromosome at specific distances from the HPRT or POMC loci. After these have been introduced into the mouse germline, recombination will be catalyzed by the Cre recombinase expressed specifically in one-cell embryos from the Ad5 EIIa promoter.
Analyzing knockouts of contiguous genes simultaneously will facilitate the identification of genes producing experimentally tractable phenotypes, especially extreme and specific effects on development or in adult function. The systematic character of this mapping offers important advantages over random insertional mutagenesis and enhancer/promoter trap strategies.
Medical Biochemistry Department, Geneva University Hospital, Geneva, Switzerland
Genome sequencing projects currently produce hundreds of Mb per year. The major task of bioinformaticians now is to identify functional elements within this huge amount of genomic sequence. Some functional elements (such as tRNA genes, protein-coding regions, etc.) can be identified by using sets of rules or statistical methods. These methods are extremely useful, but their predictive power is often well under 100%, and they are limited to a small set of well-known functional elements. Another way to search for functional elements is comparative analysis: if a genomic region has remained highly conserved during evolution, this implies that it is subject to strong selective pressure and hence that it is functional. This method has some limitations: it requires genomic sequences from different taxa that are neither too closely nor too distantly related, and some functional elements cannot be identified by comparative analysis. However, this approach proves to be very efficient, and its main advantage is that it does not require any prior knowledge of the functional elements being sought.
We have used this method in a large-scale project to identify regulatory elements within non-coding parts of protein genes. 145 Mb of non-coding or non-annotated sequences from different metazoan taxa (essentially vertebrates, insects and nematodes) were extracted from databases and compared with one another using a combination of BLAST and LFASTA, to search for evolutionarily conserved elements. This study revealed the existence of hundreds of very long highly conserved regions (HCRs) in non-coding parts of genes: 70% identity or more over 50 to 2000 nt between species that diverged 300 to 550 million years ago. Such conservation is unexpected because it concerns fragments that are much longer than the regulatory elements known to date. Studying the distribution of HCRs within genes showed that functional constraints are generally much stronger in 3'-non-coding regions than in promoters or introns. The 3'-HCRs are particularly A+T-rich and are always located in the transcribed untranslated regions of genes, which suggests that they are involved in post-transcriptional processes (mRNA export, localization, translation, or degradation). Various lines of evidence suggest that many of these 3'-HCRs play an important role in intracytoplasmic mRNA localization.
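The conservation criterion quoted above (at least 70% identity over at least 50 nt) can be illustrated with a small sketch. This is not the BLAST/LFASTA pipeline used in the study; it assumes the two sequences have already been aligned column by column (gaps written as "-"), and simply slides a fixed window along the alignment and merges overlapping windows that pass the threshold. The function names and default parameters are hypothetical.

```python
def count_matches(a, b):
    """Count alignment columns where both sequences carry the same base."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def conserved_regions(a, b, window=50, min_percent=70):
    """Return merged (start, end) half-open column ranges of an alignment
    in which every contributing window has >= min_percent identity.

    a, b -- two aligned sequences of equal length (gaps as '-')
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    regions = []
    for start in range(len(a) - window + 1):
        end = start + window
        matches = count_matches(a[start:end], b[start:end])
        # Integer comparison avoids floating-point edge cases at the threshold.
        if 100 * matches >= min_percent * window:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions


# Toy alignment: 50 identical columns followed by 50 mismatched columns.
seq1 = "A" * 100
seq2 = "A" * 50 + "C" * 50
print(conserved_regions(seq1, seq2))  # prints [(0, 65)]
```

Note that a merged region can extend into poorly conserved flanks, as in the toy example, because each window only needs to pass the threshold on average; a real analysis would trim region boundaries and assess significance.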
Information on HCRs (sequences, alignments, annotations, bibliographic references) is compiled in a database (ACUTS: database of Ancient Conserved UnTranslated Sequences) that will be made available to the scientific community. We are now analysing all these conserved elements to try to detect common features that could be used to set up rules for the prediction of regulatory elements in new sequences.
Acknowledgements: This work was done using computer facilities provided by the Geneva Biomedical Research Institute (Glaxo Wellcome) and the company CHEMPUTEAM SA (Geneva). L.D. is recipient of a FEBS Fellowship.
Esther E. Schmidt, Koichi Ichimura and V. Peter Collins
Ludwig Institute for Cancer Research and Institute for Oncology and Pathology, Karolinska Hospital, Stockholm, Sweden
Differential display was used as an approach to identify genes that play a role in the tumorigenesis of human astrocytic tumours. Glioblastoma is the commonest and most malignant form of these tumours. RNA was extracted from clearly separated areas within the same tumour specimen, containing tumour cells and normal cells, respectively, and was subjected to differential display. Among 10 differentially displayed fragments successfully cloned into a plasmid vector, a 151 bp fragment isolated from a glioblastoma patient detected a 6 kb transcript on a Northern blot which was present in the tumour cells only. It was expressed in an additional 9/18 glioblastomas but not in normal brain tissues. Based on the 151 bp sequence, a 1.9 kb cDNA fragment was isolated from adaptor-linked U118MG cDNA using an approach combining 5' RACE and long-distance PCR (Marathon(TM) cDNA Amplification, Clontech). The sequence identified thus far represents a novel sequence with no significant homology to any known sequence. The cloning and sequencing of the full-length cDNA is ongoing. This strategy of searching for putative oncogenes or tumour suppressor genes opens up new possibilities for understanding the processes involved in oncogenesis of human tissues.
J.N. Feder, D. A. Ruddy, V. K. Lee, G. S. Kronmal, G. A. Mintier, A. Fullan, F. A. Mapa, N. C. Meyer, A. Basava, L. Quintana, E. McClelland, R. Domingo, D. B. Loeb, W. Thomas, Z. Tsuchihashi, R. C. Schatzman and R. K. Wolff
Mercator Genetics, Menlo Park, CA, USA
In the process of positionally cloning a candidate gene responsible for hereditary hemochromatosis (HH) (Feder et al., (1996) Nature Genetics 13, 399-408), we constructed a 1 megabase transcription map in a region 4 megabases distal to the HLA-A locus, and several novel gene families were identified. A combination of direct cDNA selection, exon-trapping and genomic-sample sequencing was used to isolate 150 expressed sequence fragments, which were ordered into an EST content map. Probes from each of the EST contigs were used to screen cDNA libraries. Besides the novel MHC class 1-like HH candidate gene, HLA-H, a family of five butyrophilin-related sequences termed BTF1-5 was found. A gene with strong homology to the 52 kDa Ro/SSA lupus and Sjogren's syndrome autoantigen, which we called RoRet, was also identified. Both the BTF family and the RoRet genes share an exon of common evolutionary origin called B30-2 (Vernet et al., (1993) J. Mol. Evol. 37, 600-612). This exon, which was originally isolated from the HLA class 1 region, has "shuffled" into several genes along the chromosome distal to the MHC. Here, we provide more examples. Also identified in the contig were two genes with predicted structural similarity to a previously cloned type 1 sodium phosphate transport gene (Chong, S. S., et al., (1993) Genomics 18, 355-359). As a further measure to isolate all of the genes in the HH candidate region, we completely sequenced 250 kb of genomic DNA, and clustered histone genes were identified. The region around the HH locus therefore appears to be rich in gene families and duplicated sequences.
Catherine Nguyen(1), Samuel Granjeaud(1), Magali Gualandi(2), Philippe Naquet(2) and Bertrand Jordan(1)
(1)Genome Structure and Immune Functions Group, (2)Lympho-stromal Cell Interactions During Terminal T Lymphocyte Differentiation Group, CIML Luminy, Marseille, France
The CD3-zeta chain of TCR-CD3 complex plays a pivotal role in the activation of T cell responses and in the selection of the T cell repertoire. In zeta knock out mice, the T cells have a profound reduction in the surface levels of TCR-CD3 complexes and these animals have poorly developed thymuses.
In order to find new genes differentially expressed between normal and zeta knock out thymus, we set up two hybridizations on the same set of 3,072 genes with complex probes made from total RNA of zeta knock out or wild type thymus. After quantitative measurement of the amount of hybridized probe on each colony, the zeta-ko/wild type intensity ratio was calculated for each colony. A total of 171 cDNA clones showing significant stimulation or repression in zeta-ko thymus were selected. Additional hybridizations performed with complex probes made from RNA of different cell types (such as macrophage, thymocyte, epithelial cell line under different stimulation) or tissues (lymph node, spleen...) allowed the selection to be refined. The 38 clones representing the most significant profiles were tag sequenced (HGMP Resource Centre, Hinxton). Among these, 5 are highly homologous to known genes, and 27 are new or related to mouse or human ESTs. The "new" clones are being tested on Northern blots to determine the size of the mRNA. Three of them have been analysed by tissue in situ hybridization and show selective transcription in certain cell types.
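The ratio-based selection step described above can be sketched as follows. This is only an illustration: the abstract does not state how significance was defined, so the two-fold cutoff, the function name, and the toy intensities are all assumptions, and a real analysis would also need normalization between the two hybridizations and replicate measurements.

```python
def select_by_ratio(intensities, fold=2.0):
    """Return {clone: ko/wt ratio} for clones whose ratio is at least
    `fold` (stimulated) or at most 1/fold (repressed).

    intensities -- dict mapping clone id to (ko_signal, wt_signal)
    fold        -- hypothetical fold-change cutoff (assumption, not from the study)
    """
    selected = {}
    for clone, (ko, wt) in intensities.items():
        if ko <= 0 or wt <= 0:
            continue  # skip colonies with no measurable signal in either probe
        ratio = ko / wt
        if ratio >= fold or ratio <= 1.0 / fold:
            selected[clone] = ratio
    return selected


# Toy data (made-up clone ids and intensities).
signals = {
    "clone_A": (200.0, 100.0),  # stimulated in zeta-ko
    "clone_B": (100.0, 110.0),  # essentially unchanged
    "clone_C": (25.0, 100.0),   # repressed in zeta-ko
    "clone_D": (0.0, 50.0),     # no signal with the zeta-ko probe, skipped
}
print(select_by_ratio(signals))
```

Skipping zero-signal colonies is a design choice made here to avoid division by zero; such colonies could instead be flagged separately, since a complete loss of signal may itself be of interest.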
Bittner, M.L.(1), DeRisi, J.(2), Meltzer P.S.(1), Becker, K.G.(1), Penland, L.(2), Ray, M.(1), Su, Y.(1), Brown, P.(2), and Trent, J.M.(1)
(1)Laboratory of Cancer Genetics, National Center for Human Genome Research, NIH, Bethesda, MD, USA; (2)Department of Biochemistry, Stanford University Medical Center, Stanford, CA, USA
(The NCHGR/Stanford groups contributed equally to this work.)
cDNA microarrays provide a powerful tool for the study of gene expression patterns associated with complex biological phenomena. The cDNA microarray system is based upon the robotic printing of cDNAs on glass slides and simultaneous two-color fluorescence hybridization (Schena et al., Science 270:467). A high density array of 1,161 DNA sequences was used to search for differences in gene expression associated with tumor suppression in a melanoma cell line (UACC-903) and its chromosome-6 suppressed subline [UACC-903 (+6)] (Trent et al., Science 247:568). The microarray contained cDNAs from two principal sources: 1) 674 genes derived from the Unigene set of the 1NIB normalized cDNA library; and 2) 183 genes resulting from reciprocal subtractive hybridizations between cDNAs from the tumorigenic UACC-903 cell line and its suppressed derivative. Fluorescent probes were generated by reverse transcription of mRNA from the UACC-903 cell line and UACC-903 (+6). A second experiment, now underway, will examine the expression patterns of a large set (approx. 150) of zinc finger genes derived from human brain. These genes are being analyzed for their relative developmental and tissue-specific expression patterns. Results of these studies suggest the enormous potential for applying this approach to the study of human gene expression.
Richard J. Mural
Biology Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
As the analysis of genomes moves into the large-scale sequencing phase, the identification and annotation of biologically important features in anonymous DNA sequences become an even larger challenge. There are currently no systems capable of providing uniform, high-quality annotation of the estimated 2 million bases of DNA sequence which, in the near future, will be generated daily. Nor does there seem to be any consensus on what should be included in the annotation of genomic sequence. Issues of data ownership, third party annotation and database curation further complicate the resolution of this problem. In addition to timely annotation, it is imperative that annotation be continually updated, particularly in light of the rapid rate of new data acquisition.
Some of the issues involved in addressing these problems and some of the tools that are being developed to provide timely analysis of new sequences will be discussed. Systems for providing a user a personalized view of DNA sequence data will be discussed as will tools for data-mining and complex querying of multiple data resources. Examples of the kinds of information which can be obtained from analysis of large regions of uncharacterized DNA sequence will be illustrated.
(Supported by the Office of Health and Environmental Research, United States Department of Energy, under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc.)