M. Ringwald(1), R. Baldock(2), J. Bard(3), D. Begley(1), G. Davis(1), D. Davidson(2), J.T. Eppig(1), M. Kaufman(3), J. Richardson(1), and L. Taylor(1).
(1)The Jackson Laboratory, Bar Harbor, ME, USA; (2)Human Genetics Unit, Edinburgh, Scotland; (3)Edinburgh University, Edinburgh, Scotland
As genome research shifts from the identification of genes to understanding their function, the amount of expression data and the scale at which these data are generated are growing at an enormous pace. Expression data are complex. Different types of expression assays give different but complementary insights into expression patterns, and new methods for measuring gene expression are being developed rapidly. We urgently need tools that enable integrated storage and analysis of gene expression information.
We are developing a database of gene expression information for the laboratory mouse. The database is designed to store primary data from the different expression assays. In this format, the data can be integrated, new data and new assay types can be added, and novel insights resulting from new data can be represented. The system should enable the community to put bits and pieces of the various types of expression data together, and thus to successively gain knowledge about what products are made from a given gene, and when and where these products are expressed. Expression patterns are described using a standardized anatomical dictionary that models the anatomy hierarchically (from body region to tissue to tissue substructure) to allow for continuous refinement of the nomenclature system, to fit with the different resolution of analysis methods, and to facilitate user annotation of expression data. For in situ studies, the textual annotations are complemented with digitized images of original expression data. This database system will be combined with a 3D atlas of mouse development to enable 3D graphical display and analysis of expression patterns. Integration with the Mouse Genome Database and interconnection with other databases (e.g. sequence databases, databases for other species) will place the gene expression data into the appropriate biological and analytical context.
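The hierarchical anatomical dictionary described above can be illustrated with a minimal sketch. It is not the database's actual schema: the class `AnatomyTerm`, the function `genes_expressed_in`, the anatomical terms and the gene names are all hypothetical, chosen only to show how annotations made at different resolutions can be retrieved through a single query at a coarser level.

```python
class AnatomyTerm:
    """One node in a hypothetical hierarchical anatomical dictionary."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the term names from the root down to this term."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def is_within(self, other):
        """True if this term is `other` or one of its substructures."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


def genes_expressed_in(term, annotations):
    """Genes annotated to `term` or to any structure below it."""
    return sorted({gene for gene, site in annotations if site.is_within(term)})


# Illustrative terms only: body region -> tissue -> tissue substructure.
embryo = AnatomyTerm("embryo")
head = AnatomyTerm("head", embryo)
brain = AnatomyTerm("brain", head)
forebrain = AnatomyTerm("forebrain", brain)
limb = AnatomyTerm("limb", embryo)

# Made-up annotations recorded at different levels of resolution.
annotations = [("geneA", forebrain), ("geneB", head), ("geneC", limb)]

print(forebrain.path())
print(genes_expressed_in(head, annotations))
```

Because each annotation points to a node rather than to free text, a new substructure can be added under an existing term without touching earlier annotations, which is one way to accommodate continuous refinement of the nomenclature.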
We have implemented a Gene Expression Database prototype in the Sybase relational database system and have built interfaces that allow detailed, standardized descriptions of expression data, including indexing and interactive labeling of 2D images, and electronic submission of these data from research laboratories. The submission system provides on-line links to other databases and electronic validation tools to facilitate data annotation and cross-referencing. Prototype interfaces are refined based on our experience in annotating expression data from the literature and on feedback from test laboratories, with the aim of developing a user-friendly system for the community at large.
An index of expression citations from newly published research reports documenting data on endogenous gene expression during mouse development has been established. The index is updated daily and includes reference(s), gene(s) and embryonic stage(s) analyzed and expression assay(s) used. A searchable version of the Gene Expression Index is available at http://www.informatics.jax.org/gxd.html, or from our mirror site in the UK at http://mgd.hgmp.mrc.ac.uk/gxd.html.
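As a rough illustration of the kind of record such an index holds, the sketch below models one entry with the four fields named above (reference, genes, embryonic stages, assays) and a simple filter over them. The names `IndexEntry` and `search_index`, and all sample values, are assumptions made for the example; they do not describe the actual Gene Expression Index schema or its query interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: one reference with what it reports."""
    reference: str
    genes: Tuple[str, ...]
    stages: Tuple[str, ...]
    assays: Tuple[str, ...]


def search_index(entries, gene=None, stage=None, assay=None):
    """Return entries matching every criterion that is not None."""
    hits = []
    for entry in entries:
        if gene is not None and gene not in entry.genes:
            continue
        if stage is not None and stage not in entry.stages:
            continue
        if assay is not None and assay not in entry.assays:
            continue
        hits.append(entry)
    return hits


# Made-up entries; references, genes, stages and assay names are placeholders.
index = [
    IndexEntry("Ref-1", ("geneA",), ("E9.5", "E10.5"), ("RNA in situ",)),
    IndexEntry("Ref-2", ("geneA", "geneB"), ("E12.5",), ("Northern blot",)),
    IndexEntry("Ref-3", ("geneC",), ("E10.5",), ("RNA in situ",)),
]

for entry in search_index(index, gene="geneA"):
    print(entry.reference, entry.stages, entry.assays)
```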
Maria DFS Barbosa(1), Velizar T. Tchernev(1), Jennifer A. Ashley(2), John C. Detter(1), Quan A. Nguyen(1), Dipti Chotai(3), Charles Hodgman(3), Roberto CE Solari(3), Stephen J. Brandt(4), and Stephen
(1)Dept of Med and Center for Mammalian Genetics, Univ of Florida, Gainesville, FL, USA; (2)Dept of Biochemistry, Univ of Texas Southwestern Medical Center, Dallas, TX, USA; (3)Cell Biology Unit, and Advanced Technologies and Informatics Unit, Glaxo Wellcome Medicines Research Center, Hertfordshire, UK; (4)Department of Medicine, Vanderbilt University, Nashville, TN, USA
Vesicular transport to and from the lysosome and late endosome is defective in patients with Chediak-Higashi syndrome (CHS) and in beige (bg) mice. CHS and bg cells have giant perinuclear vesicles with characteristics of late endosomes and lysosomes that arise from dysregulated homotypic fusion. CHS and bg cells also exhibit compartmental missorting of proteins, such as elastase and cathepsin G. We have used a positional cloning approach to identify the gene that is mutated in bg mice: 1. Using intersubspecific backcross mice, bg was localized to a 0.24 cM interval on mouse Chromosome 13. 2. A physical map of 2400 kb of the bg nonrecombinant interval was generated by STS content mapping of YAC clones. 3. A candidate gene for bg, designated Lyst (LYSosomal Trafficking regulator), was identified by direct selection from a nonrecombinant interval YAC clone. 4. Lyst was disrupted by a 5 kb deletion in bg<11J> mice, and Lyst mRNA was markedly reduced in bg<2J> homozygotes. Lyst encodes a novel, potentially prenylated protein with sequence similarity to stathmin, a phosphoprotein involved in intracellular transport through regulation of microtubule polymerization. The homologous human gene, LYST, is highly conserved with mouse Lyst, maps within the nonrecombinant interval for Chediak-Higashi syndrome on human Chromosome 1q42-q43, and contains a frame-shift mutation at nucleotides 117-118 of the coding domain in several unrelated CHS patients. Thus bg mice and human CHS patients have homologous disorders associated with Lyst mutations.
MRC Laboratory of Molecular Biology, Cambridge, United Kingdom; Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Characterization of the expressed sequences predicted from the genome sequence assembled by the Caenorhabditis elegans Genome Sequencing Project is being accomplished using high-resolution fluorescent in situ hybridization (FISH). In C. elegans, most somatic cells can be identified in the light microscope throughout development, allowing mRNA distributions to be visualized in individual cells in whole animals. To date, post-embryonic expression patterns have been obtained for more than 90% of the approximately 200 genes tested from a sample of four contigs. Expression of the majority of the genes was seen in all cells, but only rarely were the genes expressed uniformly throughout the animal. Instead, for most of the genes, there were distinct, gene-specific patterns in the intensity of hybridization to different cells or tissues, as well as changes in expression level during development. A small proportion of genes were expressed only in specific cells or tissues. For almost all genes, transcript levels decreased with developmental age, dropping most dramatically after the molt from the fourth larval stage to adulthood. A comparison of one autosomal contig consisting of 70 genes with one from the X chromosome (65 genes) revealed little difference in gene expression patterns between the two contigs, with the exception that a significantly greater number of the X-linked genes were expressed solely in muscle. Expression patterns are available in the C. elegans database, ACeDB.
Department of Molecular Biology, Massachusetts General Hospital, Department of Genetics, Harvard Medical School, Boston, MA, USA
Fundamental cellular functions are controlled by complex networks of interacting proteins. We and others have been using interaction-mating two-hybrid approaches to chart these networks and determine the function of their members.
Many current approaches to the study of proteins in these networks involve overexpression or inhibition of individual protein members. We have been exploring a complementary approach, in which we use artificial proteins to disrupt the connections between these members.
Basing our design on complementarity-determining regions of immunoglobulins, we constructed a combinatorial library of constrained 20-residue peptides displayed by the active-site loop of E. coli thioredoxin. We used a two-hybrid system to isolate those that bind human cyclin-dependent kinase 2 (Cdk2). These peptide aptamers recognize different epitopes of Cdk2 with equilibrium dissociation constants in the nanomolar range. Those tested inhibit Cdk2 kinase activity by disrupting its interaction with a protein substrate. We are exploring their use in vivo to disrupt specific protein interactions.
In other experiments, we sought to construct another tool for probing protein networks that could inactivate network members and their associated proteins. To this end, we fused to the aptamers a HECT domain, which carries a ubiquitin-ligase activity. These fusion proteins still recognize Cdk2, albeit with much lower affinity. When co-expressed with Cdk2 in yeast, they trigger specific ubiquitylation of their target. We hope these molecules will lead to the inactivation or destruction of targeted proteins.
These results show that peptide aptamers constitute a novel kind of recognition molecule with some clear advantages over monoclonal antibodies. Peptide aptamers are designed to be expressed inside cells, where they specifically recognize a protein target. They are isolated together with their coding genes. Their small size should allow them to disrupt specific protein interactions while not affecting others. Furthermore, our experiments suggest that it is possible to fuse various "effector modules" to these "recognition modules" to construct more sophisticated molecular machines. We imagine that peptide aptamers will contribute to the development of an intracellular nanotechnology aimed at destroying, modifying, moving, and assembling protein targets inside cells.
Anthony J. Brookes
Medical Research Council, Human Genetics Unit, Edinburgh, Scotland
Current large-scale efforts both to sequence various model genomes and to isolate, sequence and map random cDNAs (as ESTs) from many tissues tend to give the impression that the task of gene identification in the human will be complete in a matter of months, or at most a year or so. While some might argue that this is essentially true, others question just how far towards a truly complete genome transcription map these efforts will take us before the human genome itself is fully sequenced. This questioning concerns the unknown number of genes that may be missed by the EST initiative because of properties such as very low levels or highly time- or tissue-restricted patterns of expression. The possibility that this number is large is strongly supported by the observation that under one third of the 'unbiased' genes identified within the completed sequence of the S. cerevisiae genome are represented by homologues in current dbEST and other databases. In light of this, there may still be a valid role for the direct 'genomic' cloning of human genes, since this approach requires no prior assumptions about patterns or levels of expression, or about genomic location.
We have now used this approach in a search for novel genes in two gene 'families' of interest to us, namely i) synuclein genes, and ii) genes of oxidative phosphorylation. Employing a combination of degenerate PCR and library screening approaches, both within and between species, we have found that several human genes of interest to us do exist but are not yet represented in any public database. Our experiences with these gene-finding exercises will be described and related to similar observations made during various focussed positional cloning exercises. The general conclusion we draw is that many genes are being missed by the EST initiative as it is currently being performed.
Max Planck Institute for Molecular Genetics, Berlin, Germany
Exon-trapping and cDNA selection techniques have been applied to chromosome 21 specific genomic clones representing about 30 megabases of random DNA. Integration of the putative new transcripts with the physical map shows a non-random gene distribution with a major clustering towards the telomere. Chromosome 21 is estimated to encode 500-800 genes, and more than 700 independent gene fragments have been identified so far. This approach provides a way to assess the feasibility of constructing chromosome-scale transcript maps by exon-trapping and cDNA selection. Although combining both techniques partly circumvents their inherent biases, assembling short gene fragments into "bona fide" discrete genes remains a challenge. The current status of the transcript map will be discussed, together with the approaches used for converting the scarce transcribed elements into genes. Reflections on gene organization will also be presented.
Patrick Onyango, Barbora Lubyova, Paola Gardellin, Robert Kurzbauer and Andreas Weith
Research Institute of Molecular Pathology, Vienna, Austria
The chromosome 1p36 region recurrently shows non-random allelic deletion in a number of cancer genomes. The cancer types affected include neuroblastoma (NB), primitive neuroectodermal tumour (PNET), malignant melanoma, Wilms tumour, ductal breast cancer, Merkel cell carcinoma, T-cell leukaemia, small cell lung carcinoma and hepatocellular carcinoma (HCC). The deleted regions are thought to contain genes essential in tumour suppression. Our focus has been to isolate the presumptive neuroblastoma, PNET and HCC suppressor genes from the 1p36 locus.
The isolation of tumour suppressor genes using positional cloning approaches can be problematic when such genes are masked within large deletions. To circumvent some of these problems, we first aimed to establish an integral map of the region. Generating the integral map involved creating a framework composed of a fluorescence in situ hybridization (FISH) map, a pulsed-field gel electrophoresis (PFGE) map, a cytogenetic map and a genetic map, consisting of YACs, P1s, PACs, and cosmids. Secondly, we determined the minimally deleted region in NB. This was achieved through FISH characterization of two small 1p36 interstitial deletions, one from the constitutional genotype of a NB patient and one from an NB cell line. Thus, we defined a NB consensus deletion, which is roughly 3 Mb in size. Incidentally, two balanced translocations, found in a NB and in a PNET cell line, mapped to the NB consensus deletion. We could further show that the previously proposed NB candidate suppressor genes, such as Id3, TNFRI, PAX7, DAN1 and CDC2L, all map outside the newly defined consensus deletion.
Employing exon trapping, cDNA selection, Zoo blots, and limited sequencing, we have isolated 7 new genes within the NB consensus deletion and also mapped 3 previously known genes to the same locus. These genes were treated as possible candidate tumour suppressor genes for NB and PNET, since they were isolated from clones mapping in the vicinity of these two translocations. Previous loss of heterozygosity studies have also shown that HCC deletions overlap with our NB consensus deletion, and hence this tumour was included in our subsequent analysis. Full-length cDNAs corresponding to these genes were isolated. Transcript characterization of the cDNAs will be presented. Briefly, sequence searches of the public domain databases using the BLAST software showed some interesting features for some of the genes. For example, one of the genes was homologous to a tyrosine phosphatase, and another to a cell cycle checkpoint protein. To assess the possible role of these genes in NB, PNET and HCC, we analysed the integrity of these genes in the respective tumours, using FISH, Southern and northern blots. A total of 18 primary NB tumours, 11 NB cell lines and 9 HCC primary tumours were analysed. None of the genes seemed to be rearranged in NB at either the genomic or the transcript level. However, these results do not exclude the presence of subtle mutations like single base rearrangements. Analysis of the HCC tumours, on the other hand, revealed deletions and insertions in the cell cycle checkpoint gene, suggesting that the gene might have been functionally affected in the HCC tumours. The genes which showed no rearrangements in the tumours we tested may still have roles in other cancer types which frequently show rearrangements of the chromosomal 1p36 locus.
Marie-Josephe Pebusque(1), Alexandra Imbert(1), Francoise Ugolini(1), Jose Adelaide(2), Max Chaffanet(1,2), Amel Dib(1), Cornel Popovici(1), Remi Houlgatte(3), Charles Auffray(3) and Daniel
(1)Laboratory of Molecular Oncology, U119 INSERM, Marseille, France; (2)Laboratory of Tumor Biology, Institut Paoli-Calmettes, Marseille, France; (3)Genexpress, Villejuif, France.
The region 8p11-p21 of the short arm of human chromosome 8 is involved in several pathologies, such as malignant tumors harbouring deletions, genomic amplifications or translocations, and the Werner syndrome. We have recently reported detailed physical maps of this region based on series of yeast artificial chromosomes (YACs) (Dib et al., 1995; Chaffanet et al., 1996; Imbert et al., 1996).
A high-resolution transcript map is now in progress. It is being constructed by a combination of several strategies, which include genomic sequencing of Island Rescue PCR products, exon trapping, direct cDNA selection and mapping of chromosome 8 expressed sequence tags (ESTs). A combination of exon amplification and cDNA selection methods is applied to cosmid libraries constructed from representative YACs. The putative transcribed sequences obtained are cloned in plasmid vectors. Unique clones are characterized by mapping to the corresponding genomic region, sequencing and DNA sequence analysis. For selected clones, expression studies and cDNA library screening are performed. In addition, expressed sequences isolated from systematic screening of human cDNA libraries (Auffray et al., 1995), and previously assigned to chromosome 8, are precisely mapped to the region of interest. The results from these studies should provide candidate genes for 8p11-p21 associated diseases.
Auffray et al., C.R. Acad. Sci. (Paris), 318:263-272, 1995.
Dib et al., Oncogene, 10:995-1001, 1995.
Chaffanet et al., Cytogenet. Cell Genet., 72:63-68, 1996.
Imbert et al., Genomics, 32:29-38, 1996.
Jeffrey W. Touchman(1), Gerard G. Bouffard(1), Luping Wang(2), Lauren A. Weintraub(1), Jacquelyn R. Idol(1), Jesse C. Nussbaum(1), Christiane M. Robbins(1), Michael Lovett(2), and Eric D. Green(1)
(1)National Center for Human Genome Research, National Institutes of Health, USA; (2)The University of Texas Southwestern Medical Center, USA
The establishment and mapping of gene-specific DNA sequences are highly complementary to the ongoing efforts to map and sequence all human chromosomes. To facilitate our studies of human chromosome 7, we have generated and analyzed 2,006 expressed-sequence tags (ESTs) derived from a collection of direct selected cDNA libraries highly enriched for human chromosome 7 genes. Similarity searches indicate that approximately two-thirds of the ESTs are not represented by sequences in the public databases, including those in dbEST, and thus represent new gene sequences. In addition, a large fraction do not have redundant or overlapping sequences within our collection. PCR assays (i.e., STSs) have been developed for 190 of these ESTs. Remarkably, 181 out of 190 (96%) of these STSs mapped to chromosome 7, demonstrating the robustness of the chromosome enrichment in the construction of the direct cDNA selection libraries. Thus far, 140 of these EST-specific STSs have been unequivocally assigned to YAC contigs from different parts of the chromosome. Together, these studies provide over 2,000 ESTs highly enriched for chromosome 7 gene sequences, 181 new chromosome 7 STSs corresponding to ESTs, and a definitive demonstration of the ability to enrich for chromosome-specific gene sequences by direct cDNA selection. Furthermore, the libraries, sequence data, and mapping information generated to date should greatly enhance construction of the chromosome 7 transcript map.
Yale University, New Haven, CT, USA
Now that the genome of S. cerevisiae has been sequenced, the focus of research has moved towards investigating the function of the encoded proteins. In a pilot project, we used transposon mutagenesis to create a bank of yeast strains, each with a lacZ insertion at a random genomic location. The bank is the basis of a 3-in-1 approach for determining when genes are expressed during the life cycle, the subcellular locations of their encoded proteins, and the phenotypic effect of disrupting the gene (Genes Dev. 8:1087-1105, 1994). We identified 3,600 strains with fusions expressed during vegetative growth, mating or meiosis. The fusion protein localizes to a discrete site in 12% of these strains. Phenotypes were analysed for 186 insertions, and the disrupted gene identified in 263 strains. In an extension of this work, we have used preparations of spread nuclei to examine strains with lacZ-fusion proteins that localize to the nucleus. The fusion protein can be immunolocalized to discrete sites on chromosomes in 20 of 31 strains examined. Of 9 strains carrying fusions in known genes, 5 are in known transcription factors and DNA binding proteins.
We have now initiated a full-scale effort to analyze all ORFs in the yeast genome by this approach. This project utilizes a new library and novel adaptations of large-scale techniques pioneered in genome projects. To improve our ability to immunolocalize the products of the mutagenized genes, the transposon used (mTn-3xHA/lacZ) has been extensively modified. mTn-3xHA/lacZ contains the lacZ gene, allowing us to identify in-frame fusions as before. However, a lox site is present near each end of the transposon. Upon expression of the Cre recombinase, all sequence elements between the lox sites are excised, leaving a single lox site and flanking transposon sequences. We have engineered these flanking sequences to contain three tandem copies of the 'HA' epitope from influenza virus hemagglutinin. Thus the excision event generates an in-frame insertion of 93 amino acids (a 'HAT' tag) in the product of the mutagenized gene.
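The excision step described above can be pictured with a toy model. The sketch below treats the transposon as an ordered list of labeled elements and removes everything between the two lox sites, leaving a single lox site and the flanking sequences; it then checks that a 93-amino-acid insertion corresponds to a whole number of codons. The element labels and function names are hypothetical and carry no sequence information; only the lox arrangement, the presence of lacZ between the lox sites and the 93-residue figure come from the text.

```python
def cre_excise(elements, lox="lox"):
    """Remove everything between the first and last lox site.

    Returns a new list that keeps a single lox site plus whatever
    lies outside the two sites. Raises ValueError if no lox site
    is present.
    """
    first = elements.index(lox)
    last = len(elements) - 1 - elements[::-1].index(lox)
    if first == last:
        # Only one lox site: nothing to excise.
        return list(elements)
    return elements[:first] + [lox] + elements[last + 1:]


def is_in_frame(insert_length_nt):
    """An insertion keeps the reading frame if its length is a multiple of 3."""
    return insert_length_nt % 3 == 0


# Toy layout of the transposon; the labels are placeholders, not real features.
transposon = ["left_flank", "lox", "lacZ", "other_elements", "lox", "right_flank"]

residual = cre_excise(transposon)
print(residual)

HAT_TAG_AA = 93                 # size of the residual tag given in the text
hat_tag_nt = HAT_TAG_AA * 3     # nucleotides needed to encode 93 residues
print(hat_tag_nt, is_in_frame(hat_tag_nt))
```

The frame check is simple arithmetic: an insertion encoding a whole number of amino acids adds a multiple of three nucleotides, so the downstream part of the mutagenized gene stays in its original reading frame.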
mTn-3xHA/lacZ has performed very well in test-mutagenesis of two yeast genes. The HAT-tagged proteins were functional and localized correctly with the majority of insertions. We are further modifying the transposon to incorporate selection markers allowing its use in other eukaryote systems, and have also created a version allowing generation of in-frame fusions to the green fluorescent protein.
In summary, our approach will provide insight into the function of both novel and previously characterized yeast gene products. The results of our new project will be deposited in a publicly accessible database (http://ycmi.med.yale.edu/YGAC/home.html), allowing researchers who identify a yeast gene (or a homolog thereof) to know when that gene is expressed, whether its gene product localizes to a specific subcellular location, and the phenotype of the insertion mutant.