6th International Workshop
on the Identification of Transcribed Sequences

October 3-5, 1996 Edinburgh, Scotland

SESSION 5



  1. Novel characteristics of the 3q21 leukemia breakpoint region

    A. Rynditch(1,2), Y. Pekarsky(2), and K. Gardiner(2)
    (1)National Academy of Sciences of Ukraine, Kiev, Ukraine (2)Eleanor Roosevelt Institute, Denver, CO, USA

    Human chromosome band 3q21 is frequently involved in rearrangements associated with leukemia. Rearrangements include deletion of 3q21, inversion, insertion and translocation of 3q21:3q26, and translocation of 3q21 to 1p35, 9q22 and 11q23. Cloning and characterization of the major 3q21 breakpoint region has revealed several novel features:

    i) high novel gene density: Exon trapping, cDNA selection and genomic sequencing followed by Grail analysis have so far identified fragments of 10 genes within an approximately 100 kb segment. This is the highest demonstrable gene density associated with any leukemia breakpoint region. With the exception of the Ribophorin I gene, all cDNAs are novel.

    ii) low transcriptional activity: With one exception, the novel genes are transcribed at relatively low levels. Transcription of 5 genes is detectable only by RT-PCR; only 1 of 5 tested is present in a cDNA library.

    iii) dispersion of breakpoints: Positions of 10 leukemia breakpoints, including t(3;3), inv(3) and t(1;3), have been correlated with the transcriptional map. Breakpoints are distributed throughout a 35 kb segment spanning 3 genes, suggesting that, in contrast to other AML cases, a single fusion protein is unlikely to be involved.

    iv) altered transcriptional activity: Preliminary data from a leukemia-derived cell line carrying a t(3;3) indicate increased expression of several genes mapping in the vicinity of the breakpoint. It is postulated that the rearrangement produces a generalized deregulation or destabilization of transcription.

  2. Drosophila-related expressed sequences (DRES): A source of candidate genes for human diseases

    G. Borsani(1), S. Banfi(1), E. Rossi(2,3), A. Bulfone(1), L. Bernard(1), F. Rubboli(1), A. Marchitiello(1), A. Guffanti(1), G. Simon(1), S. Giglio(3), E. Coluccia(3), M. Zollo(1), O. Zuffardi(2,3), A. Ballabio(1,4)
    (1)Telethon Institute of Genetics and Medicine (TIGEM) (2)Servizio di Citogenetica, San Raffaele Biomedical Science Park, Milan, Italy; (3)Cattedra di Biologia Generale, University of Pavia, Pavia, Italy; (4)Department of Molecular Biology, University of Siena, Siena, Italy.

    Cross-species comparison is an effective tool used to identify genes and study their function in both normal and pathological conditions. We have applied the power of Drosophila genetics to the vast resource of human cDNAs represented in the EST database (dbEST) to identify novel human genes of high biological interest. Sixty-six human cDNAs showing significant homology to Drosophila mutant genes were identified by screening dbEST using the text string option, and their map position was determined using both fluorescence in situ hybridization (FISH) and radiation hybrid mapping. These cDNAs, which were termed DRES (Drosophila-related expressed sequences), represent positional candidate genes for human diseases mapping to the corresponding genomic region. For example, DRES9 is homologous to the Drosophila retinal degeneration B gene and was mapped to 11q13.5, where at least three types of human retinopathies were assigned. We are now focusing on identification and mapping of additional DRES, isolation and sequencing of full-length transcripts, identification of yeast artificial chromosomes (YACs), isolation of murine homologs of DRES, genetic mapping in the mouse, and expression studies in mouse embryo by RNA in situ hybridization experiments. Comparison between DRES genes and their putative partners in Drosophila and mouse may provide important insights into their function in mammals and their possible role in human disease.

  3. En masse terminal exon trapping of the human Y chromosome

    Yun-Fai Chris Lau
    University of California-San Francisco, Division of Cell and Developmental Genetics, San Francisco, CA, USA

    The identification of ESTs from the human Y chromosome is a difficult task because most genes on this chromosome are postulated to be either specific for male physiology and development or members of polygenic traits whose expression is unknown. So far, random sequencing efforts have not been fruitful in generating ESTs from this chromosome. As a first step to collect ESTs from the human Y chromosome for the construction of a transcription map, we have applied the 3' terminal exon trapping technique to identify ESTs en masse from this chromosome. Two approaches have been adopted for this study. First, total DNA derived from an entire Y chromosome cosmid library (with >13,000 cosmids) was digested with a trapping restriction enzyme and ligated to a trapping cassette derived from the vector pTAG4. The ligated DNA was then transiently transfected into COS7 cells and the resulting transcripts were amplified with 3' RACE and primers specific for the 3' terminal exon trapping vector. Second, a novel technique has been developed to transfer en masse the Y chromosome DNA from the initial cosmid library (in Lawrist 16) to another cosmid trapping vector, pTAG5. This procedure inserts the Y chromosome DNA immediately downstream of the 3' terminal exon trapping cassette, thereby allowing the transcription of composite transcripts suitable for the exon trapping procedure. Analysis of the resulting exons by PCR using specific primers for 9 genes on this chromosome indicated that all were present in the final exon products, which were subsequently subcloned into the pAMP1 plasmid and arrayed into 51 96-well microtiter dishes. Hybridization analysis indicates that 8.7% and 2.8% of these exon clones were derived from two repeated genes, TSPY and RBM, on the short and long arms respectively of this chromosome. Random sequencing of 170 individual clones confirmed the frequencies of both genes. Further, over 54% of the clones were either homologous to sequences of genes located on this chromosome or unique sequences having characteristics of valid 3' terminal exons. Sequence analysis of 14 TSPY exon clones indicated that 57% of them harbor 2-4 internal exons in addition to the terminal exon. The results indicate that: 1) en masse 3' terminal exon trapping for an entire chromosome is highly feasible, 2) >50% of the resulting exon clones are either derived from known Y genes or potential functional sequences from this chromosome, and 3) a similar strategy should be applicable for en masse isolation of ESTs from other human chromosomes.
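
    As a rough check on the clone numbers quoted above, the following sketch converts the reported percentages into approximate clone counts. It assumes that every well of the 51 plates holds one clone and that the hybridization frequencies apply uniformly to the 170 sequenced clones; neither assumption is stated in the abstract, and the names used are illustrative only.

```python
# Figures taken from the abstract.
PLATES = 51
WELLS_PER_PLATE = 96
TSPY_FRACTION = 0.087   # fraction of exon clones derived from TSPY
RBM_FRACTION = 0.028    # fraction of exon clones derived from RBM
SEQUENCED = 170         # randomly sequenced clones

def expected_count(fraction, n):
    """Expected number of clones of one class among n clones."""
    return fraction * n

# Assumption: one clone per well, no empty wells.
total_wells = PLATES * WELLS_PER_PLATE

tspy_in_sample = expected_count(TSPY_FRACTION, SEQUENCED)
rbm_in_sample = expected_count(RBM_FRACTION, SEQUENCED)

# 57% of the 14 sequenced TSPY clones carried internal exons.
tspy_with_internal_exons = expected_count(0.57, 14)

print(f"arrayed wells: {total_wells}")
print(f"expected TSPY clones among {SEQUENCED}: {tspy_in_sample:.1f}")
print(f"expected RBM clones among {SEQUENCED}: {rbm_in_sample:.1f}")
print(f"TSPY clones with internal exons: {tspy_with_internal_exons:.1f}")
```

    Under these assumptions, the expected number of TSPY clones in a random sample of 170 is close to the 14 TSPY exon clones that were analyzed, although the abstract does not say that the 14 clones came from that sample.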

  4. Transcript mapping in the nonrecombinant interval for the mouse neuromuscular disease gene mnd2 and the corresponding region of human chromosome 2p13

    Miriam Meisler, Wonhee Jang and John S. Weber
    Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA

    The mnd2 mutation in the mouse causes a lethal muscle wasting disease. Affected mice develop an unsteady gait with uncontrolled muscle contraction and die by 40 days of age. The mnd2 gene is located within a conserved linkage group on mouse chromosome 6 that corresponds to human chromosome band 2p13. In order to isolate this disease gene, we generated a high resolution genetic cross that localized the gene to an interval of 0.2 +/- 0.1 cM, which in the mouse genome is equivalent to approximately 400 kb. Within and around the nonrecombinant interval, we have mapped nine positional candidate genes from human chromosome 2p13 and four novel genes isolated by exon trapping of P1 clones. Complete sequencing of the coding regions of six genes within the nonrecombinant region, from homozygous mutant DNA and the strain of origin, has not detected any mutations. The results of Northern blotting indicate that all of these transcripts are present at normal levels in mutant tissues.
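
    The conversion from genetic to physical distance used above can be written out explicitly. The sketch below assumes the ratio implied by the abstract's own figures (0.2 cM corresponding to about 400 kb, i.e. roughly 2,000 kb per cM as a genome-wide average for the mouse); local recombination rates vary, so the result is only an estimate, and the function name is illustrative.

```python
# Assumed average ratio of physical to genetic distance in the mouse,
# as implied by the abstract (0.2 cM ~ 400 kb). Local rates can differ.
KB_PER_CM = 2000.0

def interval_kb(cm, kb_per_cm=KB_PER_CM):
    """Convert a genetic distance in cM to an approximate physical size in kb."""
    return cm * kb_per_cm

estimate = interval_kb(0.2)        # central estimate
lower = interval_kb(0.2 - 0.1)     # lower bound of the 0.2 +/- 0.1 cM interval
upper = interval_kb(0.2 + 0.1)     # upper bound

print(f"approx. {estimate:.0f} kb (range {lower:.0f} to {upper:.0f} kb)")
```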

    The novel genes identified in the region include one new member of the WD-repeat family and one evolutionarily conserved gene with 35% sequence identity to a gene in C. elegans. To progress from individual exons to full length cDNAs we have relied heavily on 5' and 3' RACE and vector-insert PCR from cDNA libraries. For some genes, identification of related human ESTs has been an important tool. Our results indicate that this is a gene rich region with conserved gene order in human and mouse that would be a good substrate for comparative sequencing. A comparative gene map of the human and mouse chromosome regions will be presented.

  5. New genes within the MHC: Cloning by paralogy: The case of the B30-gene family

    Ribouchon M.T., Offer C., Matte M.G., Tazi Hanini R., Henry J., and Pontarotti P.
    INSERM U119, Marseille, France

    We describe a simple method to identify new genes within a given genomic region. The concept is based on the fact that members of a multigene family are derived from a common ancestor and that some duplicated genes remain syntenic (linked) even after several hundred million years. Our pilot experiment was performed on the RFB30 multigene family. During a search for new coding sequences within the chromosomal region that contains the human major histocompatibility complex (MHC), we found an exon encoding a protein domain (170 amino acids) named B30.2. This domain was found associated with 3 different N-terminal domains in a few other proteins: 1) a RING finger domain, as in RFP; 2) IgV-IgC-like domains, as in butyrophilin (BT); 3) a leucine zipper domain. Several of these proteins colocalize in the same chromosomal region (Vernet et al., 1993; Amadou et al., 1995). The accumulation of new information in dbEST suggested that it could be screened to obtain cDNAs of other members of this large family. We screened dbEST using the BLASTx program with the amino acid sequences of the RFP and BT proteins. Thirty-one ESTs showed similarity to the RFP and BT proteins. Some clones showed overlapping sequences, meaning that they come from the same transcription unit. The clones for which we did not find similarity or overlapping sequence with other selected ESTs were used to probe Southern blots of genomic DNA digested with different restriction enzymes. An identical hybridization pattern obtained with two different ESTs means that they come from the same genomic region and thus probably correspond to the same gene. After sequencing of the different ESTs, transcript consensus sequences (TCs) were constructed, possibly corresponding to full-length transcripts. Eight TCs were selected by this methodology. Chromosomal localization was performed by in situ hybridization on chromosomes. Three of the TCs were localized to chromosome 6 in the 6p21.3-6p22 band, and are thus likely close to or within the MHC. They were mapped more precisely using yeast artificial chromosomes (YACs) covering the MHC and its telomeric part. Cloning by paralogy could therefore be an efficient approach to clone new genes within a given chromosomal region.

    Vernet C. et al., J Mol Evol 37:600-612, 1993
    Amadou C. et al., Genomics 26:9-20, 1995

  6. The isolation of CpG islands from human chromosomal regions 11q13 and Xp22 on the basis of thermal stability of DNA fragments covering these regions

    Masahiko Shiraishi
    Oncogene Division, National Cancer Center Research Institute, Tokyo, Japan

    We have developed a method for the preferential isolation of DNA fragments associated with CpG islands on the basis of the thermal stability of the fragments (segregation of partly melted molecules, SPM). SPM has been applied to isolate CpG islands from the human chromosomal regions 11q13 and Xp22. The former is a CpG-rich region (R band), whereas the latter is less CpG-rich (G band). Nine P1 clones covering a 500-kilobase (kb) region of 11q13 and ten cosmid clones covering a 300-kb region of Xp22 were digested with four restriction endonucleases, MseI, BfaI, NlaIII, and Tsp509I. These restriction enzyme digests were then subjected to denaturing gradient gel electrophoresis. About sixty fragments from the 11q13 P1 clones and five fragments from the Xp22 cosmid clones were retained in the gel and recovered. Nucleotide sequence analysis revealed that many of the recovered fragments from the 11q13-derived clones contained CpG islands, including that of the ADRBK1 gene, which has previously been assigned to this region. However, none of the five fragments from the Xp22 clones had the properties of CpG islands. These results suggest that SPM facilitates the efficient isolation of CpG islands, with a yield that reflects the CpG density of the corresponding region.
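
    The abstract does not state which sequence criteria were used to decide whether a recovered fragment has the properties of a CpG island. The sketch below is therefore only an illustration, using commonly applied thresholds (length of at least 200 bp, G+C content above 50%, and an observed/expected CpG ratio above 0.6). The function names and default thresholds are assumptions and are not taken from the SPM work.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Apply the assumed length, G+C and CpG obs/exp thresholds."""
    return (
        len(seq) >= min_len
        and gc_content(seq) > min_gc
        and cpg_obs_exp(seq) > min_oe
    )

# Toy sequences for illustration only (not real genomic fragments).
cpg_rich = "CG" * 150                 # G+C rich with many CpG dinucleotides
gc_rich_no_cpg = "C" * 150 + "G" * 150  # G+C rich but only one CpG
at_rich = "ATTA" * 75                 # no G or C at all

for name, s in [("cpg_rich", cpg_rich),
                ("gc_rich_no_cpg", gc_rich_no_cpg),
                ("at_rich", at_rich)]:
    print(name, looks_like_cpg_island(s))
```

    The second toy sequence shows why G+C content alone is not enough: it is entirely G and C but is depleted in CpG dinucleotides, so it fails the observed/expected test.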

  7. A gene rich region around the ATM locus - Physical map and identification of two new transcripts in the 11q23.1 region

    N. Udar(1), X. Chen(1), S. Xu(1), J. O. Bay(1), S. Dandekar(1), T. Liang(1), N. Uhrhammer(1), H. Shizuya(2), H. Yang(1), P. Concannon(3), L. Yang(1), Z. Wang(1), M. Telatar(1), R. A. Gatti(1)
    (1)Department of Pathology, UCLA School of Medicine, Los Angeles, CA, USA (2)Division of Biology, California Institute of Technology, Pasadena, CA, USA (3)Virginia Mason Research Center, Seattle, WA, USA

    The gene for the autosomal recessive neurological disease ataxia telangiectasia (A-T), ATM, has recently been identified. An international consortium of eight laboratories localized the A-T gene by linkage analysis of 176 families to chromosome 11q23.1, between the markers D11S384 and D11S535. We constructed a series of contigs using three BACs and twelve cosmids, spanning this region of approximately 400 kb. We also developed twenty-one STS markers from the BACs and cosmids. With this information we have been able to build a precise map of the contig with respect to the ATM exons and the STSs. This information will be useful for further studies of functional domains and regulatory functions within the ATM gene.

    Two new transcripts, CAND3 and NS6, were isolated by cDNA selection techniques. Both transcripts are immediately proximal to the ATM gene. CAND3 spans ~140 kb of genomic DNA, and is located immediately centromeric to ATM, with 544 bp separating the two genes. The two ubiquitously expressed transcripts, ATM and CAND3, are transcribed in opposite directions, sharing a common promoter region. The predicted protein has weak homologies to transcription factors, nucleoporin protein, and protein kinases. No homology to ATM, nor any mutation of CAND3 in A-T patients, has been found. The 5' to 5' orientation of CAND3 and ATM suggests co-regulation of biologically related functions. The other transcript, NS6, is a novel sequence with near perfect homology to the catalytic domain of the sodium glucose cotransporter gene family. This gene localizes proximal to CAND3 and the MAT (ACAT) gene. This makes a total of 4 genes within a 1 Mb region.

  8. Analysis of differentially expressed genes using solid-phase approaches in transcript selection and in characterization of cognate proteins expressed in bacterial systems

    Jacob Odeberg, Magnus Larsson, Mathias Uhlén, Stefan Ståhl and Joakim Lundeberg
    Department of Biochemistry, KTH, Royal Institute of Technology, S-100 44 Stockholm, Sweden, Phone: (int)-46-8 790 87 58; Fax: (int)-46-8 24 54 52.

    Solid-phase methods based on both differential display (DD) and representational difference analysis (RDA) have been designed to enable differential gene expression analysis in samples with scarce amounts of mRNA originating from stem cells, sperm cells and colon tissue. The principle has been to use the streptavidin-biotin system to capture nucleic acids onto magnetic microbeads and to combine this with in vitro amplification techniques. Immobilization of mRNA and/or DNA fragments on a solid phase has significantly simplified the purification procedure and minimized sample loss, and may also facilitate future automation. To investigate the biological function of the isolated fragments, a bacterial expression approach has been designed for the production of cognate gene fragments. Specific bacterial expression vectors have been constructed to create fusion proteins between isolated fragments and affinity protein tags of protein A and protein G origin. Fusion proteins, generated after E. coli propagation and affinity chromatography purification, have been used to elicit specific antibodies, enabling various functional analyses in the monitoring of expressed genes.

  9. Continuing studies on the subcellular localization of mRNAs in neurons

    Jennifer Phillips, Kevin Miyashiro, Peter Crino and Jim Eberwine
    University of Pennsylvania Medical School, Philadelphia, PA, USA

    In both the developing and mature neuron, plasticity of the cell is in part determined by the activities of neuronal processes. Neuronal processes (axons and dendrites) establish the polarity of the neuron and subserve specialized functions, such that dendrites receive information from presynaptic cells and axons provide information to postsynaptic cells. The factors which account for the specialized functions of neuronal processes have been an area of intense research, which has led to the characterization of certain proteins that are selectively enriched in these structures. Additionally, in situ hybridization has localized several mRNAs to dendrites and a few mRNAs to axons of specific neuronal subtypes. Recently, we have used more sensitive techniques (single cell nucleic acid amplification, expression profiling, and sequencing of cDNAs derived from differential display) to analyze the mRNAs present in processes and have greatly expanded the known repertoire of mRNAs. We have also been developing a more sensitive technique to detect proteins in individual processes (immuno-aRNA). This approach will provide information on the mRNA/protein profile of individual processes and their developmental regulation, and potentially insights into the mechanisms of plasticity.

  10. Successful modeling of the functional organization of transcriptional units

    Thomas Werner
    Forschungszentrum fuer Umwelt und Gesundheit GMBH, Institut fuer Saeugetiergenetik, Oberschleissheim, Germany

    Regulatory regions of higher eukaryotes usually encompass multiple regulatory elements which exert their regulatory function only within the correct context. Similarity of regulatory regions is often not evident at the sequence level, precluding global alignment strategies for their detection. We used a strictly modular concept to deduce the functional organization of regulatory regions from a set of such regions. This concept, implemented in the program Model Generator, allows development of highly specific models from a training set of 10 to 20 sequences. Only a simple initial model (e.g. two characteristic transcription factor binding sites) is required. Construction of the model can reveal new common elements. Therefore, model development also allows delineation of new potential functional elements for targeted experimental verification in addition to the generated model. Another program, designated ModelInspector, has been developed to scan new sequences for matches to these models.
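
    The idea of a simple initial model made of two binding sites in a defined arrangement can be illustrated with a short sketch. This is not the Model Generator or ModelInspector algorithm, whose internals are not described in the abstract. It is a hypothetical minimal scanner that assumes each site is given as a regular expression and that the model only constrains the order of the two sites and the distance between them; the patterns and the toy sequence are illustrative.

```python
import re

def find_sites(pattern, seq):
    """Return (start, end) for every match of a regex, including overlapping ones."""
    # The lookahead allows overlapping matches; group 1 holds the actual site.
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({pattern}))", seq)]

def scan_model(seq, site_a, site_b, min_gap, max_gap):
    """Find site_a followed by site_b with min_gap..max_gap bases in between.

    Returns a list of (start_of_a, start_of_b) positions (0-based).
    """
    seq = seq.upper()
    a_hits = find_sites(site_a, seq)
    b_hits = find_sites(site_b, seq)
    matches = []
    for a_start, a_end in a_hits:
        for b_start, _ in b_hits:
            gap = b_start - a_end   # bases between end of A and start of B
            if min_gap <= gap <= max_gap:
                matches.append((a_start, b_start))
    return matches

# Toy example: a CCAAT-like site followed 20 bases later by a TATA-like site.
example_seq = "GGCCAAT" + "ACGT" * 5 + "TATAAA" + "GG"
hits = scan_model(example_seq, "CCAAT", "TATA[AT]A", min_gap=10, max_gap=40)
print(hits)  # [(2, 27)]
```

    Requiring both sites in the right order and spacing is what makes such a model more specific than a search for either site alone, since short individual motifs occur frequently by chance.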

    We present two models for retroviral transcriptional control regions which contain functional pol II promoters. Both models (C-type LTR and Lentivirus LTR, e.g. HIV) were used with the program ModelInspector in order to assess the specificity of the models. All LTRs from both training sets and 18 LTRs of other types (B-type, D-type, Spuma, HTLV I/II) were matched against the models. Both models fully recognized their respective training set and clearly discriminated against all other LTR types. The program ModelInspector is also able to carry out database searches with these models. Tests with the C-type LTR model have shown that ModelInspector did not miss a single known LTR (= no false negatives). The program revealed 5 new C-type LTR candidates in addition to 17 known full length LTRs in the primate section of GenBank Release 92.0 (about 95 million nucleotides on both strands). The first of these candidates has been experimentally verified as a weak (pol II) promoter. This agrees with the low scoring of this LTR which also suggested weak activity and demonstrates the predictive power of organization-based models of transcriptional units in mammalian genomes.