Beyond the Identification of Transcribed Sequences: Functional and Expression Analysis

9th Annual Workshop, October 28-31, 1999

Co-sponsored by the U.S. Department of Energy


Experimental approach towards identification of small non-messenger RNAs in the genome of Caenorhabditis elegans

Anja op de Bekke1, Alexander Huttenhofer1, Martin Kiefmann1, John O'Brien2,  Hans Lehrach2 and Jurgen Brosius1

1Institute of Experimental Pathology/Molecular Neurobiology, ZMBE, University of Munster, Munster, Germany
2Resource Centre, Max-Planck-Institute for Molecular Genetics, Berlin, Germany

Genome projects allow identification of the complete set of genes in a given organism. This is a prerequisite for completely understanding its biology, including gene expression, the function of gene products and evolutionary relationships. Current approaches focus primarily on protein-coding genes. Genes whose transcripts contain only short (< 300 nt) open reading frames (ORFs), or that encode non-messenger RNAs, are currently difficult to identify with biocomputational methods alone. Non-messenger RNAs play a wide range of roles in the cell. It is expected that eukaryotic cells contain a significant number of unknown non-messenger RNAs with interesting functions. In the recently completed sequence of the C. elegans genome, only genes encoding ribosomal RNAs, tRNAs, five small nuclear RNAs, two snoRNAs, 7SL RNA, SL1 and SL2 spliced leader RNAs, Y RNA and lin-4 RNA were annotated. Many RNA species that are expected to be present in the C. elegans genome were not detected. Therefore, we took an experimental approach akin to the EST projects in order to identify the majority of expressed RNA sequences (ERNs) transcribed from the genome of C. elegans.


Stable expression of epitope-tagged proteins in mammalian cells

Kenneth E. Heuermann and Bill L. Brizzard

SIGMA Chemical Company, St. Louis, Missouri, USA

Analysis of gene function often requires stable expression of the recombinant gene in a mammalian cell line. This can be facilitated by incorporating an epitope tag, such as the FLAG® peptide (AspTyrLysAspAspAspAspLys), commonly used for the isolation, purification, and detection of recombinant proteins expressed in E. coli. pFLAG-CMV-3 and pFLAG-CMV-4 vectors stably express secreted or intracellular N-terminal FLAG-fusion proteins, respectively, in mammalian cell lines. Initially, COS7 cells transiently transfected with pFLAG-CMV-3-BAP or pFLAG-CMV-4-BAP were shown to express bacterial alkaline phosphatase (E. coli phoA) by immunostaining using the M2 anti-FLAG monoclonal antibody. Western analyses of cell extracts and media confirmed these results. COS7 and CHO-K1 cells were then transfected with pFLAG-CMV-3, pFLAG-CMV-3-BAP, pFLAG-CMV-4, or pFLAG-CMV-4-BAP to obtain stably transformed cell lines. Transient expression of BAP by cells transfected with pFLAG-CMV-3-BAP or pFLAG-CMV-4-BAP, but not pFLAG-CMV-3 or pFLAG-CMV-4, was demonstrated by immunostaining in parallel test plates. Transfected cells were then selected by treatment with 500 µg/ml G418 sulfate. At 20 days post-transfection, after three changes of medium, surviving COS7 and CHO cells transfected with pFLAG-CMV-3-BAP or pFLAG-CMV-4-BAP continued to express BAP, as shown by detection of FLAG-tagged BAP by Western analysis. This result indicates stable integration of the neomycin resistance gene and the FLAG-tagged BAP construct.


Structure and function of the spermatogenesis genes located in AZFa, a region of the human Y chromosome deleted in men with complete germ cell aplasia

Kamp, Christine1, Kirsch, S1, Hirschmann, P1, Ditton, HJ1, Brede, G2, Tyler-Smith, C2, Rappold, GA1, and Vogt, PH1

1Institute of Human Genetics, Heidelberg, Germany
2Department of Biochemistry, Oxford, United Kingdom

In mammalian species, X and Y, the so-called sex chromosomes, evolved from an ancestral pair of ordinary autosomes. One of these autosomes was selected to become the Y chromosome, most likely because of a male-specific selection of SA (Sexually Antagonistic) alleles, i.e., alleles favoured in one sex but disfavoured in the other. This resulted in a continuous reduction of crossing-over events and an accumulation of Y-specific DNA loci functional for male sex determination and male germ cell development. We focussed our research on a Y region in proximal Yq11 which was mapped by STS content analyses to be essential for male germ cell proliferation (AZFa region; 1). Men with deletions of AZFa suffer from a complete aplasia of germ cells in their testis tubules.

The molecular extent of the AZFa region is not known. We therefore first established a physical restriction map along the AZFa region with the aid of a complete YAC contig and estimated a molecular AZFa extent of at least 1 megabase (Keil, R et al. in prep.). To analyse the gene content of AZFa we performed systematically organised exon trapping experiments with a series of PAC clones mapped in a contig by Alu-vector PCR, cross hybridizations of Y-specific end fragments and overlapping STS contents. Eleven PAC clones were sufficient to cover the complete AZFa region. Multiple exons were isolated in each exon trapping experiment. Sequence analyses and homology searches in the EST and genomic databases identified some of them as exons of the DFFRY gene and DBY gene, isolated recently as complete cDNA clones by Lahn and Page (2). Others matched ESTs expressed in multiple tissues, and some matched no database entry. This suggests that the AZFa region contains multiple Y genes that are expressed not only in testis tissue. This view was supported by subsequent analysis of each novel exon clone on RNA dot blots and by their identification in GeneFinder cDNA pools (Resource Center of the German Human Genome Project).


1. Vogt, PH et al. (1996) Human Y chromosome azoospermia factors mapped to different subregions in Yq11, Hum. Mol. Gen. 5: 933-943.
2. Lahn, BT and Page, DC  (1997) Functional Coherence of the Human Y Chromosome, Science 278: 675-680.


Analysis of gene expression data generated by oligonucleotide fingerprinting

Christof Bull, John O'Brien, Uwe Radelof, Ralf Herwig, Steffen Hennig, Axel Nagel and Hans Lehrach

Abteilung Lehrach, Max-Planck-Institut fuer Molekulare Genetik, Berlin, Germany

Oligonucleotide fingerprinting (OFP) is a powerful method for genome-wide expression analysis and gene finding. It is based on the analysis of arrayed cDNA libraries by sequential hybridisation of 200 oligonucleotides 10 bp in length. Clones are grouped into clusters according to their hybridisation fingerprints. The number and the size of the clusters provide information about the spectrum of expressed genes and their relative expression levels, respectively, whereas the fingerprint itself is used for database matching of the cDNA clones. We can therefore identify the gene corresponding to a cDNA clone and obtain information about expression rates at the same time. We have performed OFP on cDNA libraries from human monocytes and dendritic cells with 100,000 clones each. The clones were grouped into 11,897 clusters plus 25,582 singletons (clusters with just one member). This would correspond to 37,479 different genes expressed in either of the cell types. However, for technical reasons we observed a 1.57-fold overestimation of expressed genes in previous experiments, so we estimate the real number to be around 24,000 expressed genes. Of the genes that are differentially expressed between monocytes and dendritic cells, we selected 260 genes that are of particular interest to us for further studies. We will re-array these and other selected clones from the libraries into a non-redundant set. This clone set will be further evaluated in expression studies using cDNA arrays and complex probes derived from hematopoietic cell types, including monocytes and dendritic cells from various differentiation stages. There were also approximately 1,000 potentially new genes, which are currently being tag-sequenced at the MPI-MG. The massive fingerprinting and sequencing data that we have obtained are analysed by highly automated computer tools.
The sequence data are compared to the following databases: dbEST, GenEMBL, human UniGene, SWISSPROT and our cDNA sequence databases from sea urchin, amphioxus and zebrafish. Following the database searches (BLAST), a series of further analysis steps is performed, including filtering of BLAST output files, clustering of related matches and tabulating the results in web pages, which allow easy access to the analysis details. We will integrate all our data and expect that the comparison of gene expression patterns of homologous genes in model organisms will be especially useful for determining the function of new human genes.
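
The gene-count estimate in this abstract follows from simple arithmetic, which can be sketched as below. The figures are those quoted in the abstract; the function and variable names are illustrative assumptions and are not part of the authors' analysis tools.

```python
# Figures taken from the abstract; all names here are illustrative assumptions.
MULTI_MEMBER_CLUSTERS = 11_897
SINGLETONS = 25_582
OVERESTIMATION_FACTOR = 1.57  # fold overestimation reported for earlier experiments


def estimate_expressed_genes(clusters: int, singletons: int, factor: float):
    """Return (raw cluster count, count corrected for overestimation)."""
    raw = clusters + singletons
    return raw, raw / factor


raw_count, corrected = estimate_expressed_genes(
    MULTI_MEMBER_CLUSTERS, SINGLETONS, OVERESTIMATION_FACTOR
)
print(raw_count)         # 37479
print(round(corrected))  # 23872, i.e. roughly 24,000
```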


Analysis of novel genes from human chromosome 21: determination and characterization of complete protein sequences and examples of overlapping genes

Dobromir Slavov, Roger Lucas, Andrew Fortna and Katheleen Gardiner

Eleanor Roosevelt Institute
Denver, Colorado, USA

We are currently using both computer based and experimental approaches to identify and characterize novel genes within the genomic sequence of human chromosome 21. Exon prediction, EST database matches and CpG island identification together are highly efficient at demonstrating the presence of a gene within a segment of DNA. Determining the complete structure of a novel gene and verifying its expression, however, is often more challenging, in particular for genes lacking any significant protein similarities. Problems are compounded when low or restricted expression precludes obtaining information from Northerns or cDNA libraries.

Our preliminary gene identification is based on consistent exon prediction by at least Genscan and Grail programs and/or EST matches that show evidence of exon splicing. CpG island identification, information from RT-PCR and RACE experiments are then added to these data. By these means, we have deduced complete protein sequences for seven novel genes. Protein sizes range from ~250 amino acids to >1500 amino acids. Five proteins contain no discernible protein homologies or motifs; two show only distant homologies that provide no functional clues. None is positive by Northern analysis. Protein sequences have been examined for biochemical and structural features such as amino acid content, hydrophobicity, polarity, and presence of beta sheets and alpha helices. Rarely have such data shown unusual features.

In two other cases, we have evidence of potentially overlapping genes. In both cases, the gene on one strand is represented by consistent exon prediction but no ESTs, and the gene on the opposite strand is represented by one or more ESTs but by no convincing exon predictions. Consensus splice sites are found only on the appropriate strand in all cases. In one case, the exon prediction gene is located within an intron of the EST gene; in the other case, EST exons interdigitate with consistent exon prediction. Expression levels in all cases are low and/or restricted. Analysis of such gene models would be facilitated by corresponding mouse genomic sequence, by adding more coding sequence data to EST sequences, and by more comprehensive information on exon prediction false positive rates.


ASDB: Novel database of alternatively spliced genes

I. Dubchak1,  M. S. Gelfand2, I. Dralyuk1, M. Zorn1, S. Spengler1

1National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
2Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia

Alternative splicing is an important regulatory mechanism in higher eukaryotes. By recent estimates, at least 30% of human genes are spliced alternatively (1). Alternative splicing plays a major role in sex determination in Drosophila, antibody response in humans and other tissue or developmental stage specific processes. The database of alternatively spliced genes can be of potential use for molecular biologists studying splicing, developmental biologists, geneticists, and cell biologists.  We have created a public Alternative Splicing Database (ASDB) (2) for the biological community as a repository of data on alternatively spliced genes.  ASDB is currently available at the URL  The administrator of the database can be contacted by Email:

Our original set of 1663 proteins was generated by selecting all SwissProt entries containing the words "alternative splicing". Clusters of proteins that could arise by alternative splicing of the same gene were created by a string comparison procedure. Two proteins from the same species were considered to belong to the same cluster if they shared a common fragment of at least 20 amino acids. Each cluster is represented in the database by the multiple global alignment of its members, allowing for easy identification of regions produced by alternative splicing. The database contains 241 clusters with more than one member.
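
The clustering rule described above can be sketched as follows. The abstract does not specify the string comparison procedure, so this is only one possible reading: it indexes 20-residue substrings per species (any shared fragment of 20 or more residues contains a shared 20-mer) and merges linked proteins transitively with a union-find structure. The transitive merging, the function name and the example sequences are assumptions made for illustration.

```python
from collections import defaultdict

MIN_FRAGMENT = 20  # minimum shared fragment length, from the abstract


def cluster_proteins(proteins):
    """Cluster (identifier, species, sequence) tuples.

    Two proteins are linked when they come from the same species and share
    an identical fragment of at least MIN_FRAGMENT residues. Links are merged
    transitively, which is an assumption of this sketch.
    """
    proteins = list(proteins)
    parent = {pid: pid for pid, _, _ in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Index every MIN_FRAGMENT-mer per species; a repeated key links two proteins.
    first_owner = {}
    for pid, species, seq in proteins:
        for i in range(len(seq) - MIN_FRAGMENT + 1):
            key = (species, seq[i:i + MIN_FRAGMENT])
            owner = first_owner.setdefault(key, pid)
            if owner != pid:
                parent[find(pid)] = find(owner)

    clusters = defaultdict(set)
    for pid in parent:
        clusters[find(pid)].add(pid)
    return list(clusters.values())


shared = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical 22-residue fragment
example = [
    ("P1", "human", shared + "AAAA"),
    ("P2", "human", "GG" + shared),
    ("P3", "mouse", shared + "AAAA"),  # same fragment, different species
    ("P4", "human", "ACDEFGHIKLMNPQRSTVWYACDEFG"),
]
result = sorted(sorted(c) for c in cluster_proteins(example))
print(result)  # [['P1', 'P2'], ['P3'], ['P4']]
```

Only P1 and P2 end up in a multi-member cluster; P3 carries the same fragment but is kept apart because it comes from a different species.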

The database can be searched using Medline, SwissProt, and GenBank identifiers and accession numbers. Standard context search can be performed over SwissProt keyword, description, taxonomy, and comment fields and feature tables. ASDB contains internal links between entries and/or clusters, as well as external links to Medline, GenBank and SwissProt entries.

The next step in ASDB development will involve building a nucleotide division of ASDB by incorporating DNA data from GenBank and other sources, classification of the main types of alternative splicing, and adding data on aberrant splicing and splicing mutations.


1. Mironov, A.A. and Gelfand, M.S. Proc. 1st Int. Conf. on Bioinformatics of Genome Regulation, 1998. v. 2, p. 249.
2. Gelfand, M., Dubchak, I., Dralyuk, I. and Zorn, M. Nucl. Acids Res. 1999, 27, 301-302.


Functional and molecular evolutional analysis of predicted gene products of human long cDNAs

Reiko Kikuno, Takahiro Nagase, Ken-Ichi Ishikawa, Mikita Suyama, Mina Waki, Makoto Hirosawa, Nobuo Nomura, and Osamu Ohara

Kazusa DNA Research Institute
Kisarazu, Chiba, Japan

Our cDNA sequencing project has focused on human cDNAs that have the potential to code for large proteins expressed in brain, and has so far published more than 1200 new cDNA sequences. We have assigned a KIAA number to each sequence as the name of the gene. Among them, the gene products of 1117 cDNAs were predicted and studied in detail from functional and molecular evolutionary viewpoints by computer analysis at the amino acid sequence level. Database searches with the protein sequences revealed that the functions of 682 proteins (61.1%) could be classified into 7 classes: Structure/Motility (8.7%), Metabolism (3.2%), Cell division (1.1%), Signal/Communication (25.3%), Nucleic acids managing (16.7%), Protein managing (4.1%), and others (1.9%). The functions of 139 proteins could not be predicted, although the sequences showed significant similarity to other sequences and/or motifs in public databases. The number of protein sequences that showed no significant similarity either to known sequences or to motifs was 296 (26.5%).
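
As a consistency check on the figures above, the three groups can be tallied against the 1117 analysed gene products. This is a small illustrative calculation; the dictionary labels are paraphrases chosen here, and the 12.4% value is derived from the quoted counts rather than stated in the abstract.

```python
# Counts and class percentages as quoted in the abstract; labels are paraphrased.
TOTAL = 1117
counts = {
    "function classified": 682,
    "similar, function not predicted": 139,
    "no significant similarity": 296,
}
class_percentages = [8.7, 3.2, 1.1, 25.3, 16.7, 4.1, 1.9]

# The three groups account for every analysed gene product.
groups_total = sum(counts.values())

percentages = {label: round(100 * n / TOTAL, 1) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
# function classified: 61.1%
# similar, function not predicted: 12.4%
# no significant similarity: 26.5%

# The seven class percentages add up to 61.0, matching 61.1% within rounding.
class_sum = round(sum(class_percentages), 1)
print(class_sum)  # 61.0
```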

The detailed results of our analyses for each KIAA gene can be browsed in our HUGE protein database, which is accessible via the World Wide Web. In addition, the HUGE database contains experimental results such as expression profiling and RH-mapping together with their experimental conditions. Keyword search and homology search functions are also provided to retrieve KIAA entries of interest to the user.

To examine the evolutionary origin of KIAA genes, we compared the protein sequences with those deduced from the genomes of yeast (S. cerevisiae) and nematode (C. elegans). It was shown that 1) only 3% and 10% of KIAA genes had homologous genes (i.e., homologous along the entire coding region) in yeast and nematode, respectively; 2) 35% and 14% of KIAA gene products lacked any region homologous to gene products encoded by the genomes of yeast and nematode, respectively. The remaining KIAA gene products were found to have some homologous region(s) in these lower eukaryotes. Possible evolutionary processes of these genes will also be discussed.


Large scale cloning, sequencing and expression profiling of genes expressed in a transcription factor CREM-dependent manner during mouse spermatogenesis

Igor Borissevitch, Tim Beissbardth, Andreas Hoerlein and Guenter Schuetz

German Cancer Research Center, Heidelberg, Germany

CREM belongs to the CREB (cAMP Responsive Element Binding protein) family of transcription factors. CREMtau, an activator splice isoform of the CREM protein, is highly expressed after meiosis in round spermatids at stages 1 to 5. Given its high level and temporally restricted pattern of expression, CREM seems to be the major trigger of gene expression at the late stages of spermatogenesis.

This work represents large-scale cloning, sequencing and expression profiling of RNA messages expressed in a CREM-dependent manner. We have used gene targeting to selectively eliminate the transcription factor CREM (Nature, 1996, vol. 380, pp. 162-165). CREM -/- mice display a normal phenotype, but males are sterile due to an arrest of spermatogenesis. Spermatid development is blocked at stages 2-5, resulting in the absence of cells of all further stages, including spermatozoa. Using suppression subtractive hybridisation, we have cloned messages expressed in wild-type but not in CREM -/- mutant mice. 12,000 clones were analysed by sequencing and hybridisation. The redundancy of this library was reduced by high-density filter hybridisation with the most abundant clones. 950 clusters of sequences were obtained. They represent 79 known mouse genes, 81 homologs of known genes (mostly rat and human), 170 different mouse ESTs, 91 ESTs from other species, 21 homologs of genomic sequences and 139 novel sequences.

These data provide extensive new information about gene expression during spermatogenesis. In addition, they provide a selection of genes in which to search for direct CREM target genes (for instance, the known CREM target gene ACE is present in our library). Our results may find application in diagnostic and therapeutic intervention in infertile patients with spermatogenetic abnormalities.


Genomic comparative analysis of the Fugu rubripes homologue of ETV6, a gene frequently rearranged in human leukemias

Alexandre Montpetit and Daniel Sinnett

Division of Hemato-Oncology, Charles-Bruneau Cancer Center, Sainte-Justine Hospital, Montreal, Canada and
Department of Biochemistry, University of Montreal, Montreal, Canada

Acute lymphoblastic leukemia (ALL) is the most frequent pediatric cancer. Little is known about the molecular pathogenesis of this disease. Recently, loss of heterozygosity (LOH) studies have reported deletions at chromosome 12p12.3 in 15-47% of pre-B ALL patients. This region was also found deleted in several other hematological malignancies as well as in a variety of solid tumors, suggesting the presence of a putative tumor suppressor gene. The chromosomal region containing this tumor suppressor locus was restricted to a ~750 kb interval that includes the gene ETV6, encoding an ets-like transcription factor required for hematopoiesis in bone marrow. Accumulating evidence suggests that ETV6 is not the tumor suppressor gene targeted by the deletions. However, this gene is frequently found translocated with different partners, leading to the creation of chimeric products in hematological disorders and suggesting that this region might be intrinsically unstable. The search for genes in large mammalian genomic regions is the limiting step in positional cloning. We propose that this task could be facilitated by comparative genomic studies of the region of interest in the compact genome of Fugu rubripes, which is 7.5 times smaller than that of human but contains a similar repertoire of genes. Here we report the characterization of the Fugu homologue of ETV6 (frETV6), which constitutes the initial step toward the comparative mapping of the human chromosome 12p12.3 tumor suppressor locus. Two Fugu genomic libraries were screened with a human ETV6 cDNA, leading to the identification of 6 genomic clones that are part of a contig covering more than 175 kb. The 8 exons of frETV6 were identified and located within a 15 kb region. Compared to the human homologue, which is spread over 350 kb, this represents a substantial 23-fold compaction. At the protein level, we observed an overall 57% sequence identity between the two species.
In particular, the conserved ETS and PNT domains showed 95% and 69% amino acid identity, respectively. A phylogenetic analysis conducted on ETV6 sequences from human, mouse, chicken, Fugu and zebrafish revealed a close relationship between the two teleost fishes. The comparative mapping of the suppressor locus will be extended to the region flanking frETV6 in order to identify candidate genes that could be involved in ALL and/or other cancers as well as to get clues about the cause of the chromosomal instability affecting this region.
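
The compaction figure quoted in this abstract can be reproduced with a short calculation. The spans and the genome size ratio are taken from the abstract; the names, and the comparison of the locus against the genome-wide ratio, are illustrative additions.

```python
# Figures from the abstract; names are illustrative assumptions.
HUMAN_ETV6_SPAN_KB = 350
FUGU_ETV6_SPAN_KB = 15
GENOME_SIZE_RATIO = 7.5  # human genome size relative to Fugu


def compaction(human_kb: float, fugu_kb: float) -> float:
    """Fold reduction in genomic span from human to Fugu."""
    return human_kb / fugu_kb


locus_fold = compaction(HUMAN_ETV6_SPAN_KB, FUGU_ETV6_SPAN_KB)
relative_to_genome = locus_fold / GENOME_SIZE_RATIO

print(f"{locus_fold:.1f}")          # 23.3
print(f"{relative_to_genome:.1f}")  # 3.1
```

On these numbers the ETV6 locus is compacted about three times more strongly than the genome as a whole.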


Comparison of completed genomes to sample sequences of related genomes

Sandra W. Clifton(1), Michael McClelland(2), Webb Miller(3), William R. Pearson(4), Aaron J. Mackey(4), and Richard K. Wilson(1)

1Genome Sequencing Center, Wash Univ School of Medicine, St. Louis, MO, USA
2SKCC, San Diego, California, USA
3Dept Computer Science & Engineering, Penn State Univ, University Park, PA, USA
4Dept of Biochemistry, University of Virginia, Charlottesville, Virginia, USA

Complete genomes provide a useful framework for organizing and analyzing partial sequences from related genomes. A sample consisting of 2X or 3X genome equivalents gives coverage of over 90% of the genome in which more than 99% of all ORFs over 500 bases in length should be represented by a fragment of at least 100 bases. Information on the presence of shared ORFs and partial identities of unique ORFs can be obtained at a fraction of the cost of complete sequencing.
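
The relationship between sampling depth and genome coverage can be illustrated with the idealized Poisson (Lander-Waterman) approximation, in which the expected covered fraction at depth c is 1 - exp(-c). The abstract does not state which model underlies its figures, so this is only a sketch under that assumption; real shotgun data deviate from it because of cloning bias and read-length effects.

```python
import math


def expected_coverage(depth: float) -> float:
    """Expected fraction of the genome covered at a given sampling depth,
    assuming reads start at uniformly random positions (Poisson model)."""
    return 1.0 - math.exp(-depth)


def depth_for_coverage(fraction: float) -> float:
    """Sampling depth needed to reach a given covered fraction (0 < fraction < 1)."""
    return -math.log(1.0 - fraction)


print(f"{expected_coverage(2):.3f}")      # 0.865
print(f"{expected_coverage(3):.3f}")      # 0.950
print(f"{depth_for_coverage(0.90):.2f}")  # 2.30
```

Under this idealized model, 90% coverage is reached at a depth of about 2.3X, which lies within the 2X to 3X range discussed above.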

To determine the utility of sample sequences, we have collected data from two Enterobacteria, Salmonella paratyphi A (SPA) and a clinical isolate of Klebsiella pneumoniae (KPN). These strains are of interest as human pathogens and for understanding enterobacterial evolution. SPA is very closely related to the completed Salmonella genomes, whereas KPN is a sister clade of Salmonella and Escherichia. Over 10 million bases of raw sequence, representing between 2X and 3X genome equivalents, were collected from both SPA and KPN, which melded to 4,384 kb and 5,084 kb, respectively.

For Enterobacteriaceae, the E. coli K12 genome (ECO) is completely sequenced [U. Wisconsin] and the genomes of Yersinia pestis (YPE), Salmonella typhi (STY) [Sanger Center] and S. typhimurium LT2 (STM) [Wash. U.] are soon to be completed. The ECO sequence has been aligned to the available sequence from each of STM, STY, SPA, KPN, YPE, and Vibrio cholerae. These alignments can be viewed as a "percent identity plot" or PIP, in which percent identities of ungapped matches are shown on the Y-axis for each pairwise comparison. Deletions in the sampled genomes and the sites of rearrangements and of significant insertions are visualized in color. The alignments can be queried with any named ECO gene, and the corresponding region is visualized in multiple genomes simultaneously. Matching sequences in each aligned genome, associated with the reference gene and flanking regions, are automatically made available.
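
The quantity plotted on the Y-axis of a percent identity plot can be sketched as follows. This is a minimal illustration of computing percent identity over ungapped matches, not the PIP software itself; the function names, the tuple layout and the example sequences are assumptions.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent of identical positions in an ungapped, equal-length match."""
    if not ref or len(ref) != len(other):
        raise ValueError("ungapped segments must be non-empty and of equal length")
    matches = sum(r == o for r, o in zip(ref.upper(), other.upper()))
    return 100.0 * matches / len(ref)


def pip_points(segments):
    """Turn (reference_start, ref_seq, other_seq) segments into
    (reference_start, length, percent_identity) points for plotting."""
    return [(start, len(ref), percent_identity(ref, other))
            for start, ref, other in segments]


# Hypothetical ungapped matches against the reference genome.
segments = [
    (100, "ACGTACGTAC", "ACGTTCGTAC"),  # 9 of 10 positions identical
    (250, "GGCATTAG", "GGCATTAG"),      # identical
]
points = pip_points(segments)
print(points)  # [(100, 10, 90.0), (250, 8, 100.0)]
```

Each point gives the position of a match along the reference sequence (X-axis) and its percent identity (Y-axis).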

Unique portions of the complete and sampled genomes were identified with the FASTX and TFASTX programs. To search for unique regions and potential rearrangements in the sampled genomes, each sampled sequence is compared to the E. coli proteome using FASTX, and the complete E. coli proteome is compared against the partially sequenced genome databases using TFASTX. We present lists of (a) all ORFs found in ECO for which orthologues are apparently absent in the STM, SPA, or KPN samples, and (b) sequences over 400 bp in length that are found in one or more of STM, SPA or KPN, but are absent in ECO. The best homologues of these "unique" regions are determined from other sequence databases, including incomplete genomes deposited at NCBI.


Genomic characterisation of early disseminated tumor cells isolated from bone marrow of breast cancer patients

Oleg Schmidt-Kittler1, Julian Schardt1, Günter Schlimok2, Gert Riethmüller1 and Christoph Klein1

1Institut für Immunologie der LMU, München, Germany
2Zentralklinikum Augsburg

Because genomic changes constantly accumulate during tumor progression, the link between structural changes within the genome and the malignant behaviour of a cell is hard to establish at a later stage of the disease. To identify genes involved in processes of early systemic disease, such as dissemination and ectopic survival, we analyzed single disseminated tumor cells from the bone marrow of breast cancer patients. The genomic aberrations of these cells should be the result of selection pressures.

Disseminating tumor cells can be detected at a frequency of about one tumor cell per one million bone marrow cells and isolated by micromanipulation. We then amplified the genome of the single tumor cells using a recently developed PCR technique. Subsequent comparative genomic hybridization (CGH) revealed gains and losses of specific genomic regions. With more genomic profiles of single disseminated tumor cells now becoming available and with the use of high resolution techniques such as matrix CGH one should be able to identify genotypes and genes that may be characteristic for dissemination and ectopic survival.


Oxidative metabolism and gene expression: Gene discovery array analysis

Pieter Rottiers, Vera Goossens and Johan Grooten

Laboratory of Molecular Biology, Flanders Interuniversity Institute for Biotechnology (VIB) and University of Ghent, Ghent, Belgium

Treatment of the mouse fibrosarcoma cell line L929 with tumor necrosis factor (TNF) induces necrotic cell death by a mechanism that depends on production of reactive oxygen intermediates (ROI) by mitochondria (Goossens et al., 1995). Besides oxygen, bioenergetic pathways characteristic of tumor cells (usage of glutamine instead of glucose as oxidative substrate) markedly enhanced ROI production and thus influenced the sensitivity of the cell to TNF-induced necrosis (Goossens et al., 1996). Besides resistance to TNF cytotoxicity, L929 cells that have been adapted to use glucose (normal metabolism) instead of glutamine (tumor-specific metabolism) as oxidative substrate exhibit a differentiated morphology and a decreased rate of cell division (Goossens et al., 1996). Apparently, the oxidative metabolism of the cell also affects features characteristic of transformed cells, namely uncontrolled growth and dedifferentiation. To verify whether altered gene expression underlies this differential behaviour, we performed PCR-based suppression subtractive hybridization to identify genes that are differentially expressed between L929 cells dependent on glutamine (TNF-sensitive; dedifferentiated morphology) and L929 cells dependent on glucose (TNF-resistant; differentiated morphology). The subtracted PCR products were hybridized on GDA filters (Genome Systems) spotted with 18,000 non-redundant mouse cDNA clones. Subsequent analysis identified genes known to be involved in the oxidative metabolism of the cell or to contribute to signal transduction pathways and resistance/sensitivity to apoptosis. In addition, a large number of unknown genes were revealed. This result establishes a firm link between oxidative metabolism and gene expression.

Goossens, V., Grooten, J., De Vos, K. & Fiers, W. (1995) Direct evidence for Tumor Necrosis Factor-induced mitochondrial reactive oxygen intermediates and their involvement in cytotoxicity. Proc. Natl. Acad. Sci. U.S.A. 92: 8115-8119.

Goossens, V., Grooten, J. & Fiers, W. (1996) The oxidative metabolism of glutamine - A modulator of reactive oxygen intermediate-mediated cytotoxicity of tumor necrosis factor in L929 fibrosarcoma cells. J. Biol. Chem. 271: 192-196.


Divergent α2-adrenoceptor subtypes in the zebrafish (Danio rerio)

Jori Ruuskanen1, Minna Varis2, Erik Salaneck4, Tiina Salminen2, Tommi Nyronen3, Mark S. Johnson2, Dan Larhammar4 and Mika Scheinin1

1Department of Pharmacology and Clinical Pharmacology, Univ of Turku, Finland
2Department of Biochemistry and Pharmacy, Åbo Akademi University, Turku, Finland
3Center for Scientific Computing (CSC), Espoo, Finland
4Department of Neuroscience, Unit of Pharmacology, Uppsala University, Sweden

α2-Adrenergic receptors (α2-ARs) belong to a large family of G-protein coupled receptors. A common feature of these receptors is seven α-helical transmembrane domains (TM) thought to form the binding pocket for ligands. They mediate many of the physiological effects of adrenaline and noradrenaline and are target molecules for several drugs. Three human α2-AR subtype genes (α2A, α2B and α2C) have been cloned to date. The number of α2-ARs in fish has remained unclear. Only one α2-AR, from the fish cuckoo wrasse (Labrus ossifagus), has been cloned. This receptor, named α2F, has been thought to represent an ancestral α2-AR subtype. Its ligand binding properties are intermediate between α2A and α2C. However, it shows greatest sequence similarity to α2C, and from an evolutionary point of view it is more likely that fish also have three α2-AR subtypes. To study the structure and evolution of α2-ARs and their possible importance in developmental biology, we have turned to another species of fish, the zebrafish (Danio rerio), which is highly amenable to developmental and genetic studies.

We have cloned the genes coding for the zebrafish α2A-AR and α2C-AR. At the protein level, both of these show around 60 % sequence identity when compared to their mammalian counterparts and 50 % identity when compared to other α2-AR subtypes. Analysis of the TM regions is based on a frog rhodopsin-based model of the human α2A-AR. In the predicted TM regions, identity of the zebrafish α2A-AR with the human orthologue is 83.3 %, with the rat α2A 82.3 % and with the mouse α2A 82.3 %. For the zebrafish α2C the identities are: human α2C 80.3 %, rat and mouse α2C 86.9 %. These values are much lower than the percentage identities between human and rat or mouse orthologues: 97.5 % for α2A and 98.5 % for α2C. The rat and mouse orthologues are 99-100 % identical. The greater diversity of the zebrafish receptors is expected to reveal residues important for typical α2-adrenergic ligand binding, which in turn could help in designing subtype-selective α2-drugs. Screening for additional subtype(s) using a probe corresponding to an unpublished α2B-AR-like EST (GenBank acc. no. AI461341) has been carried out, and further analysis of the resulting clones is in progress. Chromosomal mapping of the cloned receptor genes has been started in collaboration with Prof. John H. Postlethwait's group at the University of Oregon, Eugene, USA. Expression and pharmacological testing of the cloned receptor genes has also been started.


GATEWAY cloning: A high-throughput gene transfer technology for rapid functional analysis and protein expression

James Hartley, Gary Temple, Michael Brasch, et al.

Life Technologies, Inc., Rockville, Maryland, USA

Each step of characterization of new ORFs requires subcloning into specialized vectors that impart functional properties to the cloned segment. We describe a new method, called Recombinational Cloning (RC), that uses in vitro site-specific recombination to speed the accurate transfer of DNA segments between vector backbones. DNA segments flanked by recombination sites in an Entry Clone can be "automatically" transferred into new vector backgrounds simply by adding the desired "Destination" vector and recombinase, incubating for 1 hour, and transforming any standard E. coli strain. Strong selections ensure that the desired subclones are recovered at high efficiency (typically >90%), reducing or eliminating downstream analysis of candidate clones. The recombination is conservative (no net addition or loss of nucleotides) and transfer occurs without affecting the cloned DNA segment (in contrast to PCR-based approaches). Thus once Entry Clones are created, these clones can be validated and then transferred, unchanged, into any number of vectors. This permits the generation of large collections of validated ORFs (e.g., from model organisms) that can serve as a common source of clones for research. Collaborations to build such collections are in progress.

By incorporating 25 bp attB recombination sites into the 5' end of PCR primers, RC also permits efficient, directional cloning of PCR products (as Entry Clones). The resulting Entry Clones then can be rapidly transferred into any number of Destination Vectors for further analysis.
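
The primer design described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol. The attB1 and attB2 strings are the 25-nt core sequences as commonly published for this system and should be checked against the supplier's documentation. The function names, the annealing length, and the toy ORF are hypothetical. Extra 5' bases and reading-frame spacers that real primers may need are left out.

```python
# Assumption: commonly published 25-nt attB core sequences; verify before use.
ATTB1 = "ACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "ACCACTTTGTACAAGAAAGCTGGGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def attb_primers(orf: str, anneal_len: int = 20,
                 attb_fwd: str = ATTB1, attb_rev: str = ATTB2):
    """Build forward and reverse primers carrying an attB site at the 5' end.

    The 3' part of each primer anneals to the ORF; the 5' part is the attB site.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the requested annealing length")
    forward = attb_fwd + orf[:anneal_len]
    reverse = attb_rev + reverse_complement(orf[-anneal_len:])
    return forward, reverse


# Toy ORF (hypothetical), short annealing region for readability.
fwd, rev = attb_primers("ATGGCTGCTGCTCTGATCGCTGCTTAA", anneal_len=9)
print(fwd)
print(rev)
```

Because the two attB sites differ, the PCR product can recombine in only one orientation, which is what makes the cloning directional.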

The RC method is fast, convenient, and can be automated, allowing numerous DNA segments to be cloned and then transferred in parallel into many different vector backgrounds. The resulting subclones maintain reading frame, allowing amino- and carboxy-fusions. Essentially any vector can be readily converted to a Destination vector. Approaches for optimization of protein expression, rapid functional analysis, and the integration of technology platforms will be discussed.


Sequence homology between human and mouse genomic regions to identify the tumor suppressor gene involved in B cell chronic lymphocytic leukemia

Bagrat Kapanadze, Nataliy Makeeva, Olle Sangfelt, Martin Corcoran, Anna Baranova, Eugene Zabarovsky, Nick Yankovsky, Dan Grander and Stefan Einhorn

CCK, Research Laboratory of Radiumhemmet, Karolinska Hospital, Stockholm, Sweden

Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region (MDR) of less than 10 kb, encompassing parts of two adjacent genes, termed Leu1 and Leu2 (leukemia-associated gene 1 and 2). Mutational analysis of Leu1 and Leu2 in 170 CLL samples revealed no small intragenic deletions or point mutations. Subsequent examination of the genomic sequence around the MDR revealed several additional expressed transcripts (ESTs). In addition, another gene, Leu5, has been identified 50 kb centromeric to this region. This gene encodes a zinc-finger protein and shares homology with known genes involved in tumorigenesis. To better understand the genomic organisation of such a complex, gene-rich region, we decided to compare the human sequence directly with that of the mouse. Because critical genes tend to be highly conserved between mouse and human, this comparison may indicate the most important genes in the region. A mouse genomic PAC library was screened with a number of probes covering a 100 kb distance in the human 13q14.3 region, including the MDR. Southern hybridization of subcloned fragments covering the MDR revealed several highly conserved areas, including exon 1 and exon 2 of Leu2. Interestingly, sequence analysis indicates that human Leu1 is not conserved in mouse, whereas the Leu2 sequence is highly conserved. In addition, the protein-encoding exon of Leu5 has > 95% homology with the mouse sequence. In conclusion, following this analysis, the strongest candidates for a conserved tumor suppressor gene in this region are Leu5 and Leu2. Further work is required to elucidate their mechanism of action in this disease and to determine which gene or genes act as the tumor suppressor in B-CLL.


Comparative mapping in the Japanese pufferfish (Fugu rubripes)

Clark, M.S.; Edwards, Y.J.K.; Shaw, L.; Snell, P.; Smith, S.; and Elgar, G.

Fugu Genomics HGMP Resource Centre, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom

The project in this laboratory to scan the Fugu genome by random sequencing finished a year ago. Since then we have concentrated on analysing three regions in depth, equivalent to the human chromosomal regions 11p, 20q and MHC. These regions are currently being mapped and, to a limited extent, sequenced. Data currently available indicate the utility of the Fugu genome for identifying conserved regulatory elements and novel genes, for confirming predicted human genes, and for studying the evolutionary conservation of gene blocks. Examples of these will be illustrated from current unpublished research. There is also an ongoing project to develop ESTs in Fugu. A number of cDNA libraries have been constructed, and so far 3,000 clones from these have been randomly sequenced. Clustered data on these sequences will be presented.


Transcriptional regulation of the collagen α1(IX) gene during eye development

Elena I. Frolova and David C. Beebe

Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, Missouri, USA

We have shown that differentiation of the ciliary epithelium at stage 18 of chicken embryo development is accompanied by increased expression of mRNA for the long isoform of collagen α1(IX), LI-col(IX). Since the expression of this gene is very selective for the ciliary epithelium and occurs early in the differentiation of this tissue, study of its transcriptional regulation could provide information on the transcriptional mechanisms responsible for targeting gene expression to the ciliary epithelium.

To determine the sites in the proximal promoter that were occupied by transcription factors in the differentiating ciliary epithelium, we used an in vivo footprinting assay. The main conclusion from this experiment was that one or more DNA-binding proteins in ciliary epithelium and retina occupy the proximal promoter of LI-col(IX). In addition, differences in the in vivo DMS footprinting patterns allowed us to conclude that different sets of proteins are bound to this promoter in the ciliary epithelium and the retina. Because LI-col(IX) is expressed in the ciliary epithelium, but not in the retina, these different complexes may be responsible for activating and repressing transcription in these two tissues. To map in more detail the fragments of the LI-col(IX) promoter that bind tissue-specific transcription factors, we used electrophoretic gel mobility retardation assays. Nuclear extracts from ciliary epithelium and neural retina of E7 embryos were analyzed for DNA binding with double-stranded fragme

Two fragments identified by mobility shift assay were used in a one-hybrid screen to identify prospective DNA-binding proteins. A cDNA library fused to the GAL4 activation domain (AD) in pACT2/Asc vector was synthesized from total RNA of the ciliary epithelium of day 7 chicken embryos. The size of the library was approximately 1.3 x 10^7 clones. At least 90% of clones contained inserts of 500 to 3,000 base pairs. Two potential DNA-binding proteins were identified. One of the proteins (cZic2) showed high homology to mouse ZIC2 mRNA, a sequence recently shown to be expressed in the ciliary epithelium during mouse development.

To determine the temporal and spatial pattern of cZic mRNA expression during development, we performed whole-mount in situ hybridization using chicken embryos. Overall expression in the chicken embryo was similar to expression of ZIC2 in mouse. In the eye, cZic is expressed throughout the inner layer of the optic cup at a low level at stage 14. At stage 18, expression is increased at the margin of the optic cup, the differentiating ciliary epithelium. By stage 30 there is strong expression of cZic in the non-pigmented ciliary epithelium but not in the retina.


Identification of a novel cellular protein that binds to the HBV RNA pregenome

S. Kreft and M. Nassal

Department of Internal Medicine II, University Hospital, Freiburg, Germany

Due to its small genome of only 3.2 kb, the replication of the hepatitis B virus (HBV) is expected to be tightly linked to the exploitation of host cell functions. The viral RNA pregenome (pgRNA) serves several functions in the viral life cycle. It functions as mRNA for the capsid and the polymerase (P protein), and it is the substrate for encapsidation and reverse transcription. Its 5' region contains the structured encapsidation signal ε, which, upon binding of P protein, mediates specific RNA packaging into capsids and initiation of reverse transcription. Previous UV-crosslinking data provided direct evidence for the existence of cellular factors that bind close to ε.

A North-Western (NW) screening procedure with DIG-labeled RNA encompassing HBV ε as a probe was employed to identify cellular binding proteins. A 2.1 kb cDNA was thereby isolated from a human liver cDNA expression library; its gene product, NIII, consistently bound to RNA in the presence of excess nonspecific nucleic acid competitors. The cDNA contained a 3'-terminally incomplete ORF encoding a protein of 666 aa with no strong homology to any known protein or RNA-binding motif in the database. Using deletion variants of a bacterially expressed MBP fusion protein in NW experiments, we mapped the RNA-binding domain to a lysine-rich region close to the C terminus of NIII. In a search for a full-length cDNA, several additional libraries were screened for homology to the central part of NIII (service provided by RZPD, Heidelberg, Germany). Four independent clones with differing 3' ends were isolated. Two of them encode proteins with 95 and 542 additional amino acids at their C terminus compared to NIII, and hence contain the putative RNA-binding domain as an integral part of the peptide chain. The longest form was termed RBP138, for RNA-binding protein of 138 kD. The two other clones both encode the same "truncated" form of RBP138, corresponding to the first 303 aa of RBP138 and consequently lacking the RNA-binding domain. Both carry a 14-nt insertion that generates a STOP codon shortly thereafter. Although the genomic sequence of RBP138 is not yet characterized, this insertion is most likely due to an alternative splicing event. Taken together, these findings suggest that different isoforms of RBP138 exist. We are currently studying the RNA-binding specificity and the subcellular localization of some isoforms to unravel the biological role of these new proteins.
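
Why a 14-nt insertion leads to a premature STOP codon can be sketched with toy sequences. An insertion whose length is not a multiple of three shifts the downstream reading frame, and a shifted frame usually reaches a stop codon soon. The sequences and function names below are hypothetical and are not taken from RBP138, whose sequence is not given in the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def insert_at(cds: str, position: int, insertion: str) -> str:
    """Insert a nucleotide string before the given 0-based position."""
    return cds[:position] + insertion + cds[position:]


# Toy open reading frame (8 codons, no stop codon in frame).
WILD_TYPE = "ATGGCTGCTGCT" + "CTGATCGCTGCT"
# Toy 14-nt insertion; 14 is not a multiple of 3, so the frame shifts.
INSERTION = "GCAGCAGCAGCAGC"

mutant = insert_at(WILD_TYPE, 12, INSERTION)
print(first_stop_index(WILD_TYPE))  # None: reading frame stays open
print(first_stop_index(mutant))     # 9: a TGA now falls in frame
```

In the toy mutant the stop appears one codon after the end of the inserted sequence, which mirrors the "shortly thereafter" behaviour described for the truncated clones.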

