Groups Coordinate Gene Sequencing

cDNA Workshops Reduce Duplicative Work, Increase Production

When several research teams around the world announced plans in the fall of 1996 for full-length cDNA (gene) sequencing, investigators felt that the highly beneficial infrastructure provided since 1994 by the international Integrated Molecular Analysis of Genome Expression (I.M.A.G.E.) consortium [HGN 6(6), 3] should be extended to the challenges of complete cDNA sequencing. A subsequent workshop for I.M.A.G.E. participants was held in May 1997 in Gaithersburg, Maryland. The meeting was organized and chaired by Greg Lennon [then at Lawrence Livermore National Laboratory (LLNL) and now at Gene Logic Inc.] with Marvin Stodolsky coordinating for the meeting sponsor, the DOE Office of Biological and Environmental Research. Scientists attended from France, Germany, Italy, Japan, Sweden, the United Kingdom, and the United States.

Several workshop participants are members of the subgroup EURO-IMAGE, whose goals include generating and sequencing a master set of unique full-length cDNA clones (based on I.M.A.G.E. consortium resources) representing 3000 transcripts and 6 Mb of finished sequence. Other EURO-IMAGE goals are to obtain high-resolution and comparative functional mapping in human and model organisms of 1000 master-set genes and to develop the I.M.A.G.E. consortium database for easy access to an integrated view of the sequence, map, and expression data generated.

U.S. funding agencies represented at the workshop included DOE, NIH, and the recently established nonprofit Merck Genome Research Institute [HGN 8(3-4), 9]. Selected highlights of technical progress in complete cDNA sequencing, as reported at the workshop, follow.

Highlights of Technical Progress
Attendees addressed a wide range of topics, including the status of cDNA sequencing projects, future targets, data- and clone-release policies, quality criteria and assessment, and mouse and other model organism cDNAs. Speakers projected that, with adequate support from funding agencies, participating laboratories could generate up to 15,000 full-length cDNA sequences in the following year. With average cDNA lengths of 2 kb, this represents some 30 Mb of total sequence.

Researchers have long recognized that expression of a single gene may culminate in the production of several different messenger RNA (mRNA) transcripts, depending both on the gene and the source tissue. Added to this biological complexity are the technical challenges of converting fragile mRNAs to the sturdier cDNAs. Standard methods use poly dT as a primer on the 3' poly A end of purified mRNAs, with reverse transcriptase enzymes of viral origin polymerizing the synthesis of a single-stranded DNA complement of the mRNA. These initial DNA transcripts often fail to extend to the 5' end of longer mRNAs. With the use of more routine biochemistries, the single-stranded DNA is converted into duplex DNA and combined with a DNA vector to support its propagation and maintenance as a DNA clone. The double-stranded DNAs produced are much more stable and less susceptible to degradative processes than their single-stranded mRNA predecessors. However, because the initial reverse transcription often stops short, cDNA libraries with abundant truncated products are the common result, particularly for the longer source mRNAs. Strategies devised for alleviating this truncation problem were described by Takao Isogai (Helix Research Institute, Japan), Nobuo Nomura (Kazusa DNA Research Institute, Japan), John Quackenbush [The Institute for Genomic Research (TIGR)], and M. Bento Soares (University of Iowa).

A protocol that takes advantage of the unusual nucleotide "cap" on the 5' end of mRNAs requires that the first cDNA strand’s extension be long enough to protect the cap as a condition for final cDNA clone production. Soares reported, however, that about one-third of cDNA transcripts begin within the mRNA rather than at the preferred start at the mRNA's 3' end, thus giving rise to 3' truncations. This problem can be alleviated substantially by size fractionating the mRNAs and later selecting the cDNA products whose lengths equal those of the size-sorted mRNA templates. Hans Lehrach (Max Planck Institut für Molekulare Genetik, Germany) described the value of massively parallel oligomer fingerprinting of cDNAs, an economical way to screen a library for novel and longer, potentially full-length cDNAs. Optimal candidate cDNAs chosen by the Lehrach team at the Resource Center of the German Genome Project are being sequenced in the laboratory of Annemarie Poustka (Deutsches Krebsforschungszentrum).

More than one sequencing read is commonly needed to obtain the complete sequence of cDNAs longer than a few hundred bases. Strategies for economical full-length sequencing were discussed by Lennon and Richard Gibbs (Baylor College of Medicine). Sequence reads beyond 1000 bases now are being obtained with improvements to sequencing systems by Wilhelm Ansorge’s team at the European Molecular Biology Laboratory. Ansorge suggested that, for cDNAs shorter than 2 kb, good coverage could be achieved by two overlapping reads on complementary strands.
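The arithmetic behind the two-read suggestion can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the workshop report: each read is treated as having a fixed usable length (1000 bases here), one read starts at each end of the insert on opposite strands, and the function name `two_read_overlap` is hypothetical.

```python
def two_read_overlap(cdna_len: int, read_len: int) -> int:
    """Return the number of bases covered by both end reads.

    One read starts at each end of the insert, on opposite strands.
    A negative result is the size of the uncovered gap in the middle.
    """
    usable = min(read_len, cdna_len)  # a read cannot run past the insert
    return 2 * usable - cdna_len


# Assumed usable read length of 1000 bases (illustrative value only).
for length in (800, 1800, 2400):
    overlap = two_read_overlap(length, read_len=1000)
    verdict = "covered" if overlap > 0 else "needs a further read"
    print(f"{length} bases: overlap {overlap}, {verdict}")
```

Under these assumptions, an 1800-base cDNA is spanned with a 200-base overlap, whereas a 2400-base cDNA leaves a 400-base gap that would require at least one additional read.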

Giuseppe Borsani (Telethon Institute of Genetics and Medicine) reported on the benefits of the easily manipulated Drosophila model for studies of development and function to reveal the roles of genes represented by human cDNAs.

Mark Boguski (National Center for Biotechnology Information) discussed the status of the dbEST cDNA sequence database and made recommendations for the evolution needed to meet the impending new demands of complete cDNA sequencing. He observed that each group will have its own selection criteria and sequencing priorities, such as finding cancer genes, genes with Drosophila homologs, or genes that already have been mapped.

Boguski coined the expression "the slicing problem" to describe the difficulties in avoiding undesirable duplication and redundancy due to overlapping choice categories. A possible solution would be to establish a registration and tracking database modeled after the successful European Bioinformatics Institute's (EBI) RHAlloc-RHdb approach used in constructing the human transcript map. Patricia Rodriguez-Tomé (EBI) has accepted this responsibility. The database would include an investigator or center name and contact information, identifiers for the physical cDNA clones being sequenced and associated EST accession numbers, and sequencing status. When participants registered a clone that they intended to sequence, the database would detect and report overlaps with clones selected by other groups.
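The registration idea can be illustrated with a minimal Python sketch. It is not the RHAlloc-RHdb design or the database EBI would build; the class names, field names, and the overlap rule (same clone identifier or any shared EST accession, registered by a different center) are assumptions made for illustration, and the identifiers in the usage lines are invented placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registration:
    center: str                     # investigator or center name
    contact: str                    # contact information
    clone_id: str                   # physical cDNA clone identifier
    est_accessions: frozenset[str]  # associated EST accession numbers
    status: str = "planned"         # sequencing status


@dataclass
class CloneRegistry:
    entries: list[Registration] = field(default_factory=list)

    def register(self, new: Registration) -> list[Registration]:
        """Record a registration and return earlier entries from other
        centers that share its clone ID or any EST accession."""
        overlaps = [
            old for old in self.entries
            if old.center != new.center
            and (old.clone_id == new.clone_id
                 or old.est_accessions & new.est_accessions)
        ]
        self.entries.append(new)
        return overlaps


# Placeholder centers and identifiers, for illustration only.
registry = CloneRegistry()
registry.register(
    Registration("Center A", "a@example.org", "clone-0001",
                 frozenset({"EST-X1"})))
hits = registry.register(
    Registration("Center B", "b@example.org", "clone-0002",
                 frozenset({"EST-X1", "EST-X2"})))
print([h.center for h in hits])
```

In this sketch the second registration is still recorded, and the overlap with Center A is reported back so the two groups can decide who proceeds. Whether an overlap should block a registration or only flag it is a policy choice the article leaves open.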

Attendees agreed that the I.M.A.G.E. consortium should convene every 6 months to maintain necessary coordination and efficiency. A subsequent meeting, organized by Quackenbush, was held in September 1997 in conjunction with the Ninth International Genome Sequencing and Analysis Conference in Hilton Head, South Carolina. Washington University scientists will organize the next meeting, tentatively planned to coincide with the May 1998 Human Genome Workshop at Cold Spring Harbor Laboratory. [Meeting: http://www.ornl.gov/meetings/wccs/index.shtml]
[Marvin Stodolsky, DOE Human Genome Program, and Denise Casey, HGMIS]

Editors' Note
Because of progress in the Human Genome Project, investigators have more tools to address the biological questions that prompted its establishment and are finding countless other applications in which genomic resources can be used. The convergence of new strategies and such resources as cDNA and clone libraries, databases, and automation and array technology has provided usable information since early in the project. Future applications in environmental molecular toxicology will help clarify the links between genetic variation and environmentally influenced diseases. Several articles in this issue discuss applications in environmental genomics.

Related Information

Baylor College of Medicine Human Genome Sequencing Center
http://www.hgsc.bcm.tmc.edu/
Caltech Genome Research Laboratory
http://www.cegs.caltech.edu/index.html
Cancer Genome Anatomy Project
http://www.ncbi.nlm.nih.gov/CGAP/
dbEST Database
http://www.ncbi.nlm.nih.gov/dbEST/index.html
Deutsches Krebsforschungszentrum
Division of Molecular Genome Analysis
http://www.dkfz.de/en/mga/
Drosophila Related Expressed Sequences
http://www.tigem.it/LOCAL/drosophila/dros.html
European Molecular Biology Laboratory Ansorge Group Research Report
http://www.embl-heidelberg.de/ExternalInfo/ScientificProgrammes/Ansorge.html
German Human Genome Project Resource Center
http://www.rzpd.de
Helix Research Institute
http://www.hri.co.jp
Howard Hughes Medical Institute
http://www.hhmi.org
Human Transcript Map
http://www.ncbi.nlm.nih.gov/science96
I.M.A.G.E.
http://www.imageconsortium.org/
Kazusa DNA Research Institute
http://www.kazusa.or.jp/e/
Mouse cDNA Resources
http://www.imageconsortium.org/?s=6
NIH National Human Genome Research Institute
http://www.nhgri.nih.gov
The Institute for Genomic Research
http://www.tigr.org
WashU-HHMI Mouse EST Project
http://www.imageconsortium.org/?i=projects
WashU-Merck Human EST Project
http://www.imageconsortium.org/?i=projects


The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n1).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.

Human Genome Project Information Archive
1990–2003

Archive Site Provided for Historical Purposes Only
