Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, July 1994; 6(2)
Major accomplishments achieved in two separate sequencing projects recently yielded the longest contiguous stretch of DNA sequence on record and the largest comparative sequence analysis of a biologically important region in humans. These results demonstrate the feasibility of large-scale sequencing projects and strongly support the value of whole-genome sequencing and comparative analysis of model organism and human sequence to identify human genes and provide insights into their organization, regulation, and function.
Investigators led by Richard Wilson and Robert Waterston (Washington University, St. Louis) and John Sulston (Medical Research Council, Cambridge, U.K.) sequenced almost 2.2 Mb of the Caenorhabditis elegans genome [Nature 368, 32-38 (1994)].
Researchers led by Ben Koop (University of Victoria) and Leroy Hood (University of Washington, Seattle) completed the sequence and comparative analysis of nearly 100 kb each of contiguous DNA from human and mouse genomic regions encoding T-cell receptors (TCRs) [Nature Genetics 7, 48-53 (1994)]. TCRs are cell surface molecules that play an important role in mammalian cellular immunity.
The C. elegans work was supported by the NIH National Center for Human Genome Research (NCHGR) and the U.K. Medical Research Council Human Genome Mapping Project. TCR analyses were funded by NCHGR and DOE genome grants to Hood and by a Natural Sciences and Engineering Research Council (Canada) operating grant to Koop, who began this work as a DOE Human Genome Distinguished Postdoctoral Fellow with Hood. Details of the two projects follow.
C. elegans Sequence
Wilson and colleagues reported on the first 3 years of their effort to determine the sequence of the 100-Mb C. elegans genome, which is slightly smaller than an average human chromosome. This project, made possible by years of intensive research that produced detailed genetic and physical maps of the six C. elegans chromosomes, is considered an important testing ground for sequencing human DNA on a large scale. Each half of the research consortium completed over 1 Mb of sequence from chromosome III, for a combined total of roughly 2% of the genome. All sequences have been deposited in the publicly available C. elegans database, ACEDB.
The finished sequence is based on analysis of cosmid clones mapped to the chromosome by restriction digest fingerprinting and includes two 1-Mb cosmid contigs bridged by a yeast artificial chromosome (YAC) clone, with a 92-kb cosmid contig near the center of the YAC bridge [see HGN 4(2), 1-2 (May 1992)]. DNA templates for walking were obtained from 600 to 800 random phagemid and M13 subclones. After this initial random phase, site-specific oligonucleotide primers were used to extend sequences [see HGN 4(5), 1-2 (January 1993)]. Researchers plan to use the same strategy to complete the C. elegans sequence.
The most striking result reported was the high number (483) of predicted genes identified by similarity searches and GENEFINDER analysis, with about 48% of the 2.2-Mb region representing putative exons and introns. Based on the number of tagged cDNAs that hit candidate genes in the sequence, the gene count for the entire genome has been revised upward to about 17,800, with one gene every 5.6 kb. Previous estimates based on classical genetic and mutation analysis methods predicted a total of only around 5000 genes. Many of the newly identified genes may be used as probes to reveal human counterparts, including heretofore unknown genes as well as human coding sequences already placed in databases.
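The gene-density figures above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in this article; the variable and function names are illustrative. The straight-line extrapolation it prints is shown for comparison only and is not the method the researchers used: the article states that the 17,800 estimate was based on tagged cDNA hits.

```python
# Figures quoted in the article (sizes in kilobases).
GENOME_SIZE_KB = 100_000           # 100-Mb C. elegans genome
REGION_SIZE_KB = 2_200             # 2.2 Mb of finished sequence
PREDICTED_GENES_IN_REGION = 483    # predicted genes in the 2.2-Mb region
REVISED_GENOME_GENE_COUNT = 17_800 # revised whole-genome estimate


def kb_per_gene(length_kb, gene_count):
    """Average spacing between genes, in kb per gene."""
    return length_kb / gene_count


# Spacing implied by the revised whole-genome estimate.
genome_spacing = kb_per_gene(GENOME_SIZE_KB, REVISED_GENOME_GENE_COUNT)

# Spacing observed in the sequenced chromosome III region.
region_spacing = kb_per_gene(REGION_SIZE_KB, PREDICTED_GENES_IN_REGION)

# Naive comparison only: scale the regional count up to the whole genome.
naive_extrapolation = PREDICTED_GENES_IN_REGION * GENOME_SIZE_KB / REGION_SIZE_KB

print(f"Genome-wide spacing: {genome_spacing:.1f} kb per gene")
print(f"Sequenced-region spacing: {region_spacing:.1f} kb per gene")
print(f"Naive whole-genome extrapolation: {naive_extrapolation:,.0f} genes")
```

The genome-wide spacing reproduces the article's "one gene every 5.6 kb." The sequenced region is somewhat denser than that average, which is why the naive extrapolation comes out higher than the cDNA-based estimate.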
Researchers projected that, with continuing technological improvements, each half of the consortium will be able to produce more than 10 Mb of finished sequence annually. At that rate, the C. elegans project could be completed by 1998. The consortium is contributing resources to laboratories involved in sequencing the Saccharomyces cerevisiae genome, and technology refinements will speed the progress of other sequencing projects as well.
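The projected timeline can be roughed out the same way. This Python sketch assumes a constant production rate at the quoted floor of 10 Mb per year for each half of the consortium and counts only the sequence remaining after the first 2.2 Mb. Both are simplifying assumptions made for illustration, not the consortium's own planning model, and the names are hypothetical.

```python
# Figures quoted in the article (sizes in megabases).
GENOME_SIZE_MB = 100.0
FINISHED_MB = 2.2
RATE_PER_HALF_MB_PER_YEAR = 10.0  # "more than 10 Mb" annually, so a floor
CONSORTIUM_HALVES = 2


def years_to_finish(remaining_mb, rate_mb_per_year):
    """Years needed at a constant rate (an assumption of this sketch)."""
    return remaining_mb / rate_mb_per_year


remaining_mb = GENOME_SIZE_MB - FINISHED_MB
combined_rate = RATE_PER_HALF_MB_PER_YEAR * CONSORTIUM_HALVES
years_needed = years_to_finish(remaining_mb, combined_rate)

print(f"Remaining sequence: {remaining_mb:.1f} Mb")
print(f"Combined rate: {combined_rate:.0f} Mb per year")
print(f"Time at that rate: {years_needed:.1f} years")
```

At exactly 10 Mb per half per year, the remaining sequence would take a little under 5 years. Because the projection is for more than 10 Mb per half, the actual time would be shorter, which is in line with the completion date suggested above.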
Koop and Hood sequenced and analyzed nearly 100 kb of contiguous sequence from nonvariable regions of the TCR alpha complexes in the human and mouse genomes. The goal of the project is to sequence and compare 5 to 6 Mb from these regions.
TCRs play a central role in regulating the mammalian cellular immune response. These glycoprotein molecules are embedded in the surfaces of T cells, where they recognize and bind to foreign protein fragments captured by a cell surface molecule that is part of the major histocompatibility complex (MHC). The fragments are produced by foreign substances such as viruses or bacteria.
Formation of the complex of TCR, MHC molecule, and foreign protein fragment can stimulate target cell destruction by T cells or antibody production by B cells. Researchers believe inappropriate T-cell responses are the culprits in allergies and several different types of autoimmune diseases, such as arthritis and diabetes. Elucidation of TCR structures and their functions will yield insights into the regulation of the immune response.
Individual TCRs are made of two polypeptide chains, drawn from four different components (designated alpha, beta, gamma, and delta). Each component contains (1) a variable (V) region that is different for each receptor and responsible for specific recognition of foreign proteins and (2) a relatively invariant, constant (C) region for cell-surface attachment and other functions. The four components are encoded at three chromosomal loci in both the mouse and human genomes. Koop and Hood reported sequencing the Cdelta to Calpha region of the alpha and delta TCR loci.
A comparison of sequences in this region revealed a high degree of similarity between corresponding mouse and human protein-coding and noncoding regions. These results suggest that the majority of the TCR region has been highly conserved throughout 80 million years of evolution, although only about 6% of the region contains gene-coding sequences. Until recently, many scientists believed that only 3% of the genome contained useful sequences that were embedded in vast stretches of noncoding "junk" DNA. Recent studies are challenging that view in favor of seeing chromosomes as information organelles with complex structural and gene-control systems.
[Denise K. Casey, HGMIS]
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v6n2).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.