Sponsored by the U.S. Department of Energy Human Genome Program
Human Genome News Archive Edition
Human Genome News, January-June 1997; 8:(3-4)
At the January 1997 Conference on Small Genomes: Sequencing, Functional Characterization, and Comparative Genomics, over 250 participants gathered on Hilton Head Island to discuss recent progress and future directions in this emerging and exciting area of research. As stated by Craig Venter [The Institute for Genomic Research (TIGR)] in the opening session, small-genome research is growing exponentially, and a new era of biological insight is emerging because of it. This first meeting on small genomes was sponsored by TIGR and organized by Claire Fraser (TIGR), Hamilton O. Smith (Johns Hopkins University Medical School), and E. Richard Moxon (Oxford University). Selected meeting highlights follow.
Plasmodium falciparum, the organism causing 300 million to 500 million new cases of malaria each year, contains over 5000 genes distributed among 14 chromosomes. As discussed by Stephen L. Hoffman (Naval Medical Research Institute), less than 5% of this organism's DNA is cloned at the present time. A consortium of sponsors including The Wellcome Trust, Burroughs Wellcome Fund, U.S. Department of Defense, and the NIH National Institute of Allergy and Infectious Diseases has joined to sequence this pathogen, many strains of which have become resistant to chloroquine, the only effective drug for treating malaria. The hope is that new targets for different drugs will be revealed by the complete sequence of the plasmodial genome. Cloning and sequencing the plasmodium DNA, however, is proving difficult because of the organism's high AT content (as much as 76% in coding regions and 90% to 100% in intergenic regions). Further, P. falciparum's DNA has proven unstable in Escherichia coli, and the organism itself is difficult to maintain in culture.
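To make the AT-content figures above concrete, the following minimal Python sketch computes the fraction of A and T bases in a sequence, overall and in sliding windows. The function names, window parameters, and example sequences are illustrative assumptions, not part of any tool described at the meeting.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return (seq.count("A") + seq.count("T")) / counted


def windowed_at_content(seq: str, window: int, step: int):
    """Return (start, AT fraction) for each full-length window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, at_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich one.
    toy = "ATATATTTAA" + "GCGCGGCCGC"
    print(at_content(toy))
    print(windowed_at_content(toy, window=10, step=10))
```

A windowed profile of this kind is one simple way to see how base composition differs between regions such as coding and intergenic sequence.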
Several interesting characteristics of Helicobacter pylori, which resides in mucosa or apposed to epithelia and causes gastric and duodenal ulcers in humans, were reported by Jean-Francois Tomb (TIGR) and Douglas E. Berg (Washington University Medical School). H. pylori contains over 1500 open reading frames (ORFs), with many repetitive sequences. A large porin-like family of cell adhesion molecules may be involved in mucosal or epithelial cell adhesion. Most genes for flagellar structure and function are present. Dinucleotide repeats may affect the ORF for some surface proteins and provide a possible mechanism for eluding the host immune system. There is much variation among Helicobacter strains. Different variants predominate in members of the same host species, which may reflect spontaneous mutation, horizontal gene transfer, and selection by different hosts over a long interval of infection. A challenge for the future will be to identify the function of unknown or "orphan" genes in H. pylori and determine how different genes allow this organism to live in a relatively hostile acid environment, causing active disease in some cases but not others.
Bernard Dujon (Institut Pasteur) presented some highlights of the recently completed yeast genome sequence. Three signature features are apparent in the more than 6000 ORFs of this organism: a large number of genes have unknown function, the genome contains many redundant sequences, and about 10% to 15% of the ORFs are thought to be essential for life. Work is proceeding to assign functions to orphan ORFs by using deletion and other mutants to examine effects on expression in different vector systems. Two-hybrid systems are being used to search for relationships and interactions with other yeast gene products.
Satoshi Tabata (Kazusa DNA Research Institute, Japan) reported that 3,573,470 bp of DNA specify more than 3100 potential ORFs in the cyanobacterium Synechocystis. This single-celled organism is photoautotrophic and capable of oxygenic photosynthesis. Synechocystis contains more than 126 genes related to photosynthesis, and about 90% of algal plastid genes appear to be conserved in Synechocystis.
Richard Hermann (University of Heidelberg) reported on the closely related bacteria Mycoplasma genitalium (580,070 bp) and Mycoplasma pneumoniae (816,394 bp). All the ORFs in M. genitalium are present in M. pneumoniae, whose genome consists of six segments. Segment order is not conserved between the two species, but gene order is conserved within each segment. Differences in genome size could have evolved by deletion of M. genitalium genes that are not essential for life outside the host and by gene amplification in M. pneumoniae.
Bacterial virulence factors are often encoded in extrachromosomal plasmids, phages, and transposons that can be transmitted horizontally between different species or strains. Pathogenic Vibrio cholerae, for example, carries a phage kappa. The phage has a gene for Glo, a virulence factor very similar to a small eukaryotic G protein. Cholera toxin-encoding genes present on the chromosomal DNA of toxigenic strains are similar to the genome of a filamentous bacteriophage. The cholera toxin and pilus are regulated coordinately by a transcriptional regulator called ToxR. In fact, the pilus is the receptor for the phage (John Mekalanos, Harvard Medical School).
As reported by Brian G. Spratt (University of Sussex, U.K.), both inter- and intraspecies recombination can occur in bacteria, particularly when the bacteria coexist in an environment such as the nasopharynx. In these cases, Spratt noted, recombination occurs more frequently in housekeeping genes and in genes under strong selection.
Analysis of Gene-Product Function
Several groups are attempting to analyze the function of gene products specified by ORFs of completely sequenced organisms. Richard Moxon and his collaborators have identified and cloned 25 genes involved in the biosynthesis and regulation of the lipopolysaccharide (LPS) of Haemophilus influenzae. LPS is a major virulence determinant for this human pathogen. Analysis of these genes and their mutants with monoclonal antibodies, polyacrylamide gel electrophoresis, and mass spectrometry has, in fact, confirmed a role for most of these proteins in strain pathogenicity.
H. influenzae has an estimated 1700 ORFs. Only about 500 polypeptides, however, can be detected by Coomassie blue staining following separation of cell extracts by 2-D polyacrylamide gel electrophoresis. About 650 polypeptides can be detected by autoradiography after biosynthetic labeling of cells with 35S-methionine before electrophoresis. The effects of inhibitors of protein and RNA synthesis on these resolved polypeptides were discussed by Stefan Evers (Hoffmann-LaRoche). Responses to different inhibitors were similar for some resolved proteins. For example, up-regulation occurred for some bacterial transcription and translation components, including ribosomal proteins and RNA polymerase. Most puzzling is the observation that, in many cases, a change in gene transcriptional level did not correspond to a change in translation rate.
Ian Humphery-Smith (National Innovation Center, Australia) discussed limitations of the 2-D polyacrylamide gel approach to protein resolution and analysis. Improved methods are needed for extracting protein from cells, fractionating and enriching protein classes, and conducting 2-D polyacrylamide gel electrophoresis, especially when providing a larger separation area to obtain better resolution. Better methods to detect resolved polypeptides by using mass spectrometry and nanoelectrospray mass spectrometry are under development.
Both Monica Riley (Marine Biological Laboratory, Woods Hole) and Peter Karp (Artificial Intelligence Center, SRI International) spoke about progress and problems in assigning functional roles to ORFs identified by complete genome sequence analysis. Sequence-similarity analyses based on alignments of 100 to 200 amino acid residues showed that all the E. coli proteins can be grouped in families ranging in size from 2 to more than 60 members. Interestingly, not all proteins performing a similar function have sequence similarity, emphasizing the challenge of assigning a specific function to a protein from its deduced amino acid sequence. Current methods are being improved to place a protein sequence deduced from an ORF into a metabolic pathway by using databases that describe the genes and intermediary metabolism of E. coli and H. influenzae. Use of these databases should lead to a better and more reliable system for identifying biological function during annotation.
As discussed by Hamilton Smith, nongenic information between ORFs may be as interesting and important as the genes themselves, but these sequences often would be overlooked during annotation. For example, a 34-amino acid ORF is oriented oppositely to two flanking H. influenzae genes and has a strong promoter and ribosome-binding site. Additional experimentation is needed to determine this gene's function and to identify similar genes.
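As an illustration of why short ORFs such as the one described above are easy to miss, the following Python sketch scans both strands of a DNA sequence for ATG-to-stop ORFs and filters them by length. It is only a sketch under stated assumptions: the function names, length thresholds, and toy sequence are hypothetical, and the scan ignores promoters and ribosome-binding sites, which the text notes were part of the evidence for this ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1, max_codons=None):
    """Return (strand, start, end, n_codons) for ATG...stop ORFs on both strands.

    start and end are 0-based, end-exclusive coordinates on the scanned strand
    (the reverse complement for strand "-"); end includes the stop codon.
    n_codons counts the ATG and excludes the stop codon.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Walk codon by codon from the start codon to the first stop.
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no in-frame stop remains in this frame
                n_codons = (j - i) // 3
                if n_codons >= min_codons and (max_codons is None or n_codons <= max_codons):
                    results.append((strand, i, j + 3, n_codons))
                i = j + 3
    return results


if __name__ == "__main__":
    # Hypothetical toy sequence containing one two-codon ORF (ATG AAA TAG).
    print(find_orfs("CCATGAAATAGCC"))
```

A typical annotation pipeline would set `min_codons` well above 34 to suppress spurious hits, which is exactly the kind of threshold that would hide a 34-codon ORF; lowering the threshold recovers such candidates at the cost of many false positives.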
Although a large number of microorganisms will be sequenced in the near future, they will represent a minute sample of biological diversity on earth. Only a very small fraction of living organisms, estimated at 0.001% to 0.1% by Norman R. Pace (University of California, Berkeley), can be cultivated, cloned, and grown in a defined laboratory environment. However, newer techniques such as PCR, gene cloning, sequencing of amplified products, and ribosomal RNA typing will allow a survey of different organisms in their natural environment, thus eliminating the need for laboratory cultivation. Similar techniques are being used to examine biological diversity among the Archaea (Edward F. DeLong, University of California, Santa Barbara). Initial results indicate that the microbial world has just begun to be appreciated and that much new information and many surprises in the biochemical, genetic, metabolic, physiological, and evolutionary realms will be forthcoming from studies of this most abundant and diverse group of organisms.
The second annual Small Genomes meeting will be held at Hilton Head, South Carolina, from January 31 to February 5, 1998.
[Darrell Doyle, TIGR]
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v8n3).
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.