Human Genome Project Information Archive
1990–2003

Archive Site Provided for Historical Purposes


Sponsored by the U.S. Department of Energy Human Genome Program

Human Genome News Archive Edition

Human Genome Quarterly, Spring 1989; 1(1)

Workshop Focuses on Interface Between Computational Science and Nucleic Acid Sequencing

Approximately 100 molecular biologists, computer scientists, mathematicians, and other scientists in diverse fields met in Santa Fe, New Mexico, on December 12-16, 1988, at a workshop on "The Interface Between Computational Science and Nucleic Acid Sequencing." Supported by the Department of Energy (DOE) Office of Health and Environmental Research (OHER), the workshop was largely motivated by the national human genome research community, which has as one of its goals the mapping of the human genome.

This activity will include the development of physical maps of each chromosome and, ultimately, may include determining the sequence of the three billion nucleotides that make up the human genome. The capture, organization, availability, and comprehension of this information will require the development and, in some cases, the invention of many computational tools. The workshop was organized to discuss:

  • the computational challenges posed by this information onslaught,
  • the current state of relevant databases and analysis methods, and
  • directions for needed research and development.

Although the workshop topics were given special urgency by the Human Genome Program, other "megasequencing" projects to sequence bacterial and yeast genomes, together with recent progress in sequencing technology, also made the workshop timely.

The workshop participants reviewed anticipated needs of the Human Genome and other programs, together with existing capabilities. At present, the human genetic linkage map is assembled by the Howard Hughes Medical Institute Human Gene Mapping Library, while the sequence data are assembled collaboratively by three groups:

  • GenBank (Los Alamos National Laboratory (LANL) and Intelligenetics, Mountain View, California),
  • European Molecular Biology Laboratory (EMBL) Data Library (Heidelberg), and
  • DNA Data Bank of Japan (Mishima).

A pilot project has been started at LANL to provide a national repository for physical mapping data of various resolutions. Workshop participants agreed that there is a need for image-processing and data-management systems that will enable individual laboratories not only to organize their own map and sequence data, but also to submit them directly to central databases. GenBank will soon appear in the form of a relational database, but some participants saw a need for object-oriented and hierarchical databases in the future. It was agreed that all databases need to be linked and easily accessible to individual investigators through their scientific workstations. The National Library of Medicine, through its Center for Biotechnology Information, and the Center for Human Genome Studies at LANL plan to play major roles in coordinating this effort.

Detection of functionally significant patterns in DNA and protein sequences was a major topic at the workshop. A standard procedure is to compare a new sequence with all known ones in a search for significant similarity. Reports were given on the use of parallel computers for such comparisons.
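The compare-against-everything procedure described above can be illustrated with a minimal sketch. It assumes a Smith-Waterman style local alignment score with a linear gap penalty as the similarity measure; the report does not say which algorithms the speakers used, and the function names, scoring values, and toy sequences here are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).

    The scoring values are illustrative assumptions, not taken from the report.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Score the query against every known sequence and rank the results."""
    scores = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "database" of known sequences.
    known = {
        "seq_a": "TTGACGTACGTTAG",
        "seq_b": "CCCCCCCCCC",
        "seq_c": "GGACGTACGA",
    }
    for name, score in search_database("ACGTACG", known):
        print(name, score)
```

Each query-versus-entry comparison is independent of the others, which is one reason this kind of search lends itself to parallel hardware. A raw score alone does not establish significance; that would need a statistical model, which this sketch omits.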

In addition, special-purpose computer chips are being designed for sequence comparisons (T. Hunkapiller, California Institute of Technology). Some exciting results were reported by A. Lapedes (LANL) on the use of adaptive networks to detect protein-coding regions, including the intron-exon splice junctions, in human DNA. Sequences with the potential to regulate gene expression are more difficult to detect. Many of these are sequences that are recognized by specific proteins, and C. Benham (Mt. Sinai School of Medicine) gave an elegant review of how the partial untwisting of the DNA double helix may induce the formation of local structures, such as cruciforms at inverted repeats or left-handed helices at alternating purine-pyrimidine sections, that can be recognized by proteins.
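The two sequence features named above, inverted repeats and alternating purine-pyrimidine runs, can be located with a simple scan. The sketch below is only a candidate-site finder under stated assumptions: it is not the energetic analysis of DNA untwisting that Benham reviewed, it expects uppercase A, C, G, and T only, and the function names and thresholds (`arm`, `max_loop`, `min_len`) are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_inverted_repeats(seq, arm=4, max_loop=6):
    """Return (start, arm, loop) wherever an arm of the given length is followed,
    after at most max_loop intervening bases, by its reverse complement."""
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[start:start + arm])
        for loop in range(max_loop + 1):
            right_start = start + arm + loop
            if right_start + arm > len(seq):
                break
            if seq[right_start:right_start + arm] == target:
                hits.append((start, arm, loop))
    return hits


def find_alternating_pu_py(seq, min_len=6):
    """Return half-open (start, end) spans of at least min_len bases in which
    purines and pyrimidines strictly alternate."""
    spans = []
    if not seq:
        return spans
    run_start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours share a class.
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - run_start >= min_len:
                spans.append((run_start, i))
            run_start = i
    return spans


if __name__ == "__main__":
    print(find_inverted_repeats("ACCGGTTTCCGGT", arm=5))
    print(find_alternating_pu_py("TTCGCGCGCGAA"))
```

Whether a site found this way actually forms a cruciform or a left-handed helix depends on physical conditions that a pattern scan cannot assess.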

The classic problem of predicting protein structure and function from sequence was discussed. Short of this goal, several approaches to predicting secondary structure from sequence were presented. R. Doolittle (University of California at San Diego) observed that approximately half of the protein sequences that are now being determined from nucleotide sequences are found to have significant similarities to other known proteins. A major issue concerned the extent to which exons correspond to functional domains of proteins.

In the past few years, many molecular biologists have come to regard computer databases and analysis programs as important components of their research; for some, as seen at this workshop, they are indispensable. Participants at this workshop were united in their belief in the importance of this field and in their appreciation for the meeting, which promoted communication and fostered collaborations in this highly interdisciplinary research.

The proceedings of the workshop will be published in Computers and DNA, edited by George I. Bell and published by Addison-Wesley (1-800-447-2226).


Submitted by Dr. George I. Bell
Los Alamos National Laboratory

The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v1n1).

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.