Human DNA Sequencing
HGP planners at DOE and NIH emphasized that the highest priority of the Human Genome Project remains the same: to obtain and make publicly available a complete and highly accurate reference sequence. The new projected completion date was credited to recent advances achieved in technology and experience with pilot large-scale efforts as well as to the contributions of international partners in the sequencing effort, notably those of the Sanger Centre in the U.K., and research centers in Germany, Japan, and France.
NIH and DOE sequencing centers expect to generate about 60-70% of the human DNA sequence, which will be made available broadly and rapidly via the web to stimulate future research.
A new sequencing milestone expected by 2001 as a result of the scale-up is the generation of a "working draft" covering 90% of the genome, comprising shotgun sequence data from mapped clones, with gaps and ambiguities unresolved. If the data sets can be merged, private-sector sequencing efforts may help increase the depth of the mapped draft, which scientists expect will contain about half of the genes.
The continued emphasis on obtaining highly accurate sequence (1 error in 10,000 bases) that is largely continuous (few gaps) across each human chromosome and on developing sustainable sequencing capacity underscores the critical importance of these resources for understanding human biology and for applications to other fields.
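To make the accuracy target concrete, the short sketch below works out what "1 error in 10,000 bases" implies for a stretch of finished sequence. The function names are hypothetical, and the Phred-style quality score is a common convention in sequencing brought in here for illustration; it is not part of the plan described above.

```python
import math


def expected_errors(num_bases: int, bases_per_error: int = 10_000) -> float:
    """Expected number of miscalled bases in a stretch of sequence,
    assuming errors occur uniformly at 1 per `bases_per_error` bases."""
    return num_bases / bases_per_error


def phred_quality(bases_per_error: int = 10_000) -> float:
    """Phred-style quality score, Q = -10 * log10(error probability).
    With error probability 1 / bases_per_error this is 10 * log10(bases_per_error)."""
    return 10 * math.log10(bases_per_error)


if __name__ == "__main__":
    # A hypothetical 1-million-base region at the stated accuracy target.
    print(round(expected_errors(1_000_000)))  # about 100 expected errors
    print(round(phred_quality()))             # roughly Q40 on the Phred scale
```

Under these assumptions, a region of one million bases would be expected to carry on the order of 100 errors, which corresponds to 99.99% per-base accuracy.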
Achieving the HGP goal will require current sequencing capacity to be expanded 2-3 times, demanding further incremental advances in standard sequencing technologies and improvements in efficiency and cost. For future sequencing applications, planners emphasize the importance of supporting novel technologies that may be 5-10 years in development.
A new priority for the HGP is examining regions of natural variation that occur among genomes (except those of identical twins). Goals specify development of methods to detect different types of variation, particularly the most common type, single nucleotide polymorphisms (SNPs), which occur about once every 1000 bases. Scientists believe SNP maps will help them identify genes associated with complex diseases such as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to make using conventional gene-hunting methods because any individual gene may make only a small contribution to disease risk. DNA sequence variations also underlie many individual differences in responses to the environment and treatments.
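As a rough sense of scale, the sketch below turns the "about once every 1000 bases" figure into an approximate SNP count. The genome length of roughly 3 billion bases is an assumption not stated in this article, and the function name is hypothetical; the result is a back-of-the-envelope estimate, not a measured value.

```python
def expected_snp_count(region_length: int, bases_per_snp: int = 1000) -> int:
    """Approximate number of SNPs in a region, assuming one SNP
    about every `bases_per_snp` bases (the frequency quoted above)."""
    return region_length // bases_per_snp


if __name__ == "__main__":
    # Assumed haploid human genome length of about 3 billion bases.
    ASSUMED_GENOME_LENGTH = 3_000_000_000
    print(expected_snp_count(ASSUMED_GENOME_LENGTH))  # on the order of 3 million
    # A hypothetical 150,000-base clone would hold about 150 SNP sites.
    print(expected_snp_count(150_000))
```

On these assumptions a SNP map would need to cover a few million sites, which helps explain why the goals call for new, efficient detection methods.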
Efficient interpretation of the functions of human genes and other DNA sequences requires developing the resources and strategies to enable large-scale investigations across whole genomes. A technically challenging first priority is to generate complete sets of full-length cDNA clones and sequences for human and model organism genes. Other functional genomics goals include studies into gene expression and control, creation of mutations that cause loss or alteration of function in nonhuman organisms, and development of experimental and computational methods for protein analyses.
A first clue toward identifying and understanding the functions of human genes or other DNA regions is often obtained by studying their parallels in nonhuman genomes. To enable efficient comparisons, complete genomic sequences already have been obtained for the bacterium E. coli and the yeast S. cerevisiae, and work continues on sequencing the genomes of the roundworm, fruit fly, and mouse. Planners note that other genomes will need to be sequenced to realize the full promise of comparative genomics, stressing the need to build a sustainable sequencing capacity.
Ethical, Legal, and Social Implications (ELSI)
Rapid advances in genetics and applications present new and complex ethical and policy issues for individuals and society. ELSI programs that identify and address these implications have been an integral part of the US HGP since its inception. These programs have resulted in a body of work that promotes education and helps guide the conduct of genetic research and the development of related health professional and public policies.
Continuing and new challenges include safeguarding the privacy of individuals and groups who contribute samples for large-scale sequence variation studies; anticipating how resulting data may affect concepts of race and ethnicity; identifying how genetic data could potentially be used in workplaces, schools, and courts; commercial uses; and the impact of genetic advances on concepts of humanity and personal responsibility.
Bioinformatics and Computational Biology
Continued investment in current and new databases and analytical tools is critical to the success of the Human Genome Project and to the future usefulness of the data. Databases must be structured to adapt to the evolving needs of the scientific community and allow queries to be answered easily. Planners suggest developing a human genome database analogous to model organism databases with links to phenotypic information. Also needed are databases and analytical tools for the expanding body of gene expression and function data, for modeling complex biological networks and interactions, and for collecting and analyzing sequence variation data.
Planners note that future genomics scientists will require training in interdisciplinary areas that include biology, computer science, engineering, mathematics, physics, and chemistry. Additionally, scientists with management skills will be needed for leading large data-production efforts.
The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome.
Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.