Human Genome Project Information

Policies on Release of Human Genomic Sequence Data
Bermuda-Quality Sequence

Two meetings were held in Bermuda, in 1996 and 1997, to discuss how human sequence data should be released. The outcomes of these meetings are summarised below, in reverse chronological order.

Summary of the Report of the Second International Strategy Meeting on Human Genome Sequencing (Bermuda, 27th February - 2nd March, 1997) as reported by HUGO

Summary

  • The principles enunciated at the first International Strategy meeting, of rapid data release and public access to the primary genomic sequence, were reaffirmed.
  • Scientists and funding agencies should take the necessary steps to ensure that the principles are adhered to by all participating organisations.

Sequence Quality Standards

The following standards were agreed:

  • The nucleotide error rate should be 1 error in 10,000 bases or less for most sequence.
  • Assemblies should be verified by restriction digest using two or more restriction enzymes.
  • Gaps in sequence: the agreed long-term goal is no gaps, recognising that this is not yet routine.
  • Closing gaps is the responsibility of the original sequencer.
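
The restriction digest check above compares the fragment sizes predicted from an assembly with those observed on a gel. The following is a minimal sketch of the prediction step only. It assumes a linear sequence, a complete digest, and exact-match recognition sites; the function name and the toy sequence are hypothetical, and EcoRI (G^AATTC) and HindIII (A^AGCTT) are used purely as familiar examples, not as enzymes named in the report.

```python
from typing import Dict, List, Tuple

# Recognition site and cut offset (bases from the start of the site on the
# top strand). EcoRI cuts G^AATTC and HindIII cuts A^AGCTT. Both sites are
# palindromic, so searching one strand finds every site.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def digest_fragment_sizes(sequence: str, enzyme: str) -> List[int]:
    """Return sorted fragment sizes for a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    cut_positions = []
    start = seq.find(site)
    while start != -1:
        cut_positions.append(start + offset)
        start = seq.find(site, start + 1)
    # Fragments are the intervals between consecutive cut positions,
    # including both ends of the linear molecule.
    boundaries = [0] + cut_positions + [len(seq)]
    return sorted(b - a for a, b in zip(boundaries, boundaries[1:]))


# Toy assembly (hypothetical), digested with two enzymes.
assembly = "AAAA" + "GAATTC" + "CCCC" + "AAGCTT" + "GGGG" + "GAATTC" + "TT"
predicted = {name: digest_fragment_sizes(assembly, name) for name in ENZYMES}
print(predicted)
```

In practice the predicted sizes would be compared with measured band sizes within the sizing tolerance of the gel, which this sketch does not model.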

The following proposals were endorsed by the participants:

  • It was agreed that a useful trial to assess sequence accuracy would be to perform a data exchange exercise. Raw sequence data would be exchanged among sequencing centres; each centre would reassemble the data and identify outright discrepancies or ambiguities with reference to the sequence submitted to the database. These would be resolved by further consultation or resequencing. The same data sets would be sent to two centres, which would hopefully engender competition to detect errors.
  • All sequence reads should be archived in a retrievable form.
  • Sequencing centres should define explicitly how error rates and costs have been calculated.

Sequence Submission and Annotation

Sequence data should be classified simply as "finished" or "unfinished" and should be stored in distinct databases; consideration should be given to establishing a public database for unfinished sequence data.

Sequence annotation should be standardised if possible, and include the following information:

  • Error estimation such as PHRED and PHRAP data.
  • Enzymes used to verify assemblies, and sizes of fragments produced.
  • Exact details on how to assemble adjacent clones, with a minimum of 100 bp of overlapping (preferably unique) sequence between clones for verification.
  • Gaps must be sized and the surrounding sequence oriented and ordered. The methods used for sizing and reasons for not closing the gap should be stated.
  • If features such as coding sequence and splice sites are included in the annotation, it should be stated if they were identified experimentally or by computer predictions.
  • For unfinished sequence, it should be stated how near the sequence is to completion.
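
As one illustration of the error estimation mentioned in the first item, per-base quality scores of the kind PHRED produces can be converted into an estimated error rate and compared with the agreed standard of 1 error in 10,000 bases. This is a minimal sketch: it assumes the scores follow the usual Phred definition, Q = -10 log10(P), and the function names and sample scores are hypothetical rather than anything specified by the meeting.

```python
from typing import Sequence

# Accuracy standard agreed at the meeting: 1 error in 10,000 bases or less.
MAX_ERROR_RATE = 1 / 10_000


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-style quality score Q into an error probability P."""
    return 10 ** (-q / 10)


def estimated_error_rate(qualities: Sequence[float]) -> float:
    """Mean per-base error probability implied by a set of quality scores."""
    if not qualities:
        raise ValueError("no quality scores supplied")
    return sum(phred_to_error_probability(q) for q in qualities) / len(qualities)


def meets_accuracy_standard(qualities: Sequence[float]) -> bool:
    """True if the estimated error rate is within the agreed standard."""
    return estimated_error_rate(qualities) <= MAX_ERROR_RATE


# Hypothetical consensus quality scores for a short stretch of sequence.
sample_qualities = [45, 50, 38, 42, 50]
print(estimated_error_rate(sample_qualities))
print(meets_accuracy_standard(sample_qualities))
```

A score of 40 corresponds to an error probability of 1 in 10,000, so a sequence whose bases mostly score well above 40 would be expected to meet the standard under these assumptions.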

Potential development of a database listing all gaps in 'finished' sequence.

Sequence Claims and Etiquette

Mapping investment does not automatically entitle sequencing claims over the same region until a sequence-ready map has been generated.

Potential conflicts with other sequencers to be resolved by early communication.

Collaborations with groups with a biological interest in a region should be subject to the same principles of data release and communication.

Investigate whether the Human Sequence Map Index should be relocated to be more closely associated with the other major human sequence databases.

Claims allowed on the Index:

  • Duration: maximum 1 year.
  • Size of region: minimum 1 Mb; regions to be defined by Généthon markers if possible, or by other agreed and available markers if not.
  • Maximum amount: in the order of three times the sequence released by the centre in the preceding year.
  • Sequence claims must span the entire region between, and including, the delimiting markers.
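
The numeric limits above could be checked mechanically. The sketch below is illustrative only: the class and field names are hypothetical, "1 year" is taken as 365 days, "1 Mb" as 1,000,000 bases, and the "in the order of three times" guideline is treated as a hard limit applied to the total a centre has claimed, which is a simplifying assumption rather than something the report states.

```python
from dataclasses import dataclass
from typing import List

MAX_DURATION_DAYS = 365      # "maximum 1 year" (assumed to mean 365 days)
MIN_REGION_BP = 1_000_000    # "minimum 1 Mb"
MAX_CLAIM_MULTIPLE = 3       # "in the order of three times" prior-year release


@dataclass
class SequenceClaim:
    duration_days: int
    region_bp: int               # size of the claimed region
    total_claimed_bp: int        # everything the centre claims, including this region
    released_last_year_bp: int   # sequence the centre released in the preceding year


def claim_problems(claim: SequenceClaim) -> List[str]:
    """Return the limits a claim exceeds; an empty list means none were found."""
    problems = []
    if claim.duration_days > MAX_DURATION_DAYS:
        problems.append("duration exceeds 1 year")
    if claim.region_bp < MIN_REGION_BP:
        problems.append("region is smaller than 1 Mb")
    if claim.total_claimed_bp > MAX_CLAIM_MULTIPLE * claim.released_last_year_bp:
        problems.append("claimed amount exceeds three times last year's release")
    return problems


# Hypothetical claim: 2 Mb for 300 days by a centre that released 2 Mb last year.
example = SequenceClaim(
    duration_days=300,
    region_bp=2_000_000,
    total_claimed_bp=5_000_000,
    released_last_year_bp=2_000_000,
)
print(claim_problems(example))
```

The final rule, that a claim must span the entire region between and including the delimiting markers, depends on map data and is not modelled here.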


Summary of Principles Agreed Upon at the First International Strategy Meeting on Human Genome Sequencing (Bermuda, 25-28 February 1996) as reported by HUGO

The following principles were endorsed by all participants. These included officers from, and scientists supported by, the Wellcome Trust, the U.K. Medical Research Council, the NIH NCHGR (National Center for Human Genome Research), the DOE (U.S. Department of Energy), the German Human Genome Programme, the European Commission, HUGO (Human Genome Organisation), and the Human Genome Project of Japan. It was noted that some centres may find it difficult to implement these principles because of legal constraints and it was, therefore, important that funding agencies were urged to foster these policies.

Primary Genomic Sequence Should be in the Public Domain

It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.

Primary Genomic Sequence Should be Rapidly Released

  • Sequence assemblies should be released as soon as possible; in some centres, assemblies of greater than 1 Kb would be released automatically on a daily basis.
  • Finished annotated sequence should be submitted immediately to the public databases.

It was agreed that these principles should apply for all human genomic sequence generated by large-scale sequencing centres, funded for the public good, in order to prevent such centres establishing a privileged position in the exploitation and control of human sequence information.
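
The automatic daily release of assemblies greater than 1 Kb, mentioned above as the practice in some centres, amounts to a simple length filter. A minimal sketch, assuming 1 Kb means 1,000 bases and that assemblies are held as a mapping from a name to its sequence; the function name and the contig names are hypothetical.

```python
from typing import Dict

MIN_RELEASE_LENGTH_BP = 1_000  # "greater than 1 Kb", assuming 1 Kb = 1,000 bases


def assemblies_to_release(assemblies: Dict[str, str]) -> Dict[str, str]:
    """Select assemblies strictly longer than the release threshold."""
    return {
        name: seq
        for name, seq in assemblies.items()
        if len(seq) > MIN_RELEASE_LENGTH_BP
    }


# Hypothetical assemblies of 999, 1,000 and 1,001 bases.
todays_assemblies = {
    "contig_a": "A" * 999,
    "contig_b": "C" * 1_000,
    "contig_c": "G" * 1_001,
}
print(sorted(assemblies_to_release(todays_assemblies)))
```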

Coordination

In order to promote coordination of activities, it was agreed that large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the genome. HUGO would present this information on their World Wide Web page and direct users to the Web pages of individual centres for more detailed information regarding the current status of sequencing in specific regions. This mechanism should enable centres to declare their intentions in a general framework while also allowing more detailed interrogation at the local level.


For More Information

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and to determine the complete sequence of DNA bases in the human genome.

Human Genome News

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.