Beyond the Identification of Transcribed Sequences: Functional and Expression Analysis

9th Annual Workshop, October 28-31, 1999

Co-sponsored by the U.S. Department of Energy


Annotation of the human X chromosome sequence

Gareth R. Howell, Alison J. Coffey, Susan Rhodes, Robert A. Brooksbank, Shirin S. Joseph, Jackie M. Bye, Andrew King, Laurens Wilming, David R. Bentley and Mark T. Ross

The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

The Sanger Centre is sequencing approximately 110 Mb of the human X chromosome between DMD (Xp21.1) and DXS532 (Xq27.3). We have so far produced 35 Mb of finished sequence from this region, together with a further 11 Mb of unfinished sequence. Annotation of transcribed regions begins with the analysis of finished sequence for nucleic acid and protein homologies and for predicted exons, genes and CpG islands. Analysis results are displayed graphically in acedb. Primer pairs are designed against strongly predicted exons and used to screen PCR pools of cDNA clones from a collection of tissues. Partial gene structures are verified and extended using single-side specificity (SS)PCR within the positive cDNA pools. The sequences of the SSPCR products are added to the acedb display to show where the genomic sequence predictions have been confirmed experimentally.
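As an illustration of one of the prediction steps mentioned above, the following is a minimal sketch of a sliding-window CpG island scan. It is not the method used in this work, which the abstract does not describe. The thresholds (window of 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are assumed here; the function names and the step size are hypothetical.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = window.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # observed CpG dinucleotides
    gc_fraction = (c + g) / n
    expected = (c * g) / n         # expected CpG count given base composition
    ratio = cpg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def find_cpg_islands(sequence, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return merged (start, end) intervals of windows passing both thresholds.

    Coordinates are 0-based, end-exclusive. Thresholds are assumptions,
    not values taken from the abstract.
    """
    islands = []
    for start in range(0, len(sequence) - window + 1, step):
        gc, ratio = cpg_stats(sequence[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous passing window: extend it.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


if __name__ == "__main__":
    # Synthetic example: an AT-rich background with a CG-rich stretch inside.
    demo = "AT" * 200 + "CG" * 150 + "AT" * 200
    for start, end in find_cpg_islands(demo):
        print(f"candidate CpG island: {start}-{end}")
```

A production annotation pipeline would typically mask repeats first and combine such predictions with the homology and exon-prediction evidence before selecting regions for primer design.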

Most of our effort so far has been in Xq23-q27, where genomic sequencing is most advanced. Here we have identified and established at least partial structures for 45 genes, including the SH2D1A gene, mutations of which result in X-linked lymphoproliferative disease (XLP). The sequence annotated with these gene structures is made available using webace, a web-based version of the acedb display. Further details can be obtained from our X chromosome WWW page.

For part of this region (Xq25) we have initiated a small-scale comparative sequencing project in the syntenic region of the mouse X chromosome. The region contains at least four genes in human (CXorf3, XIAP, SH2D1A and Tenascin-M). The mouse sequence should allow us to assess the power of comparative sequencing for confirming complete gene structures, detecting genes not found using the approach described above, and identifying potential gene regulatory elements.
