Exceptional Chromosome Regions II



Segmental Duplications: Organization and Impact Within the Current Assembly

Jeffrey A. Bailey, Amy M. Yavor, Julie E. Horvath, Barbara Trask, and Evan E. Eichler
Department of Genetics and Center for Human Genetics, Case Western Reserve School of Medicine and University Hospitals of Cleveland, Cleveland, OH 44106

Segmental duplications play fundamental roles in both genomic disease and gene evolution. Over the past year, we developed the computational tools and methods necessary to detect identity between long stretches of genomic sequence despite the presence of high-copy repeats and large insertion-deletions. We focused our analysis on large, recent duplication events that fell well below levels of draft sequencing error (alignments 90-98% similar and >1 kb in length), revealing an unprecedented amount of duplicated sequence (3.6%) in the human draft assembly (oo15). Here we present a more refined analysis of the most recent genome assembly (oo23), in which we focus on the role duplications play in the whole-genome assembly process. Duplications (90-98%; >1 kb) comprise 3.6% of all sequence in oo23. These duplications cluster and show up to 10-fold enrichment within pericentromeric and subtelomeric regions. Despite this bias, complex regions of duplication have also been identified within gene-rich regions. In terms of assembly, duplicated sequences are 6.7-fold over-represented in unordered and unassigned contigs, indicating that duplicated sequences are difficult to assign to their proper position. Further, utilizing data from 134 sequenced BACs with FISH signals to multiple chromosomes, only 57% (280/571) of chromosomes positive by FISH had a corresponding chromosomal BLAST hit to oo23. We present data indicating that this is due to misassembly/misassignment and decreased sequencing coverage within duplicated regions. Surprisingly, if we also consider putative duplications with >98% similarity, 10.3% (286 Mb) of the current assembly is involved in pairwise alignments and would be classed as paralogous. The majority of these high-similarity alignments, we believe, represent unmerged overlaps within unique regions.
Taken together, these data indicate that segmental duplications represent a significant impediment to accurate human genome assembly, requiring the development of specialized techniques to finish these exceptional regions of the genome. Specific examples from chromosomes 16 and 19 will be presented.
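
The selection criteria quoted above (alignments 90-98% similar and >1 kb in length) and the fold-enrichment figures can be illustrated with a minimal sketch. This is not the authors' software: the function names, the `(chrom, start, end, identity)` tuple layout, the half-open coordinates, and the inclusive treatment of the 90% and 98% bounds are all assumptions made for illustration, and the input alignments are toy values rather than data from oo15 or oo23.

```python
from collections import defaultdict

# Thresholds taken from the abstract; treating the identity bounds as
# inclusive is an assumption, since the abstract does not say.
MIN_LENGTH = 1000      # alignments must be longer than 1 kb
MIN_IDENTITY = 0.90
MAX_IDENTITY = 0.98


def is_recent_duplication(length, identity):
    """Return True if an alignment meets the 90-98%, >1 kb criteria."""
    return length > MIN_LENGTH and MIN_IDENTITY <= identity <= MAX_IDENTITY


def duplicated_bases(alignments):
    """Count non-redundant bases covered by qualifying alignments.

    `alignments` is an iterable of (chrom, start, end, identity) tuples
    with half-open coordinates (hypothetical layout). Overlapping
    intervals on the same chromosome are merged so that no base is
    counted twice.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, identity in alignments:
        if is_recent_duplication(end - start, identity):
            by_chrom[chrom].append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def fold_enrichment(dup_in_region, region_size, dup_in_genome, genome_size):
    """Duplicated fraction of a region relative to the genome-wide fraction."""
    return (dup_in_region / region_size) / (dup_in_genome / genome_size)


# Toy input: two overlapping hits on chr16, one hit that is too short,
# one that is too similar (>98%), and one further qualifying hit on chr19.
toy_alignments = [
    ("chr16", 0, 5000, 0.95),
    ("chr16", 4000, 8000, 0.92),
    ("chr19", 100, 600, 0.95),
    ("chr19", 0, 3000, 0.995),
    ("chr19", 10000, 12000, 0.97),
]
toy_total = duplicated_bases(toy_alignments)
```

Dividing the returned base count by the assembly length gives a duplicated fraction comparable in kind to the 3.6% reported above, and `fold_enrichment` applies the same ratio to a sub-region such as a pericentromeric interval. Merging intervals before counting is a design choice made here so that a region hit by several pairwise alignments contributes its length only once.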

Last modified: Wednesday, October 22, 2003

Base URL: www.ornl.gov/meetings/ecr2/

Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program