Department of Biochemistry and Molecular Biology
London WC1E 6BT, UK
telephone: 00 44 020 7419 3284
fax: 00 44 020 7380 7193
presenter: Christine Orengo
Orengo, C.A., Pearl, F., Lee, D., Bray, J., Todd, A., Sillitoe, I. & Thornton,
Biomolecular Structure and Modelling Unit, University College, London
The rapid progress of the international genome projects has resulted in a
wealth of sequence information for protein families. Although the structural
data has lagged behind the sequences, analysis suggests that with the advent
of structural genomics initiatives we may soon have structural representatives
for many evolutionary protein families.
At UCL, we have clustered all the well-resolved protein structures in the
PDB into structural families. Proteins are first divided into separate domains,
and both sequence and structure alignment methods are used to identify
relationships. Data on families is stored within the CATH database (Class,
Architecture, Topology or fold, and Homologous superfamily). To date, there are ~26,000 domains within
CATH, which cluster into ~1000 homologous superfamilies and ~600 fold groups.
Recently, profile-based methods (PSI-BLAST, Altschul et al. 1997) have been used
to identify 160,000 sequence relatives in the genomes and to integrate these
into CATH families, using conservative thresholds.
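The grouping of domains into families can be illustrated with a minimal sketch. It is not the CATH protocol itself: it assumes that a pairwise similarity score between domains is already available (from a sequence or structure comparison) and uses plain single-linkage clustering as a stand-in. The function name, the domain identifiers, the scores and the threshold are all hypothetical.

```python
from collections import defaultdict


def cluster_domains(domains, scores, threshold):
    """Single-linkage clustering: two domains end up in the same family
    whenever a chain of pairwise scores >= threshold connects them."""
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the representative, halving the path as we go.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two families

    families = defaultdict(list)
    for d in domains:
        families[find(d)].append(d)
    return sorted(sorted(members) for members in families.values())


# Hypothetical identifiers and similarity scores, for illustration only.
domains = ["dom1", "dom2", "dom3", "dom4"]
scores = {("dom1", "dom2"): 0.9, ("dom2", "dom3"): 0.8, ("dom3", "dom4"): 0.2}
families = cluster_domains(domains, scores, threshold=0.7)
print(families)
```

A stricter (more conservative) threshold yields smaller families with fewer false relatives, at the cost of missing remote homologues.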
Functional analysis has been undertaken within each superfamily in CATH, using protein-ligand interaction plots (DOMPLOT, Todd et al. 1998). Correlations between sequence and structure motifs are captured using a new dictionary of functional information (DHS, Bray et al. 1999). Analysis of CATH enzyme superfamilies reveals that function is completely conserved in about one third of them, whilst in a further quarter the catalytic mechanism is conserved although the substrate or ligand may vary. In some superfamilies, function was observed to vary widely.