Analysis of Human cDNAs Encoding Large Proteins: Towards and Beyond the Identification of Protein-Coding Transcribed Sequences

Takahiro Nagase
Department of Human Gene Research
Kazusa DNA Research Institute
1532-3, Yana, Kisarazu
Chiba 292-0812, Japan
telephone: +81-438-52-3930
fax: +81-438-52-3931
email: nagase@kazusa.or.jp
prestype: Platform
presenter: Takahiro Nagase

Takahiro Nagase, Reiko Kikuno, and Osamu Ohara
Department of Human Gene Research, Kazusa DNA Research Institute

Over the past six years, we have been studying the protein-coding sequences of unidentified human genes. In particular, our recent analyses have focused on previously unidentified genes that encode large proteins (>50 kDa) in the human brain, because large proteins are known to be frequently involved in various important cellular processes. To achieve this, we have sequenced the full length of large cDNAs (>4 kb) selected according to novel sequences at their 5' and 3' ends and their protein-coding potential. These newly identified genes were systematically designated KIAA plus a four-digit number. The number of cDNA sequences determined has reached 2000, and their average size is approximately 5 kb. Thus, nearly 10 Mb of human cloned cDNA sequence has been determined with high accuracy. Since the number of genes encoding large proteins is expected to be only about 10% of the total number of human genes, the number of KIAA genes in the public databases (1642 entries, August 2000) is substantial, considering that it represents a set of genes expressed in the brain. We have also analyzed the chromosomal loci and expression profiles of many of these KIAA genes, and have made all the data accessible at our Web site (http://www.kazusa.or.jp/huge). From our experience in human cDNA sequencing over the past six years, we have learned how to deal with the many different problems that arise in cDNA analysis. As the human genome sequencing project enters the last phase, in which the draft sequences are finalized, cDNA sequence data will serve as an important tool for the interpretation of the sequence of the human genome. Furthermore, the cDNA data will offer a variety of information regarding post-transcriptional events, such as alternative splicing and RNA editing, which cannot at present be predicted from the genome sequence.
On the other hand, the genome sequence can help considerably in resolving problems in cDNA technology, most of which originate from the fact that cDNAs are nothing but artificial copies of mRNAs. Therefore, integration of the genomic and cDNA data should be treated as an urgent and critical concern. The ultimate goal of our project goes beyond the identification of protein-coding sequences in the human genome, as we believe that cDNA analysis will play a key role in bridging gaps in understanding between the genome, the transcriptome, and the proteome.


