Discovery of 342 putative new genes from the analysis of 5'-end-sequenced full-length-enriched cDNA human transcripts.

Genomics

PubMedID: 15885500

Dalla E, Mignone F, Verardo R, Marchionni L, Marzinotto S, Lazarevic D, Reid JF, Marzio R, Klaric E, Licastro D, Marcuzzi G, Gambetta R, Pierotti MA, Pesole G, Schneider C. Discovery of 342 putative new genes from the analysis of 5'-end-sequenced full-length-enriched cDNA human transcripts. Genomics. 2005;85(6):739-51.
In this work we describe the process that, starting with the production of human full-length-enriched cDNA libraries using the CAP-Trapper method, led us to the discovery of 342 putative new human genes. Twenty-three thousand full-length-enriched clones, obtained from various cell lines and tissues at different developmental stages, were 5'-end sequenced, allowing the identification of a pool of 5300 unique cDNAs. By comparing these sequences to various human and vertebrate nucleotide databases, we found that about 40% of our clones extended previously annotated 5' ends, 662 clones were likely to represent splice variants of known genes, and 342 clones remained unknown, with little or no functional annotation. cDNA-microarray gene expression analysis showed that 260 of the 342 unknown clones are expressed in at least one cell line and/or tissue. Further analysis of their sequences and the corresponding genomic locations allowed us to conclude that most of them represent potential novel genes, with only a small fraction having protein-coding potential.