A large scale proteogenomics study of apicomplexan pathogens - Toxoplasma gondii and Neospora caninum.

Proteomics

PubMedID: 25867681

Krishna R, Xia D, Sanderson S, Shanmugasundram A, Vermont S, Bernal A, Daniel-Naguib G, Ghali F, Brunk B, Roos D, Wastling JM, Jones AR. A large scale proteogenomics study of apicomplexan pathogens - Toxoplasma gondii and Neospora caninum. Proteomics. 2015;.
Proteomics data can supplement genome annotation efforts, for example by confirming gene models or correcting gene annotation errors. Here we present a large-scale proteogenomics study of two important apicomplexan pathogens: Toxoplasma gondii and Neospora caninum. We queried proteomics data against a panel of official gene models and alternate gene models generated directly from RNA-Seq data, using several newly generated and some previously published MS data sets for this meta-analysis. We identified a total of 201996 and 39953 peptide-spectrum matches (PSMs) for T. gondii and N. caninum, respectively, at a 1% peptide false discovery rate threshold. Following stringent protein-level thresholding, this equated to 30494 distinct peptide sequences and 2921 proteins (matches to official gene models) for T. gondii, and 8911 peptides and 1273 proteins for N. caninum. We also identified 289 and 140 loci for T. gondii and N. caninum, respectively, that mapped to the RNA-Seq-derived gene models used in our analysis and are apparently absent from the official annotation of these species (release 10 from EuPathDB). We present several examples where the RNA-Seq evidence can help correct the current gene model or discover potential new genes. The findings of this study have been integrated into EuPathDB. The data have been deposited in ProteomeXchange with identifiers PXD000297 and PXD000298.