Using the transcriptome to annotate the genome revisited: application of massively parallel signature sequencing (MPSS)
Abstract
Transcriptome analysis can provide useful data for refining genome sequence annotation. Application of massively parallel signature sequencing (MPSS) revealed reproducible transcription, in multiple MPSS cycles, from 73% of computationally predicted genes in the Theileria parva schizont lifecycle stage. Signatures spanning consecutive exons confirmed 142 predicted introns. MPSS identified 83 putative genes of more than 100 codons that had been overlooked by annotation software, and 139 potentially incorrect gene models (with either truncated ORFs or overlooked exons), by interfacing signature locations with stop codon maps. Twenty representative models were confirmed as likely to be incorrect by reverse transcription PCR amplification from independent schizont cDNA preparations. More than 50% of the 60 putative single copy genes in T. parva that were absent from the genome of the closely related T. annulata had MPSS signatures. This study illustrates the utility of MPSS for improving annotation of small, gene-rich microbial eukaryotic genomes.