From RNA-Seq to gene annotation using the splicegrapher method
Date
2013
Authors
Rogers, Mark F., author
Ben-Hur, Asa, advisor
Boucher, Christina, committee member
Anderson, Charles, committee member
Reddy, Anireddy S. N., committee member
Abstract
Messenger RNA (mRNA) plays a central role in carrying out the instructions encoded in a gene. A gene's components may be combined in various ways to generate a diverse range of mRNA molecules, or transcripts, through a process called alternative splicing (AS). This allows each gene to produce different products under different conditions, such as at different stages of development or in different tissues. Researchers can study the diverse set of transcripts a gene produces by sequencing its mRNA. The latest sequencing technology produces millions of short sequence reads (RNA-Seq) from mRNA transcripts, providing researchers with unprecedented opportunities to assess how genetic instructions change under different conditions. These reads are relatively inexpensive and easy to obtain, but one limitation has been the lack of versatile methods to analyze the data. Most methods attempt to predict complete mRNA transcripts from patterns of RNA-Seq reads ascribed to a particular gene, but the short length of these reads makes transcript prediction problematic. We present a method, called SpliceGrapherXT, that takes a different approach by predicting splice graphs that capture in a single structure all the ways in which a gene's components may be assembled. Whereas other methods make predictions primarily from RNA-Seq evidence, SpliceGrapherXT uses gene annotations describing known transcripts to guide its predictions. We show that this approach allows SpliceGrapherXT to make predictions that encapsulate gene architectures more accurately than other state-of-the-art methods. This accuracy is crucial not only for updating gene annotations; our splice graph predictions can also contribute to more accurate transcript predictions. Finally, we demonstrate that by using SpliceGrapherXT to assess AS on a genome-wide scale, we can gain new insights into the ways that specific genes and environmental conditions may impact an organism's transcriptome.
SpliceGrapherXT is available for download at http://splicegrapher.sourceforge.net.
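To make the splice graph idea in the abstract concrete, here is a minimal Python sketch of the general data structure. It is not SpliceGrapherXT's code, API, or file format. It assumes exons are identified by hypothetical labels (E1 to E5) and that each annotated transcript is an ordered list of those labels; exons become nodes and every observed splice junction becomes a directed edge. The class name `SpliceGraph` and its methods are illustrative only, and the sketch does not model how SpliceGrapherXT incorporates RNA-Seq evidence or predicts novel junctions.

```python
from collections import defaultdict


class SpliceGraph:
    """Hypothetical minimal splice graph: nodes are exons, edges are splice junctions.

    Assumes the graph is acyclic (exons follow genomic order), so path
    enumeration always terminates.
    """

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(set)  # exon -> set of downstream exons

    def add_transcript(self, exons):
        """Merge one transcript, given as an ordered list of exon IDs, into the graph."""
        self.nodes.update(exons)
        for upstream, downstream in zip(exons, exons[1:]):
            self.edges[upstream].add(downstream)

    def paths(self):
        """Enumerate every source-to-sink exon chain encoded by the graph."""
        targets = {d for ds in self.edges.values() for d in ds}
        sources = sorted(self.nodes - targets)  # exons with no incoming junction
        result = []

        def walk(node, path):
            successors = sorted(self.edges.get(node, ()))
            if not successors:  # sink exon: the chain is complete
                result.append(path)
                return
            for nxt in successors:
                walk(nxt, path + [nxt])

        for src in sources:
            walk(src, [src])
        return result


# Two hypothetical annotated transcripts of one gene.
g = SpliceGraph()
g.add_transcript(["E1", "E2", "E3", "E5"])  # transcript A
g.add_transcript(["E1", "E3", "E4", "E5"])  # transcript B (skips E2, includes E4)

for path in g.paths():
    print(" -> ".join(path))
# E1 -> E2 -> E3 -> E4 -> E5
# E1 -> E2 -> E3 -> E5
# E1 -> E3 -> E4 -> E5
# E1 -> E3 -> E5
```

Two input transcripts yield four source-to-sink paths here. This illustrates both why a single graph can summarize many possible exon combinations and why recovering the exact set of expressed transcripts from a graph alone is ambiguous.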
Subject
RNA-seq
machine learning
support vector machine
computational biology
inference