Sneham, Swapnil, authorBen-Hur, Asa, advisorChitsaz, Hamidreza, committee memberPeterson, Christopher, committee member2018-01-172018-01-172017https://hdl.handle.net/10217/185669Alternative Splicing is a process that allows a single gene to encode multiple proteins. Intron Retention (IR) is a type of alternative splicing which is mainly prevalent in plants, but has been shown to regulate gene expression in various organisms and is often involved in rare human diseases. Despite its important role, not much research has been done to understand IR. The motivation behind this research work is to better understand IR and how it is regulated by various biological factors. We designed a combination of 137 features, forming an "intron retention code", to reveal the factors that contribute to IR. Using random forest and support vector machine classifiers, we show the usefulness of these features for the task of predicting whether an intron is subject to IR or not. An analysis of the top-ranking features for this task reveals a high level of similarity of the most predictive features across the three plant species, demonstrating the conservation of the factors that determine IR. We also found a high level of similarity to the top features contributing to IR in mammals. The task of predicting the response to drought stress proved more difficult, with lower levels of accuracy and lower levels of similarity across species, suggesting that additional features need to be considered for predicting condition-specific IR.born digitalmasters thesesengCopyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.intron retentionrandom forestalternative splicingSVMmachine learningMachine learning models towards elucidating the plant intron retention codeText