Machine learning models towards elucidating the plant intron retention code
dc.contributor.author | Sneham, Swapnil, author | |
dc.contributor.author | Ben-Hur, Asa, advisor | |
dc.contributor.author | Chitsaz, Hamidreza, committee member | |
dc.contributor.author | Peterson, Christopher, committee member | |
dc.date.accessioned | 2018-01-17T16:45:41Z | |
dc.date.available | 2018-01-17T16:45:41Z | |
dc.date.issued | 2017 | |
dc.description.abstract | Alternative splicing is a process that allows a single gene to encode multiple proteins. Intron retention (IR) is a type of alternative splicing that is most prevalent in plants, but it has also been shown to regulate gene expression in various organisms and is often involved in rare human diseases. Despite its important role, IR has received little research attention. The motivation behind this work is to better understand IR and how it is regulated by various biological factors. We designed a set of 137 features, forming an "intron retention code", to reveal the factors that contribute to IR. Using random forest and support vector machine classifiers, we show the usefulness of these features for predicting whether an intron is subject to IR. An analysis of the top-ranking features for this task reveals a high level of similarity among the most predictive features across the three plant species studied, demonstrating the conservation of the factors that determine IR. We also found a high level of similarity to the top features contributing to IR in mammals. The task of predicting the response to drought stress proved more difficult, with lower accuracy and lower similarity across species, suggesting that additional features need to be considered for predicting condition-specific IR. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Sneham_colostate_0053N_14484.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/185669 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2000-2019 | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | intron retention | |
dc.subject | random forest | |
dc.subject | alternative splicing | |
dc.subject | SVM | |
dc.subject | machine learning | |
dc.title | Machine learning models towards elucidating the plant intron retention code | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) | |
Files
Original bundle
- Name: Sneham_colostate_0053N_14484.pdf
- Size: 1.45 MB
- Format: Adobe Portable Document Format