Uncovering the role of epigenetics in alternative splicing
Ullah, Fahad, author
Ben-Hur, Asa, advisor
Anderson, Charles, committee member
Chitsaz, Hamidreza, committee member
Reddy, Anireddy S. N., committee member
Alternative Splicing (AS) is a regulated process that enables a single gene to encode structurally and functionally different biomolecules (proteins, non-coding RNAs, etc.) that play important roles in an organism's development and growth. It has also been implicated in multiple diseases, including cancer, thalassemia, and spinal muscular atrophy. Recent studies have shown that AS is widespread in both plants and animals. Moreover, it has been reported that splicing occurs co-transcriptionally and that chromatin state is important for understanding the regulation of AS. Most previous efforts to elucidate the regulation of AS used sequence information alone. In this study, our goal is to understand AS from an epigenetic perspective: how chromatin organization, accessibility, and modifications are involved in its regulation. Intron Retention (IR) is the most frequent form of AS in plants; however, very little is known about its regulation, particularly regarding the role of chromatin state. Therefore, as a first step, we investigate the relationship between IR and chromatin accessibility in two plant species: Arabidopsis and rice. We report a strong association between chromatin accessibility and IR. Our findings suggest that chromatin is more open and accessible at IR events. Furthermore, we discover motifs associated with the regulation of alternatively and constitutively spliced introns, many of which match those of known transcription factors and are conserved between Arabidopsis and rice, a strong indication of their functional importance. Recent studies have suggested that IR is highly prevalent in humans as well. Using the wealth of genomic data available for human, we design a deep learning model for predicting IR in regions of open chromatin. Our model exhibits good accuracy in terms of Area Under the ROC Curve (AUC), with a median AUC of 0.80.
Moreover, we identify motifs enriched in IR events with significant hits to known human transcription factors (TFs). The zinc finger family exhibits the highest activity in IR events, a prediction that is validated using ChIP-Seq data. Experiments by our collaborators have validated our predictions in several candidate IR events. Finally, in an effort to capture the complete regulatory landscape of alternative splicing, we investigate the cooperativity and interactions between regulatory sequence features. To that end, we design a model that combines convolutional and recurrent layers with a self-attention layer, which helps us capture a global view of the interactions between regulatory elements in a sequence. We evaluate our method on several datasets and compare it to existing methods. In each experiment, our model identifies numerous statistically significant TF interactions, many of which have been previously reported. Applying this model to the IR chromatin accessibility dataset, we identify many interactions, primarily involving the zinc finger family of transcription factors. Our approach not only provides a global, biologically relevant set of interactions but also, unlike existing methods, does not require a computationally expensive postprocessing step. In summary, this dissertation sheds light on the epigenetic regulation of alternative splicing by transcription factors, and also contributes methodologically by making the results of deep learning models more interpretable.
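As a rough illustration of the self-attention idea mentioned in the abstract, the following sketch computes scaled dot-product self-attention over a short sequence of per-position feature vectors. It is not the dissertation's model: the input values, the dimensions, and the use of identity query/key/value projections (a trained model would learn these) are all assumptions made to keep the example small. It only shows how an attention matrix assigns a weight to every pair of positions, which is the kind of signal that can be inspected for candidate interactions between regulatory elements.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of L position vectors, each of dimension d.
    Returns (weights, output), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(features[0])
    # Pairwise similarity between every two positions, scaled by sqrt(d).
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        for q in features
    ]
    # Normalize each row into a distribution over positions.
    weights = [softmax(row) for row in scores]
    # Each output vector is an attention-weighted average of all positions.
    output = [
        [sum(w * v[c] for w, v in zip(row, features)) for c in range(d)]
        for row in weights
    ]
    return weights, output


# Hypothetical input: 6 sequence positions with 4 features each, standing in
# for the per-position outputs of convolutional/recurrent layers.
random.seed(0)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
weights, output = self_attention(features)

# Print the 6 x 6 attention matrix, one row per querying position.
for row in weights:
    print(" ".join(f"{w:.2f}" for w in row))
```

Because the full position-by-position matrix is produced in a single forward pass, pairwise weights can be read off directly from the attention layer.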
Includes bibliographical references.