Deep learning for bioinformatics sequences: RNA basecalling and protein interactions

Neumann, Don, author; Ben-Hur, Asa, advisor; Beveridge, Ross, committee member; Blanchard, Nathaniel, committee member; Reddy, Anireddy, committee member

Files

Neumann_colostate_0053A_18230.pdf (1.69 MB)

Date

2024

Abstract

In the interdisciplinary field of bioinformatics, sequence data for biological problems comes in many forms, ranging from protein sequences to RNA to the ionic current measured as a strand of nucleotides passes through an Oxford Nanopore Technologies sequencing device. These data can be used to elucidate biological processes at many levels, supporting work that ranges from drug design to curing disease. All of our research focuses on biological problems encoded as sequences. Its main focus is Oxford Nanopore Technologies sequencing devices, which are capable of directly sequencing long, native RNA strands. We first concentrate on improving basecalling accuracy for RNA, and have published a paper describing a novel architecture that achieves state-of-the-art performance. The basecalling architecture uses a series of convolutional blocks with progressively larger kernel sizes, which improves accuracy on this inherently noisy data. We then describe ongoing research into the detection of post-transcriptional RNA modifications in nanopore sequencing data. Building on our basecalling research, we are able to detect modifications at read-level resolution. Our work will facilitate research into the detection of N6-methyladenosine (m6A) while also furthering progress in the detection of other post-transcriptional modifications. Finally, we recount our recently accepted paper on protein-protein and host-pathogen interaction prediction. Our experiments demonstrate that flawed experimental designs have plagued the field of interaction prediction, giving the false impression that the problem has been solved. We then provide reasoning and recommendations for future work.
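The idea of stacking convolutional blocks whose kernel sizes grow from block to block can be illustrated with a small forward pass over a raw current signal. This is a minimal sketch under stated assumptions, not the published architecture: the kernel sizes (3, 5, 9, 17, 33), the channel count, "same" padding, ReLU activations, and random untrained weights are all hypothetical choices for illustration, and a real basecaller would learn its weights and add an output layer mapping features to bases. The helper names `conv1d_same`, `conv_block`, `progressive_kernel_stack`, and `receptive_field` are likewise hypothetical.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1D cross-correlation (as in deep learning frameworks) with 'same' padding.
    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,) -> returns (C_out, T)."""
    _, _, k = w.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right)))
    # Sliding windows over time: shape (C_in, T, K)
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    # Sum over input channels and kernel positions for each output channel
    return np.einsum("ctk,ock->ot", windows, w) + b[:, None]

def conv_block(x, kernel_size, out_channels, rng):
    """One hypothetical block: convolution followed by ReLU, random (untrained) weights."""
    c_in = x.shape[0]
    scale = 1.0 / np.sqrt(c_in * kernel_size)
    w = rng.normal(0.0, scale, size=(out_channels, c_in, kernel_size))
    b = np.zeros(out_channels)
    return np.maximum(conv1d_same(x, w, b), 0.0)

def progressive_kernel_stack(signal, kernel_sizes=(3, 5, 9, 17, 33), channels=16, seed=0):
    """Apply blocks with progressively larger kernels to a 1D current signal."""
    rng = np.random.default_rng(seed)
    x = signal[None, :]  # (1, T): a single input channel of raw current
    for k in kernel_sizes:
        x = conv_block(x, k, channels, rng)
    return x

def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions:
    each layer widens the context by (k - 1) samples."""
    return 1 + sum(k - 1 for k in kernel_sizes)

if __name__ == "__main__":
    signal = np.random.default_rng(1).normal(size=400)  # stand-in for a noisy current trace
    features = progressive_kernel_stack(signal)
    print(features.shape)                        # (16, 400)
    print(receptive_field((3, 5, 9, 17, 33)))    # 63
```

Because every block preserves the time axis, later blocks with wider kernels aggregate an increasingly long stretch of the signal (63 samples in this toy configuration) while early blocks keep fine local detail, which is one plausible reason growing kernel sizes help with noisy input.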

Subject

post-transcriptional modifications

RNA basecalling

protein-protein interactions

deep learning

URI

https://hdl.handle.net/10217/238479
https://doi.org/10.25675/3.04083

Collections

2020-
Theses and Dissertations