Deep learning for bioinformatics sequences: RNA basecalling and protein interactions

Neumann, Don, author; Ben-Hur, Asa, advisor; Beveridge, Ross, committee member; Blanchard, Nathaniel, committee member; Reddy, Anireddy, committee member

dc.contributor.author	Neumann, Don, author
dc.contributor.author	Ben-Hur, Asa, advisor
dc.contributor.author	Beveridge, Ross, committee member
dc.contributor.author	Blanchard, Nathaniel, committee member
dc.contributor.author	Reddy, Anireddy, committee member
dc.date.accessioned	2024-05-27T10:32:48Z
dc.date.available	2024-05-27T10:32:48Z
dc.date.issued	2024
dc.description.abstract	In the interdisciplinary field of bioinformatics, sequence data for biological problems comes in many different forms, ranging from proteins, to RNA, to the ionic current recorded for a strand of nucleotides by an Oxford Nanopore Technologies sequencing device. This data can be used to elucidate the fundamentals of biological processes on many levels, which can help humanity with everything from drug design to curing disease. All of our research focuses on biological problems encoded as sequences. The main focus of our research involves Oxford Nanopore Technologies sequencing devices, which are capable of directly sequencing long-read RNA strands as is. We first concentrate on improving the basecalling accuracy for RNA, and have published a paper with a novel architecture achieving state-of-the-art performance. The basecalling architecture uses convolutional blocks, each with progressively larger kernel sizes, which improves accuracy given the noisy nature of the data. We then describe ongoing research into the detection of post-transcriptional RNA modifications in nanopore sequencing data. Building on our basecalling research, we are able to discern modifications with read-level resolution. Our work will facilitate research into the detection of N6-methyladenosine (m6A) while also furthering progress in the detection of other post-transcriptional modifications. Finally, we recount our recently accepted paper regarding protein-protein and host-pathogen interaction prediction. We performed experiments demonstrating that faulty experimental design for interaction prediction has plagued the field, giving the false impression that the problem has been solved. We then provide reasoning and recommendations for future work.
dc.format.medium	born digital
dc.format.medium	doctoral dissertations
dc.identifier	Neumann_colostate_0053A_18230.pdf
dc.identifier.uri	https://hdl.handle.net/10217/238479
dc.identifier.uri	https://doi.org/10.25675/3.04083
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2020-
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	post transcriptional modifications
dc.subject	RNA basecalling
dc.subject	protein protein interactions
dc.subject	deep learning
dc.title	Deep learning for bioinformatics sequences: RNA basecalling and protein interactions
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (Ph.D.)

Files

Original bundle

Name: Neumann_colostate_0053A_18230.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format

Collections

2020-
Theses and Dissertations