Deep learning for bioinformatics sequences: RNA basecalling and protein interactions

dc.contributor.author: Neumann, Don, author
dc.contributor.author: Ben-Hur, Asa, advisor
dc.contributor.author: Beveridge, Ross, committee member
dc.contributor.author: Blanchard, Nathaniel, committee member
dc.contributor.author: Reddy, Anireddy, committee member
dc.date.accessioned: 2024-05-27T10:32:48Z
dc.date.available: 2024-05-27T10:32:48Z
dc.date.issued: 2024
dc.description.abstract: In the interdisciplinary field of bioinformatics, sequence data for biological problems comes in many forms, ranging from proteins, to RNA, to the ionic current recorded for a strand of nucleotides by an Oxford Nanopore Technologies sequencing device. This data can be used to elucidate the fundamentals of biological processes on many levels, which can help humanity with everything from drug design to curing disease. All of our research focuses on biological problems encoded as sequences. The main focus of our research involves Oxford Nanopore Technologies sequencing devices, which are capable of directly sequencing long-read RNA strands as is. We first concentrate on improving the basecalling accuracy for RNA, and have published a paper with a novel architecture achieving state-of-the-art performance. The basecalling architecture uses convolutional blocks, each with progressively larger kernel sizes, which improves accuracy given the noisy nature of the data. We then describe ongoing research into the detection of post-transcriptional RNA modifications in nanopore sequencing data. Building on our basecalling research, we are able to discern modifications with read-level resolution. Our work will facilitate research into the detection of N6-methyladenosine (m6A) while also furthering progress in the detection of other post-transcriptional modifications. Finally, we recount our recently accepted paper regarding protein-protein and host-pathogen interaction prediction. We performed experiments demonstrating faulty experimental designs for interaction prediction that have plagued the field, giving the false impression that the problem has been solved. We then provide reasoning and recommendations for future work.
dc.format.medium: born digital
dc.format.medium: doctoral dissertations
dc.identifier: Neumann_colostate_0053A_18230.pdf
dc.identifier.uri: https://hdl.handle.net/10217/238479
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: post transcriptional modifications
dc.subject: RNA basecalling
dc.subject: protein protein interactions
dc.subject: deep learning
dc.title: Deep learning for bioinformatics sequences: RNA basecalling and protein interactions
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle
Name: Neumann_colostate_0053A_18230.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format