Read alignment using deep neural networks

dc.contributor.author: Shrestha, Akash, author
dc.contributor.author: Chitsaz, Hamidreza, advisor
dc.contributor.author: Ben-Hur, Asa, committee member
dc.contributor.author: Abdo, Zaid, committee member
dc.date.accessioned: 2019-06-14T17:06:25Z
dc.date.available: 2019-06-14T17:06:25Z
dc.date.issued: 2019
dc.description.abstract: Read alignment is the process of mapping short DNA sequences to a reference genome. The advent of successively evolving "next generation" sequencing technologies created a need for sequence alignment tools. Many scientific communities, and the companies marketing the sequencing technologies, developed a whole spectrum of read aligners/mappers for different error profiles and read length characteristics. Among the most recent successfully marketed sequencing technologies are Oxford Nanopore and PacBio SMRT sequencing, which are considered top players because of their extremely long reads and low cost. However, the reads may contain up to 20% errors, which are generally not uniformly distributed. To deal with that error rate and read length, proximity-preserving hashing techniques, such as Minhash and Minimizers, have been used to quickly map a read to the target region of the reference sequence. Subsequently, a variant of global or local alignment dynamic programming is used to give the final alignment. In this research work, we train a Deep Neural Network (DNN) to yield a hashing scheme for the highly erroneous long reads, which is deemed superior to Minhash for mapping the reads. We implemented that idea to build a read alignment tool: DNNAligner. We evaluated the performance of our aligner against the read aligners currently popular in the bioinformatics community: minimap2, bwa-mem and graphmap. Our results show that the performance of DNNAligner is comparable to that of the other tools, without any code optimization or integration of other advanced features. Moreover, the DNN exhibits superior performance in comparison with Minhash on neighborhood classification.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Shrestha_colostate_0053N_15383.pdf
dc.identifier.uri: https://hdl.handle.net/10217/195341
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: Minhash
dc.subject: pattern discovery
dc.subject: sequence alignment
dc.subject: neural network
dc.subject.lcsh: DNA
dc.title: Read alignment using deep neural networks
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle
Name: Shrestha_colostate_0053N_15383.pdf
Size: 707.55 KB
Format: Adobe Portable Document Format