
On the use of locality aware distributed hash tables for homology searches over voluminous biological sequence data

dc.contributor.author: Tolooee, Cameron, author
dc.contributor.author: Pallickara, Sangmi, advisor
dc.contributor.author: Ben-Hur, Asa, committee member
dc.contributor.author: von Fischer, Joseph, committee member
dc.date.accessioned: 2016-01-11T15:14:01Z
dc.date.available: 2016-01-11T15:14:01Z
dc.date.issued: 2015
dc.description.abstract: Rapid advances in genomic sequencing technology have resulted in a data deluge in biology and bioinformatics. This increase in data volumes has introduced computational challenges for frequently performed sequence analytics routines such as DNA and protein homology searches; ideally, these searches should also run in real time. This thesis proposes a scalable and similarity-aware distributed storage framework, Mendel, that enables retrieval of biologically significant DNA and protein alignments against a voluminous genomic sequence database. Mendel fragments the sequence data and generates an inverted index, which is then dispersed over a distributed collection of machines using a locality aware distributed hash table. A novel distributed nearest neighbor search algorithm identifies sequence segments with high similarity and splices them together to form an alignment. This thesis includes an empirical evaluation of the performance, sensitivity, and scalability of the proposed system over NCBI's non-redundant protein dataset. In these benchmarks, Mendel demonstrates higher sensitivity and faster query evaluations when compared to other modern frameworks.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Tolooee_colostate_0053N_13379.pdf
dc.identifier.uri: http://hdl.handle.net/10217/170402
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: distributed system
dc.subject: homology search
dc.subject: sequence similarity search
dc.title: On the use of locality aware distributed hash tables for homology searches over voluminous biological sequence data
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle
Name: Tolooee_colostate_0053N_13379.pdf
Size: 837.92 KB
Format: Adobe Portable Document Format