Repository logo
 

Theory of graph traversal edit distance, extensions, and applications

dc.contributor.authorEbrahimpour Boroojeny, Ali, author
dc.contributor.authorChitsaz, Hamidreza, advisor
dc.contributor.authorBen-Hur, Asa, committee member
dc.contributor.authorAbdo, Zaid, committee member
dc.date.accessioned2019-06-14T17:06:25Z
dc.date.available2019-06-14T17:06:25Z
dc.date.issued2019
dc.description.abstractMany problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this paper, we give a new graph kernel which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the graph traversal edit distance is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of an SVM classifier on a few datasets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested datasets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads. In this project, we also show how to solve the problems of local and semi-global alignment between two graphs. Finally, we suggest an approach for speeding up the computations using pre-assumption on a subset of nodes that have to be paired.
dc.format.mediumborn digital
dc.format.mediummasters theses
dc.identifierEbrahimpourBoroojeny_colostate_0053N_15381.pdf
dc.identifier.urihttps://hdl.handle.net/10217/195340
dc.languageEnglish
dc.language.isoeng
dc.publisherColorado State University. Libraries
dc.relation.ispartof2000-2019
dc.rightsCopyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subjectgenome assembly
dc.subjectgraph edit distance
dc.subjectsequence co-assembly
dc.subjectgraph comparison
dc.subjectdifferential genome assembly
dc.subjectgraph kernel
dc.titleTheory of graph traversal edit distance, extensions, and applications
dc.typeText
dcterms.rights.dplaThis Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.disciplineComputer Science
thesis.degree.grantorColorado State University
thesis.degree.levelMasters
thesis.degree.nameMaster of Science (M.S.)

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
EbrahimpourBoroojeny_colostate_0053N_15381.pdf
Size:
374.02 KB
Format:
Adobe Portable Document Format