Enzyme selection for optical mapping is hard

Adams, Laura, author
Boucher, Christina, advisor
Howe, Adele, committee member
Ingram, Patrick, committee member
Colorado State University. Libraries
2015
genome assembly; optical mapping; misassembly error; enzyme selection
2015-08-28
eng
Text
http://hdl.handle.net/10217/167113
https://doi.org/10.25675/3.024191
born digital
masters theses

Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.

Assembling a genome without access to a reference genome is prone to a type of error called a misassembly error. These errors are difficult to detect and can mimic true biological variation. Optical mapping data has been shown to have the potential to reduce misassembly errors in draft genomes. Optical mapping data is generated by applying digestion enzymes to a genome. In this paper, we formulate the problem of selecting optimal digestion enzymes to create the most informative optical map. We show that this problem is NP-hard and W[1]-hard. We also propose and evaluate a machine learning method that uses a support vector machine and feature reduction to estimate the optimal enzymes. Using this method, we were able to predict two optimal enzymes exactly and estimate three more within reasonable similarity.