Paralogy or reality? Exploring gene assembly errors in a target enrichment dataset
dc.contributor.author | Rosén, Austin, author | |
dc.contributor.author | Simmons, Mark P., advisor | |
dc.contributor.author | Ackerfield, Jennifer, committee member | |
dc.contributor.author | Richards, Christopher, committee member | |
dc.contributor.author | Stewart, Jane, committee member | |
dc.date.accessioned | 2022-08-29T10:16:13Z | |
dc.date.available | 2022-08-29T10:16:13Z | |
dc.date.issued | 2022 | |
dc.description.abstract | De novo gene assembly of short-read data is inherently difficult, much like assembling a jigsaw puzzle. I describe three errors that occurred during the assembly of target enrichment data in the genus Cirsium (Asteraceae): inconsistent contig selection, artificial recombination, and inconsistent intron determination leading to over-alignment of non-homologous nucleotides. These errors occurred in 39% of loci in the dataset and were often a by-product of undetected paralogs: assembled loci that likely contained paralogous or homoeologous sequences but did not trigger the default paralog warnings of the assembly program, HybPiper. Default HybPiper thresholds for identifying paralogy during the assembly process were insufficient to filter such loci. A custom target file was created in which putative paralogs were separated into independent loci. The custom target file reduced, but did not eliminate, assembly errors in the dataset. A final iteration of quality control was performed to create a dataset largely free of assembly errors. However, phylogenetic inferences applied to this final cleansed dataset were unable to resolve the taxonomic relationships among the sampled specimens. Rather, these results affirm that Cirsium is a taxonomically problematic genus that may require population-level genetic data or integrative taxonomy approaches to delimit species boundaries. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Rosen_colostate_0053N_17371.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/235627 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.title | Paralogy or reality? Exploring gene assembly errors in a target enrichment dataset | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Biology | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) |
Files
Original bundle
- Name: Rosen_colostate_0053N_17371.pdf
- Size: 3.17 MB
- Format: Adobe Portable Document Format