Efficient exploration of diverse datasets through harmonization of encodings and representations
| dc.contributor.author | O'Leary, Tyson M., author | |
| dc.contributor.author | Pallickara, Shrideep, advisor | |
| dc.contributor.author | Lee Pallickara, Sangmi, advisor | |
| dc.contributor.author | Vijayasarathy, Leo, committee member | |
| dc.date.accessioned | 2026-01-12T11:27:52Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Voluminous spatiotemporal data generation has occurred alongside a tremendous diversity in the data. Data reconciliation and harmonization, including units of measurement, are a precursor to efficient downstream analysis. However, the diversity of encoding formats, spatial and coordinate referencing systems, types of data (points, shapes, rasters/grids), their volumes, and variety of data storage strategies can stymie data analyses. The complexity is further exacerbated by the fact that datasets are often "layered", i.e., we consider more than one dataset during analyses. Without systematic harmonization, valuable insights can remain inaccessible, locked away by technical incompatibilities. In this study, we describe our methodology to not just harmonize such datasets but also support layering them into federated datasets alongside an ecosystem of services including dimensionality reduction, query evaluations, correlation analysis, normalization, and visualization. Together, these capabilities allow researchers to move from raw, fragmented data toward integrated, interpretable results with significantly reduced friction during analyses. These services are amenable to daisy chaining, operate on distributed datasets, integrate with established distributed approaches, and scale. Our benchmarks contrast performance with systems such as Spark and Sedona. | |
| dc.format.medium | born digital | |
| dc.format.medium | masters theses | |
| dc.identifier | OLeary_colostate_0053N_19391.pdf | |
| dc.identifier.uri | https://hdl.handle.net/10217/242719 | |
| dc.identifier.uri | https://doi.org/10.25675/3.025611 | |
| dc.language | English | |
| dc.language.iso | eng | |
| dc.publisher | Colorado State University. Libraries | |
| dc.relation.ispartof | 2020- | |
| dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
| dc.rights.access | Embargo expires: 01/07/2027. | |
| dc.subject | data wrangling | |
| dc.subject | harmonization | |
| dc.subject | visual analytics | |
| dc.subject | federation | |
| dc.subject | big data | |
| dc.subject | metadata | |
| dc.title | Efficient exploration of diverse datasets through harmonization of encodings and representations | |
| dc.type | Text | |
| dc.type | Image | |
| dcterms.embargo.expires | 2027-01-07 | |
| dcterms.embargo.terms | 2027-01-07 | |
| dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
| thesis.degree.discipline | Computer Science | |
| thesis.degree.grantor | Colorado State University | |
| thesis.degree.level | Masters | |
| thesis.degree.name | Master of Science (M.S.) |
Files
Original bundle
- Name: OLeary_colostate_0053N_19391.pdf
- Size: 318.11 KB
- Format: Adobe Portable Document Format
