Efficient exploration of diverse datasets through harmonization of encodings and representations

dc.contributor.author: O'Leary, Tyson M., author
dc.contributor.author: Pallickara, Shrideep, advisor
dc.contributor.author: Lee Pallickara, Sangmi, advisor
dc.contributor.author: Vijayasarathy, Leo, committee member
dc.date.accessioned: 2026-01-12T11:27:52Z
dc.date.issued: 2025
dc.description.abstract: Voluminous spatiotemporal data generation has occurred alongside a tremendous diversity in the data. Data reconciliation and harmonization, including units of measurement, are a precursor to efficient downstream analysis. However, the diversity of encoding formats, spatial and coordinate referencing systems, types of data (points, shapes, rasters/grids), their volumes, and variety of data storage strategies can stymie data analyses. The complexity is further exacerbated by the fact that datasets are often "layered" i.e., we consider more than one dataset during analyses. Without systematic harmonization, valuable insights can remain inaccessible, locked away by technical incompatibilities. In this study, we describe our methodology to not just harmonize such datasets but include support for layering them into federated datasets alongside an ecosystem of services including dimensionality reduction, query evaluations, correlation analysis, normalization, and visualization. Together these capabilities allow researchers to move from raw, fragmented data toward integrated, interpretable results with significantly reduced friction during analyses. These services are amenable to daisy chaining, operate on distributed datasets, integrate with established distributed approaches, and scale. Our benchmarks contrast performance with systems such as Spark and Sedona.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: OLeary_colostate_0053N_19391.pdf
dc.identifier.uri: https://hdl.handle.net/10217/242719
dc.identifier.uri: https://doi.org/10.25675/3.025611
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.rights.access: Embargo expires: 01/07/2027.
dc.subject: data wrangling
dc.subject: harmonization
dc.subject: visual analytics
dc.subject: federation
dc.subject: big data
dc.subject: metadata
dc.title: Efficient exploration of diverse datasets through harmonization of encodings and representations
dc.type: Text
dc.type: Image
dcterms.embargo.expires: 2027-01-07
dcterms.embargo.terms: 2027-01-07
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle

Name: OLeary_colostate_0053N_19391.pdf
Size: 318.11 KB
Format: Adobe Portable Document Format
Access status: Embargo until 2027-01-07