Efficient exploration of diverse datasets through harmonization of encodings and representations

O'Leary, Tyson M., author; Pallickara, Shrideep, advisor; Pallickara, Sangmi Lee, advisor; Vijayasarathy, Leo, committee member

Files

Access status: Embargo until 2027-01-07, OLeary_colostate_0053N_19391.pdf (318.11 KB)

Date

2025

Authors

O'Leary, Tyson M., author

Pallickara, Shrideep, advisor

Pallickara, Sangmi Lee, advisor

Vijayasarathy, Leo, committee member

Abstract

The generation of voluminous spatiotemporal data has been accompanied by tremendous diversity in that data. Data reconciliation and harmonization, including of units of measurement, are a precursor to efficient downstream analysis. However, the diversity of encoding formats, spatial and coordinate referencing systems, types of data (points, shapes, rasters/grids), their volumes, and the variety of data storage strategies can stymie data analyses. The complexity is further exacerbated by the fact that datasets are often "layered", i.e., more than one dataset is considered during analyses. Without systematic harmonization, valuable insights can remain inaccessible, locked away by technical incompatibilities. In this study, we describe our methodology to not just harmonize such datasets but also support layering them into federated datasets, alongside an ecosystem of services including dimensionality reduction, query evaluations, correlation analysis, normalization, and visualization. Together, these capabilities allow researchers to move from raw, fragmented data toward integrated, interpretable results with significantly reduced friction during analyses. These services are amenable to daisy chaining, operate on distributed datasets, integrate with established distributed approaches, and scale. Our benchmarks contrast performance with systems such as Spark and Sedona.

Rights Access

Embargo expires: 01/07/2027.

Subject

data wrangling

harmonization

visual analytics

federation

big data

metadata

URI

https://hdl.handle.net/10217/242719
https://doi.org/10.25675/3.025611

Collections

2020-
Theses and Dissertations
