Efficient exploration of diverse datasets through harmonization of encodings and representations
Files
OLeary_colostate_0053N_19391.pdf (318.11 KB)
Access status: Embargo until 2027-01-07
Abstract
The growth in spatiotemporal data volumes has been accompanied by tremendous diversity in the data itself. Data reconciliation and harmonization, including of units of measurement, are precursors to efficient downstream analysis. However, the diversity of encoding formats, spatial and coordinate reference systems, data types (points, shapes, rasters/grids), data volumes, and storage strategies can stymie analyses. The complexity is compounded because datasets are often "layered," i.e., more than one dataset is considered during an analysis. Without systematic harmonization, valuable insights can remain inaccessible, locked away by technical incompatibilities. In this study, we describe our methodology to not only harmonize such datasets but also layer them into federated datasets, alongside an ecosystem of services that includes dimensionality reduction, query evaluation, correlation analysis, normalization, and visualization. Together, these capabilities allow researchers to move from raw, fragmented data toward integrated, interpretable results with significantly reduced friction. These services are amenable to daisy chaining, operate on distributed datasets, integrate with established distributed approaches, and scale. Our benchmarks contrast performance with systems such as Spark and Sedona.
Rights Access
Embargo expires: 01/07/2027.
Subject
data wrangling
harmonization
visual analytics
federation
big data
metadata
