RUBIKS: rapid explorations and summarization over high dimensional spatiotemporal datasets
Date
2024-04-03
Journal Title
Journal ISSN
Volume Title
Abstract
Exponential growth in spatial data volumes have occurred alongside increases in the dimensionality of datasets and the rates at which observations are generated. Rapid summarization and explorations of such datasets are a precursor to several downstream operations including data wrangling, preprocessing, hypothesis formulation, and model construction among others. However, researchers are stymied both by the dimensionality and data volumes that often entail extensive data movements, computation overheads, and I/O. Here, we describe our methodology to support effective summarizations and explorations at scale over arbitrary spatiotemporal scopes, which encapsulate the spatial extents, temporal bounds, or combinations thereof over the data space of interest. Summarizations can be performed over all variables representing the dataspace or subsets specified by the user. We extend the concept of data cubes to encompass spatiotemporal datasets with high-dimensionality and where there might be significant gaps in the data because measurements (or observations) of diverse variables are not synchronized and may occur at diverse rates. We couple our data summarization features with a rapid Choropleth visualizer that allows users to explore spatial variations of diverse measures of interest. We validate these concepts in the context of an Environmental Protection Agency dataset which tracks over 4000 chemical pollutants, presenting in natural water sources across the United States from 1970 onwards.
Description
Rights Access
Subject
spatial data
data cubes
summarizations