Mitra, Saptashwa, author
Young, Matt, author
Breidt, Jay, author
Pallickara, Sangmi, author
Pallickara, Shrideep, author
ACM, publisher
2024-11-11
2024-11-11
2024-04-03
Saptashwa Mitra, Matt Young, Jay Breidt, Sangmi Pallickara, Shrideep Pallickara. 2023. Rubiks: Rapid Explorations and Summarization over High Dimensional Spatiotemporal Datasets. In BDCAT '23: Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies, Article No.: 11, Pages 1-11. https://doi.org/10.1145/3632366.3632393
https://hdl.handle.net/10217/239543
Exponential growth in spatial data volumes has occurred alongside increases in the dimensionality of datasets and the rates at which observations are generated. Rapid summarization and exploration of such datasets are a precursor to several downstream operations, including data wrangling, preprocessing, hypothesis formulation, and model construction, among others. However, researchers are stymied by both the dimensionality and the data volumes, which often entail extensive data movements, computation overheads, and I/O. Here, we describe our methodology to support effective summarizations and explorations at scale over arbitrary spatiotemporal scopes, which encapsulate the spatial extents, temporal bounds, or combinations thereof over the data space of interest. Summarizations can be performed over all variables representing the data space or over subsets specified by the user. We extend the concept of data cubes to encompass spatiotemporal datasets that have high dimensionality and that may contain significant gaps, because measurements (or observations) of diverse variables are not synchronized and may occur at diverse rates. We couple our data summarization features with a rapid choropleth visualizer that allows users to explore spatial variations of diverse measures of interest. We validate these concepts in the context of an Environmental Protection Agency dataset that tracks over 4000 chemical pollutants present in natural water sources across the United States from 1970 onwards.
born digital
articles
eng
©Saptashwa Mitra, et al. ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in BDCAT '23, https://dx.doi.org/10.1145/3632366.3632393.
spatial data
data cubes
summarizations
RUBIKS: rapid explorations and summarization over high dimensional spatiotemporal datasets
Text
https://doi.org/10.1145/3632366.3632393