RUBIKS: rapid explorations and summarization over high dimensional spatiotemporal datasets

dc.contributor.author: Mitra, Saptashwa, author
dc.contributor.author: Young, Matt, author
dc.contributor.author: Breidt, Jay, author
dc.contributor.author: Pallickara, Sangmi, author
dc.contributor.author: Pallickara, Shrideep, author
dc.contributor.author: ACM, publisher
dc.date.accessioned: 2024-11-11T19:34:35Z
dc.date.available: 2024-11-11T19:34:35Z
dc.date.issued: 2024-04-03
dc.description.abstract: Exponential growth in spatial data volumes has occurred alongside increases in the dimensionality of datasets and the rates at which observations are generated. Rapid summarization and exploration of such datasets are precursors to several downstream operations, including data wrangling, preprocessing, hypothesis formulation, and model construction, among others. However, researchers are stymied by both the dimensionality and the data volumes, which often entail extensive data movements, computation overheads, and I/O. Here, we describe our methodology to support effective summarizations and explorations at scale over arbitrary spatiotemporal scopes, which encapsulate the spatial extents, temporal bounds, or combinations thereof over the data space of interest. Summarizations can be performed over all variables representing the data space or over subsets specified by the user. We extend the concept of data cubes to encompass spatiotemporal datasets that have high dimensionality and in which there might be significant gaps in the data, because measurements (or observations) of diverse variables are not synchronized and may occur at diverse rates. We couple our data summarization features with a rapid choropleth visualizer that allows users to explore spatial variations of diverse measures of interest. We validate these concepts in the context of an Environmental Protection Agency dataset that tracks over 4000 chemical pollutants present in natural water sources across the United States from 1970 onwards.
dc.format.medium: born digital
dc.format.medium: articles
dc.identifier.bibliographicCitation: Saptashwa Mitra, Matt Young, Jay Breidt, Sangmi Pallickara, Shrideep Pallickara. 2023. Rubiks: Rapid Explorations and Summarization over High Dimensional Spatiotemporal Datasets. In BDCAT '23: Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies, Article No.: 11, Pages 1-11. https://doi.org/10.1145/3632366.3632393
dc.identifier.doi: https://doi.org/10.1145/3632366.3632393
dc.identifier.uri: https://hdl.handle.net/10217/239543
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: Publications
dc.relation.ispartof: ACM DL Digital Library
dc.rights: © Saptashwa Mitra, et al. | ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in BDCAT '23, https://dx.doi.org/10.1145/3632366.3632393.
dc.subject: spatial data
dc.subject: data cubes
dc.subject: summarizations
dc.title: RUBIKS: rapid explorations and summarization over high dimensional spatiotemporal datasets
dc.type: Text

Files

Original bundle
Name: FACF_ACMOA_3632366.3632393.pdf
Size: 4.06 MB
Format: Adobe Portable Document Format

Collections