Publications

Permanent URI for this collection: https://hdl.handle.net/10217/239510

Now showing 1 - 2 of 2

Open Access
A framework for profiling spatial variability in the performance of classification models
(Colorado State University. Libraries, 2024-04-03) Warushavithana, Menuka, author; Barram, Kassidy, author; Carlson, Caleb, author; Mitra, Saptashwa, author; Ghosh, Sudipto, author; Breidt, Jay, author; Pallickara, Sangmi Lee, author; Pallickara, Shrideep, author; ACM, publisher
Scientists use models to further their understanding of phenomena and inform decision-making. A confluence of factors has contributed to an exponential increase in spatial data volumes. In this study, we describe our methodology to identify spatial variation in the performance of classification models. Our methodology allows tracking a host of performance measures across different thresholds for the larger, encapsulating spatial area under consideration. Our methodology ensures frugal utilization of resources via a novel validation budgeting scheme that preferentially allocates observations for validations. We complement these efforts with a browser-based, GPU-accelerated visualization scheme that also incorporates support for streaming to assimilate validation results as they become available.
Open Access
RUBIKS: rapid explorations and summarization over high dimensional spatiotemporal datasets
(Colorado State University. Libraries, 2024-04-03) Mitra, Saptashwa, author; Young, Matt, author; Breidt, Jay, author; Pallickara, Sangmi, author; Pallickara, Shrideep, author; ACM, publisher
Exponential growth in spatial data volumes has occurred alongside increases in the dimensionality of datasets and the rates at which observations are generated. Rapid summarization and exploration of such datasets are a precursor to several downstream operations including data wrangling, preprocessing, hypothesis formulation, and model construction, among others. However, researchers are stymied by both the dimensionality and the data volumes, which often entail extensive data movements, computation overheads, and I/O. Here, we describe our methodology to support effective summarizations and explorations at scale over arbitrary spatiotemporal scopes, which encapsulate the spatial extents, temporal bounds, or combinations thereof over the data space of interest. Summarizations can be performed over all variables representing the data space or subsets specified by the user. We extend the concept of data cubes to encompass spatiotemporal datasets with high dimensionality and where there might be significant gaps in the data because measurements (or observations) of diverse variables are not synchronized and may occur at diverse rates. We couple our data summarization features with a rapid choropleth visualizer that allows users to explore spatial variations of diverse measures of interest. We validate these concepts in the context of an Environmental Protection Agency dataset that tracks over 4000 chemical pollutants present in natural water sources across the United States from 1970 onwards.

Browsing Publications by Author "Breidt, Jay, author"