Browsing by Author "Mitra, Saptashwa, author"
Item Open Access
Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets (Colorado State University. Libraries, 2018) Mitra, Saptashwa, author; Pallickara, Sangmi Lee, advisor; Pallickara, Shrideep, committee member; Li, Kaigang, committee member
Combining data from disparate sources enhances the opportunity to explore different aspects of the phenomena under consideration. However, there are several challenges in doing so effectively, including, inter alia, heterogeneity in data representation and format, differing collection patterns, and the integration of foreign data attributes in a ready-to-use condition. In this study, we propose a scalable query-oriented data integration framework that provides estimations for spatiotemporally aligned data points. We have designed Confluence, a distributed data integration framework that dynamically generates accurate interpolations for the targeted spatiotemporal scopes, along with an estimate of the uncertainty associated with each estimation. Confluence orchestrates computations to evaluate spatial and temporal query joins and to interpolate values. Our methodology facilitates distributed query evaluations with a dynamic relaxation of query constraints. Query evaluations are locality-aware, and we leverage model-based dynamic parameter selection to provide accurate estimations for data points. We have included empirical benchmarks that profile the suitability of our approach in terms of accuracy, latency, and throughput at scale.

Item Open Access
Towards interactive analytics over voluminous spatiotemporal data using a distributed, in-memory framework (Colorado State University. Libraries, 2023) Mitra, Saptashwa, author; Pallickara, Sangmi Lee, advisor; Pallickara, Shrideep, committee member; Ortega, Francisco, committee member; Li, Kaigang, committee member
The proliferation of heterogeneous data sources, driven by advancements in sensor networks, simulations, and observational devices, has reached unprecedented levels. This surge in data generation and the demand for proper storage has been met with extensive research and development in distributed storage systems, facilitating the scalable housing of these voluminous datasets while enabling analytical processes. Nonetheless, the extraction of meaningful insights from these datasets, especially in the context of low-latency, interactive analytics, poses a formidable challenge. This arises from the persistent gap between the processing capacity of distributed systems and their ever-expanding storage capabilities. Moreover, the interactive querying of these datasets is hindered by disk I/O, redundant network communications, recurrent hotspots, and transient surges of user interest over limited geospatial regions, particularly in systems that concurrently serve multiple users. In environments where interactive querying is paramount, such as visualization systems, addressing these challenges becomes imperative. This dissertation delves into the intricacies of enabling interactive analytics over large-scale spatiotemporal datasets. My research efforts are centered around the conceptualization and implementation of a scalable storage, indexing, and caching framework tailored specifically for spatiotemporal data access. The research aims to create frameworks that facilitate fast query analytics over diverse data types, including point, vector, and raster datasets. The frameworks implemented are characterized by their lightweight nature, their residence primarily in memory, and their capacity to support model-driven extraction of insights from raw data or dynamic reconstruction of compressed or partial in-memory data fragments with an acceptable level of accuracy. This approach reduces the memory footprint of cached data objects and also mitigates the need for frequent client-server communications. Furthermore, we investigate the potential of leveraging various transfer learning techniques to improve the turn-around times of our memory-resident deep learning models, given the voluminous nature of our datasets, while maintaining good overall accuracy across the entire spatiotemporal domain. Additionally, our research explores the extraction of insights from high-dimensional datasets, such as satellite imagery, within this framework. The dissertation also presents empirical evaluations of our frameworks and outlines future directions and anticipated contributions in the domain of interactive analytics over large-scale spatiotemporal datasets, acknowledging the evolving landscape of data analytics, in which analytics frameworks increasingly rely on compute-intensive machine learning models.