Near real-time processing of voluminous, high-velocity data streams for continuous sensing environments

Date

2020

Authors

Hewa Raga Munige, Thilina, author
Pallickara, Shrideep, advisor
Chandrasekar, V., committee member
Ghosh, Sudipto, committee member
Pallickara, Sangmi, committee member

Abstract

Recent advancements in miniaturization, falling costs, networking enhancements, and battery technologies have contributed to a proliferation of networked sensing devices. Arrays of coordinated sensing devices are deployed in continuous sensing environments (CSEs) where the phenomena of interest are monitored. Observations sensed by devices in a CSE setting are encapsulated as multidimensional data streams that must subsequently be processed. The vast number of sensing devices, the high rates at which data are generated, and the high resolutions at which these measurements are performed contribute to the voluminous, high-velocity data streams that are now increasingly pervasive. These data streams must be processed in near real-time to power user-facing applications such as visualization dashboards and monitoring systems, as well as various stages of data ingestion pipelines such as ETL pipelines. This dissertation focuses on facilitating efficient ingestion and near real-time processing of voluminous, high-velocity data streams originating in CSEs. Challenges in ingesting and processing such streams include energy and bandwidth constraints at the data sources, data transfer and processing costs, underutilized resources, and preserving the performance of stream processing applications in the presence of variable workloads and system conditions. Toward this end, we explore design principles for building a high-performance, adaptive stream processing engine that addresses the processing challenges unique to CSE data streams. Further, we demonstrate how our holistic methodology, based on space-efficient representations of data streams obtained through a controlled trade-off of accuracy, can substantially alleviate stream ingestion challenges while improving stream processing performance. We evaluate the efficacy of our methodology using real-world streaming datasets in a large-scale setup and contrast it against state-of-the-art developments in the field.

Subject

distributed computing
edge computing
data sketching
Internet of Things
distributed stream processing
