Prediction based scaling in a distributed stream processing cluster

Khurana, Kartik, author; Pallickara, Sangmi Lee, advisor; Pallickara, Shrideep, committee member; Carter, Ellison, committee member

Files

KHURANA_colostate_0053N_15895.pdf (4.02 MB)

Date

2020

Authors

Khurana, Kartik, author

Pallickara, Sangmi Lee, advisor

Pallickara, Shrideep, committee member

Carter, Ellison, committee member

Abstract

The proliferation of IoT sensors and applications has enabled us to monitor and analyze scientific and social phenomena using continuously arriving, voluminous data. To provide real-time processing capabilities over streaming data, distributed stream processing engines (DSPEs) such as Apache Storm and Apache Flink have been widely deployed. These frameworks support computations over large-scale, high-frequency streaming data. However, the current on-demand auto-scaling features in these systems may result in inefficient resource utilization, which is closely tied to cost effectiveness in popular cloud-based computing environments. We propose ARSTREAM, an auto-scaling computing environment that manages fluctuating throughput for data from sensor networks while ensuring efficient resource utilization. We have built an Artificial Neural Network model for predicting the size of data processing queues; this model captures non-linear relationships between data arrival rates, resource utilization, and the size of the data processing queue. If a bottleneck is predicted, ARSTREAM automatically scales out the current cluster for running jobs without halting them at the user level. In addition, ARSTREAM incorporates threshold-based re-balancing to minimize data loss during extreme peak traffic that could not be predicted by our model. Our empirical benchmarks show that ARSTREAM forecasts data processing queue sizes with an RMSE of 0.0429 when tested on real-time data.
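The control flow described in the abstract (scale out when a bottleneck is predicted, fall back to threshold-based re-balancing when observed traffic exceeds what was predicted) can be illustrated with a minimal sketch. This is not ARSTREAM's implementation: the names `ScalingPolicy`, `decide_action`, and `rmse`, the two limits, and the order in which the checks run are all assumptions made for illustration. The `rmse` helper shows only the standard definition of the error metric quoted above.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical limits; the thesis abstract does not state concrete values.
    predicted_queue_limit: float  # predicted queue size treated as a bottleneck
    observed_queue_limit: float   # observed queue size that triggers re-balancing


def decide_action(predicted_queue: float, observed_queue: float,
                  policy: ScalingPolicy) -> str:
    """Pick an action from a predicted and an observed queue size.

    Assumption: the observed-threshold check runs first, so an extreme
    peak that the model missed is still handled.
    """
    if observed_queue >= policy.observed_queue_limit:
        return "rebalance"
    if predicted_queue >= policy.predicted_queue_limit:
        return "scale_out"
    return "none"


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    policy = ScalingPolicy(predicted_queue_limit=0.8, observed_queue_limit=0.95)
    print(decide_action(predicted_queue=0.85, observed_queue=0.40, policy=policy))
    print(decide_action(predicted_queue=0.30, observed_queue=0.97, policy=policy))
    print(rmse([0.2, 0.4, 0.6], [0.25, 0.35, 0.6]))
```

In a real deployment the predicted queue size would come from the trained model, and the actions would map onto the stream processing engine's own scaling and re-balancing mechanisms, which this sketch does not model.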

Subject

auto-scaling

artificial neural networks (ANNs)

distributed stream processing engines (DSPEs)

URI

https://hdl.handle.net/10217/208425

Collections

2020-
Theses and Dissertations
