Prediction based scaling in a distributed stream processing cluster

dc.contributor.author: Khurana, Kartik, author
dc.contributor.author: Pallickara, Sangmi Lee, advisor
dc.contributor.author: Pallickara, Shrideep, committee member
dc.contributor.author: Carter, Ellison, committee member
dc.date.accessioned: 2020-06-22T11:52:33Z
dc.date.available: 2020-06-22T11:52:33Z
dc.date.issued: 2020
dc.description.abstract: The proliferation of IoT sensors and applications has enabled us to monitor and analyze scientific and social phenomena using continuously arriving, voluminous data. To provide real-time processing capabilities over streaming data, distributed stream processing engines (DSPEs) such as Apache Storm and Apache Flink have been widely deployed. These frameworks support computations over large-scale, high-frequency streaming data. However, the current on-demand auto-scaling features in these systems may result in inefficient resource utilization, which is closely tied to cost effectiveness in popular cloud-based computing environments. We propose ARSTREAM, an auto-scaling computing environment that manages fluctuating throughput for data from sensor networks while ensuring efficient resource utilization. We have built an artificial neural network model for predicting data processing queues; this model captures non-linear relationships between data arrival rates, resource utilization, and the size of the data processing queue. If a bottleneck is predicted, ARSTREAM automatically scales out the current cluster for running jobs without halting them at the user level. In addition, ARSTREAM incorporates threshold-based re-balancing to minimize data loss during extreme peak traffic that our model could not predict. Our empirical benchmarks show that ARSTREAM forecasts data processing queue sizes with an RMSE of 0.0429 when tested on real-time data.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: KHURANA_colostate_0053N_15895.pdf
dc.identifier.uri: https://hdl.handle.net/10217/208425
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: auto-scaling
dc.subject: artificial neural networks (ANNs)
dc.subject: distributed stream processing engines (DSPEs)
dc.title: Prediction based scaling in a distributed stream processing cluster
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle
Name: KHURANA_colostate_0053N_15895.pdf
Size: 4.02 MB
Format: Adobe Portable Document Format