Prediction based scaling in a distributed stream processing cluster
dc.contributor.author | Khurana, Kartik, author | |
dc.contributor.author | Pallickara, Sangmi Lee, advisor | |
dc.contributor.author | Pallickara, Shrideep, committee member | |
dc.contributor.author | Carter, Ellison, committee member | |
dc.date.accessioned | 2020-06-22T11:52:33Z | |
dc.date.available | 2020-06-22T11:52:33Z | |
dc.date.issued | 2020 | |
dc.description.abstract | The proliferation of IoT sensors and applications has enabled us to monitor and analyze scientific and social phenomena using continuously arriving, voluminous data. To provide real-time processing capabilities over streaming data, distributed stream processing engines (DSPEs) such as Apache Storm and Apache Flink have been widely deployed. These frameworks support computations over large-scale, high-frequency streaming data. However, the current on-demand auto-scaling features in these systems may result in inefficient resource utilization, which is closely tied to cost effectiveness in popular cloud-based computing environments. We propose ARSTREAM, an auto-scaling computing environment that manages fluctuating throughput for data from sensor networks while ensuring efficient resource utilization. We have built an artificial neural network model for predicting data processing queue sizes; this model captures non-linear relationships between data arrival rates, resource utilization, and the size of the data processing queue. If a bottleneck is predicted, ARSTREAM automatically scales out the current cluster for current jobs without halting them at the user level. In addition, ARSTREAM incorporates threshold-based re-balancing to minimize data loss during extreme peak traffic that could not be predicted by our model. Our empirical benchmarks show that ARSTREAM forecasts data processing queue sizes with an RMSE of 0.0429 when tested on real-time data. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | KHURANA_colostate_0053N_15895.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/208425 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | auto-scaling | |
dc.subject | artificial neural networks (ANNs) | |
dc.subject | distributed stream processing engines (DSPEs) | |
dc.title | Prediction based scaling in a distributed stream processing cluster | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) |
Files
Original bundle
- Name: KHURANA_colostate_0053N_15895.pdf
- Size: 4.02 MB
- Format: Adobe Portable Document Format