Distributed algorithms for the orchestration of stochastic discrete event simulations
Date
2014
Authors
Sui, Zhiquan, author
Pallickara, Shrideep, advisor
Anderson, Charles, committee member
Böhm, Wim, committee member
Hayne, Stephen, committee member
Abstract
Discrete event simulations are widely used in modeling real-world phenomena such as epidemiology, congestion analysis, weather forecasting, economic activity, and chemical reactions. The expressiveness of such simulations depends on the number and types of entities that are modeled and also on the interactions that entities have with each other. In the case of stochastic simulations, these interactions are based on the concomitant probability density functions. The more exhaustively a phenomenon is modeled, the greater its computational complexity and, correspondingly, its execution time. Distributed orchestration can speed up such complex simulations. This dissertation considers the problem of distributed orchestration of stochastic discrete event simulations where the computations are irregular and the processing loads are stochastic. We have designed a suite of algorithms that alleviate imbalances between processing elements across synchronization time steps. The algorithms explore different aspects of the orchestration spectrum: static vs. dynamic, reactive vs. proactive, and deterministic vs. learning-based. The feature vector that guides our algorithms includes externally observable features of the simulation, such as computational footprints and hardware profiles, and features internal to the simulation, such as entity states. The learning structures include a basic Artificial Neural Network (ANN) and an improved version of the ANN. The algorithms are self-tuning and account for the state of the simulation and processing elements while coping with prediction errors. Finally, these algorithms also address resource uncertainty, which arises in such settings due to resource failures, slowdowns, and heterogeneity. Task apportioning, speculative tasks to cope with stragglers, and checkpointing account for the quality and state of both the resources and the simulation. The algorithms achieve demonstrably good performance: despite the irregular nature of these computations, stochasticity in the processing loads, and resource uncertainty, execution times are reduced by a factor of 1.8 when the number of resources is doubled.
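
As a rough illustration of the proactive, learning-based end of this orchestration spectrum, the sketch below shows one way per-entity load predictions could drive a greedy reassignment of entities across processing elements between synchronization time steps. It is not the dissertation's implementation: the names (Entity, LoadPredictor, rebalance), the exponentially smoothed cost model standing in for the ANN, and the longest-processing-time heuristic are all illustrative assumptions.

    """
    Hypothetical sketch: prediction-guided rebalancing of simulation entities
    across processing elements (PEs) between synchronization time steps.
    """
    import random
    from dataclasses import dataclass


    @dataclass
    class Entity:
        """A simulated entity with an observable, simulation-internal feature."""
        eid: int
        interactions: int           # feature internal to the simulation
        observed_cost: float = 0.0  # cost measured during the last time step


    class LoadPredictor:
        """Toy stand-in for the ANN: a self-tuning per-interaction cost estimate."""
        def __init__(self, alpha: float = 0.2) -> None:
            self.cost_per_interaction = 1.0
            self.alpha = alpha  # learning rate for exponential smoothing

        def predict(self, entity: Entity) -> float:
            return self.cost_per_interaction * entity.interactions

        def update(self, entity: Entity) -> None:
            # Correct the estimate with the cost actually observed, so the
            # predictor copes with its own prediction errors over time.
            if entity.interactions:
                observed = entity.observed_cost / entity.interactions
                self.cost_per_interaction += self.alpha * (observed - self.cost_per_interaction)


    def rebalance(entities: list[Entity], num_pes: int,
                  predictor: LoadPredictor) -> list[list[Entity]]:
        """Greedy longest-processing-time assignment using predicted costs."""
        bins: list[list[Entity]] = [[] for _ in range(num_pes)]
        loads = [0.0] * num_pes
        for e in sorted(entities, key=predictor.predict, reverse=True):
            target = loads.index(min(loads))  # least-loaded PE so far
            bins[target].append(e)
            loads[target] += predictor.predict(e)
        return bins


    if __name__ == "__main__":
        random.seed(7)
        predictor = LoadPredictor()
        entities = [Entity(i, random.randint(1, 50)) for i in range(200)]
        for step in range(3):
            partitions = rebalance(entities, num_pes=4, predictor=predictor)
            for part in partitions:
                for e in part:
                    # Stochastic processing load: actual cost deviates from prediction.
                    e.observed_cost = e.interactions * random.uniform(0.8, 1.4)
                    predictor.update(e)
            worst = max(sum(predictor.predict(e) for e in p) for p in partitions)
            print(f"step {step}: max predicted PE load = {worst:.1f}")

In this toy setting the predictor is updated after every time step, so successive rebalancing decisions incorporate what was just observed; the dissertation's ANN-based predictors, speculative tasks, and checkpointing address the same feedback problem under far more general resource uncertainty.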