The robustness of resource allocations in parallel and distributed computing systems
Date
2006
Authors
Shestak, Vladimir, author
Maciejewski, Anthony A., author
Siegel, Howard Jay, author
Ali, Shoukat, author
Springer-Verlag Berlin Heidelberg, publisher
Abstract
This corresponds to the material in the invited keynote presentation by H. J. Siegel, summarizing the research in [2, 23]. Resource allocation decisions in heterogeneous parallel and distributed computer systems, and the associated performance predictions, are often based on estimated values of application and system parameters, whose actual values are uncertain and may differ from the estimates. We have designed a model for deriving the degree of robustness of a resource allocation: the maximum amount of collective uncertainty in parameters within which a user-specified level of system performance can be guaranteed. The model will be presented, and we will demonstrate its ability to select the most robust resource allocation from among those that otherwise perform similarly (based on the primary performance criterion). We will show how the model can be used in off-line allocation heuristics to maximize the robustness of makespan against inaccuracies in estimates of application execution times in a cluster.
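
As an illustration of the idea in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes (these details are not stated in the abstract) that a machine's finishing time is the sum of the estimated execution times of the applications assigned to it, that collective uncertainty is measured as the Euclidean distance between actual and estimated execution times, and that the performance requirement is a makespan bound `tau`. Under those assumptions, the smallest collective error that can push a machine with `n` applications past `tau` is `(tau - finish) / sqrt(n)`, and the allocation's robustness is the minimum of that value over all machines. The names `makespan`, `robustness`, `est`, `alloc_a`, and `alloc_b` are hypothetical.

```python
import math

def finish_times(allocation, est):
    """Estimated finishing time of each machine.

    allocation[j] lists the applications assigned to machine j;
    est[i][j] is the estimated execution time of application i on machine j.
    """
    return [sum(est[i][j] for i in apps) for j, apps in enumerate(allocation)]

def makespan(allocation, est):
    """Primary performance criterion: the largest machine finishing time."""
    return max(finish_times(allocation, est))

def robustness(allocation, est, tau):
    """Smallest Euclidean error in execution times that can violate tau.

    Assumption: for a machine running n applications, the distance from the
    estimated point to the boundary sum(times) = tau is (tau - finish) / sqrt(n).
    Machines with no applications cannot violate the bound and are skipped.
    """
    radii = [
        (tau - f) / math.sqrt(len(apps))
        for f, apps in zip(finish_times(allocation, est), allocation)
        if apps
    ]
    return min(radii)

# Hypothetical estimates: 4 applications on 2 heterogeneous machines.
est = [
    [6, 8],  # application 0
    [2, 4],  # application 1
    [4, 4],  # application 2
    [2, 2],  # application 3
]

alloc_a = [[0, 1], [2, 3]]   # machine 0 runs apps 0 and 1; machine 1 runs apps 2 and 3
alloc_b = [[1, 2, 3], [0]]   # machine 0 runs apps 1, 2, 3; machine 1 runs app 0

# Tolerate a makespan up to 20% above the estimated one (an assumed requirement).
tau = 1.2 * makespan(alloc_a, est)

r_a = robustness(alloc_a, est, tau)
r_b = robustness(alloc_b, est, tau)
```

Both allocations have the same estimated makespan, so the primary criterion cannot separate them; in this sketch `alloc_a` has the larger robustness value and would be preferred.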