Measuring robustness for distributed computing systems
Date
2003
Authors
Maciejewski, Anthony A., author
Ali, Shoukat, author
Siegel, Howard Jay, author
[University of Illinois at Chicago], publisher
Abstract
Performing computing and communication tasks on parallel and distributed systems may involve the coordinated use of different types of machines, networks, interfaces, and other resources. All of these resources should be allocated in a way that maximizes some system performance measure. However, allocation decisions and performance predictions are often based on "nominal" values of application and system parameters. The actual values of these parameters may differ from the nominal ones, e.g., because of inaccuracies in the initial estimation or because of changes over time caused by an unpredictable system environment. An important question then arises: given a system design, what extent of departure from the assumed circumstances will cause the performance to be unacceptably degraded? That is, how robust is the system? To address this issue, one needs a methodology for deriving the degree of robustness of a resource allocation, that is, the maximum amount of collective uncertainty in application and system parameters within which a user-specified level of performance can be guaranteed. Our procedure for this is presented in this paper. The main contributions of this research are (1) a mathematical description of a metric for the robustness of a resource allocation with respect to desired system performance features against multiple perturbations in multiple system and environmental conditions, (2) a procedure for deriving a robustness metric for an arbitrary system, and (3) example applications of this procedure to several different systems.