Shoukat Ali's PhD Thesis Abstract

Robust Resource Allocation in Dynamic Distributed Heterogeneous Computing Systems

Ph.D., Purdue University, Aug. 2003

Co-Major Professors: H. J. Siegel and Anthony A. Maciejewski

Performing computing and communication tasks on parallel and distributed systems may involve the coordinated use of different types of machines, networks, interfaces, and other resources. All of these resources should be allocated in a way that maximizes some system performance measure. However, allocation decisions and performance prediction are often based on "nominal" values of task and system parameters. The actual values of these parameters may differ from the nominal ones, e.g., because of inaccuracies in the initial estimation or because of changes over time caused by an unpredictable system environment.

An important question then arises: given a system design, what extent of departure from the assumed circumstances will cause the performance to be unacceptably degraded? That is, how robust is the system? To address this issue, this research has designed a methodology for deriving the degree of robustness of a resource allocation, defined as the maximum amount of collective uncertainty in task and system parameters within which a user-specified level of performance can be guaranteed. The procedure is illustrated by using it to derive robustness metrics for some example distributed systems. Furthermore, the research demonstrates the ability of the robustness metric to select the most robust resource allocation from among those that otherwise perform similarly (based on the primary performance criterion).
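
As a rough illustration of the idea, the sketch below computes a robustness value for one simple, assumed setting that the abstract itself does not specify: independent tasks assigned to machines, with makespan as the primary performance criterion, estimated task execution times as the uncertain parameters, and uncertainty measured by the Euclidean norm. Under those assumptions, machine j violates the requirement when the sum of its task times exceeds tau times the nominal makespan, and the smallest perturbation that causes this has norm (tau * makespan - load_j) / sqrt(n_j), the distance from the nominal point to that boundary hyperplane. The function name, the tolerance factor tau, and the task data are all hypothetical.

```python
import math

def robustness_metric(times, allocation, tau):
    """Smallest Euclidean perturbation of the estimated task times that
    pushes some machine's finishing time past tau * nominal makespan.

    times[i]      -- estimated execution time of task i on its assigned machine
    allocation[i] -- index of the machine that task i is assigned to
    tau           -- tolerance factor (> 1), an assumed user-specified limit
    """
    loads, counts = {}, {}
    for t, m in zip(times, allocation):
        loads[m] = loads.get(m, 0.0) + t
        counts[m] = counts.get(m, 0) + 1

    makespan = max(loads.values())
    limit = tau * makespan

    # Distance from the nominal times to the hyperplane sum(times on m) = limit.
    radii = {m: (limit - loads[m]) / math.sqrt(counts[m]) for m in loads}
    return min(radii.values())

# Hypothetical data: six tasks, two machines, same makespan (8) either way.
times = [4, 4, 2, 2, 2, 2]
alloc_a = [0, 0, 1, 1, 1, 1]  # machine 0 gets both long tasks
alloc_b = [0, 1, 0, 0, 1, 1]  # each machine gets one long and two short tasks

rho_a = robustness_metric(times, alloc_a, tau=1.2)
rho_b = robustness_metric(times, alloc_b, tau=1.2)
print(f"allocation A: {rho_a:.3f}, allocation B: {rho_b:.3f}")
```

Both allocations have the same nominal makespan, so the primary criterion cannot separate them. In this toy setting allocation B has the larger robustness value, because allocation A places four tasks on one machine and a machine with more tasks has more directions in which its estimates can drift.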

The main contributions of this research are (1) a mathematical description of a metric for the robustness of a resource allocation with respect to desired system performance features against multiple perturbations in multiple system and environmental conditions, (2) a procedure for deriving a robustness metric for an arbitrary system, (3) example applications of this procedure to several different systems, and (4) the design and analysis of heuristics for robust resource allocation in a class of heterogeneous computing systems.