Graduate Exam Abstract

Mark Oxley

Ph.D. Preliminary

March 7, 2014, 1:00 PM - 3:00 PM

ECE Conference Room

Robust Energy-Aware Resource Management Heuristics and Optimization Techniques For Power and Thermal Constrained Heterogeneous Computing Systems

Abstract: Today's data centers must balance electricity use against the completion times of their workloads. Rising electricity costs are forcing data center operators either to operate within an electricity budget or to reduce electricity use as much as possible while still meeting service agreements. Energy-aware resource allocation is one technique a system administrator can employ to address both problems: optimizing the workload completion time (makespan) when given an energy budget, or minimizing energy consumption subject to service guarantees (such as adhering to deadlines). This thesis studies energy-aware static resource allocation in an environment where a collection of independent (non-communicating) tasks (a "bag-of-tasks") is assigned to a heterogeneous computing system. Computing systems often operate in environments where task execution times vary (e.g., due to cache misses or data-dependent execution times). These execution times are modeled stochastically, using probability density functions. It is desirable for resource allocations to be robust against these variations, where energy-robustness is defined as the probability that the energy budget is not violated, and makespan-robustness is defined as the probability that a makespan deadline is not violated. For both the energy-constrained and deadline-constrained problems, novel heuristics are designed and analyzed.
The rapid increase in the power consumption of data centers has increased the amount of cooling resources required to operate these data centers within a safe threshold. The cooling systems account for a large portion of the total power consumed by a data center, raising the cost of providing power to these data centers. In this thesis, novel resource allocation techniques are designed that maximize the performance of a data center subject to a constraint on the total power consumption of the compute servers and Computer Room Air Conditioning (CRAC) units, while also ensuring that servers and CRAC units operate within a redline temperature threshold. As the number of cores in multicore processors increases, shared caches can have a pronounced impact on the execution speed of memory-intensive tasks. Our model considers the power consumed by compute servers and CRAC units, a workload whose tasks vary in compute and memory intensity (which affects the power consumption of cores), and the execution speed degradation caused by co-locating tasks on cores within the same multicore processor. The performance of the system is quantified as the total reward earned from completing tasks by their individual deadlines. A novel genetic algorithm, combined with a new local search technique that guarantees the power and thermal constraints are met, is being designed and analyzed to solve this problem. The plan is to consider other techniques as well.
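
The two robustness measures defined in the abstract (the probability that an energy budget is not violated, and the probability that a makespan deadline is not violated) can be illustrated with a small Monte Carlo sketch. This is not the method used in the thesis. It assumes normally distributed task execution times truncated at zero, a constant power draw per machine while busy, and no idle power; the function name, the data layout, and all numbers are hypothetical.

```python
import random


def estimate_robustness(allocation, energy_budget, deadline, trials=10000, seed=0):
    """Estimate (energy-robustness, makespan-robustness) by sampling.

    allocation maps a machine name to a dict with:
      "power": assumed constant power draw while busy (watts)
      "tasks": list of (mean, std) execution times in seconds (assumed normal)
    """
    rng = random.Random(seed)
    energy_ok = 0
    makespan_ok = 0
    for _ in range(trials):
        total_energy = 0.0
        makespan = 0.0
        for machine in allocation.values():
            # Tasks on one machine run back to back, so its busy time is the
            # sum of the sampled execution times (negative samples clipped to 0).
            busy = sum(max(0.0, rng.gauss(mean, std)) for mean, std in machine["tasks"])
            total_energy += machine["power"] * busy
            # Makespan is the finishing time of the last machine to complete.
            makespan = max(makespan, busy)
        energy_ok += total_energy <= energy_budget
        makespan_ok += makespan <= deadline
    return energy_ok / trials, makespan_ok / trials


# Hypothetical static allocation of five tasks onto two heterogeneous machines.
example_allocation = {
    "fast_node": {"power": 120.0, "tasks": [(10.0, 2.0), (5.0, 1.0), (8.0, 1.5)]},
    "slow_node": {"power": 60.0, "tasks": [(20.0, 4.0), (6.0, 1.0)]},
}
energy_rob, makespan_rob = estimate_robustness(
    example_allocation, energy_budget=4500.0, deadline=28.0
)
print(f"energy-robustness ~ {energy_rob:.3f}, makespan-robustness ~ {makespan_rob:.3f}")
```

A resource allocation heuristic could compare candidate allocations by these two estimates, for example by maximizing one while requiring the other to stay above a chosen probability.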

Adviser: H.J. Siegel
Co-Adviser: Sudeep Pasricha
Non-ECE Member: Darrell Whitley
Member 3: Anthony A. Maciejewski
Additional Members: N/A

Mark Oxley, Sudeep Pasricha, Howard Jay Siegel, and Anthony A. Maciejewski, "Energy and Deadline Constrained Robust Stochastic Static Resource Allocation," The First Workshop on Power and Energy Aspects of Computation (PEAC 2013), in the proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), to appear, 10 pp., Warsaw, Poland, Sep. 2013.

Program of Study: