Energy Efficient and Robust Management of Datacenters

Parallel and distributed high performance computing (HPC) systems are often a heterogeneous mix of machines. As these systems continue to expand rapidly in capability, driven by the push toward exascale and the growing demand for cloud computing, their computational energy expenditure has skyrocketed. They now require elaborate cooling facilities to function, and those facilities themselves consume significant energy. The need for energy-efficient resource management is thus paramount. Moreover, these systems frequently suffer degraded performance and high power consumption because of circumstances that change unpredictably, such as thermal hotspots caused by load imbalances or sudden machine failures. As system complexity grows, so does the importance of making system operation robust against these uncertainties.

The research objective of this project is to design models, metrics, and algorithmic strategies for deriving resource (e.g., workload, data) allocations that are energy-efficient and robust. The focus is on deriving stochastic robustness and energy models from real-world data; applying these models to resource management strategies that co-optimize performance, robustness, computation energy, and cooling energy; modeling the impact of interference in shared memory and network subsystems; quantifying task and machine heterogeneity; adapting thermal setpoints to save energy; developing schemes for real-time thermal modeling; defining new metrics to characterize cooling energy costs and capacity; and driving and validating our research with feedback collected from real-world petaflop systems (Yellowstone at the National Center for Atmospheric Research, Titan at Oak Ridge National Lab) and teraflop systems (CSU’s Cray cluster, a teraflop cluster at Oak Ridge National Lab). Extensions of this project explore the impact of resource management in geo-distributed datacenters. We are also investigating game-theoretic and machine learning techniques for resource management in geo-distributed datacenters.

Selected Publications

S. Pasricha, N. Hogade, H. J. Siegel, A. A. Maciejewski, “Green Computing with Geo-Distributed Heterogeneous Data Centers,” IEEE International Green and Sustainable Computing Conference (IGSC), Alexandria, VA, USA, Oct. 2019.

N. Hogade, S. Pasricha, A. A. Maciejewski, H. J. Siegel, M. Oxley, E. Jonardi, “Minimizing Energy Costs for Geographically Distributed Heterogeneous Data Centers”, IEEE Transactions on Sustainable Computing (TSUSC), Vol. 3, No. 4, Oct-Dec 2018.

M. Oxley, E. Jonardi, S. Pasricha, H. J. Siegel, T. Maciejewski, P. J. Burns, and G. Koenig, “Rate-based Thermal, Power, and Co-location Aware Resource Management for Heterogeneous Data Centers”, Journal of Parallel and Distributed Computing (JPDC), vol. 12, part 2, pp. 126-139, Feb 2018.

D. Machovec, B. Khemka, N. Kumbhare, S. Pasricha, A. A. Maciejewski, H. J. Siegel, A. Akoglu, G. A. Koenig, S. Hariri, C. Tunc, M. Wright, M. Hilton, R. Rambharos, C. Blandin, F. Fargo, A. Louri, N. Imam, “Utility-Based Resource Management in an Oversubscribed Energy-Constrained Heterogeneous Environment Executing Parallel Applications”, Journal of Parallel Computing (PARCO), Nov 2017.

D. Machovec, S. Pasricha, A. A. Maciejewski, H. Jay Siegel, G. A. Koenig, M. Wright, M. Hilton, R. Rambharos, T. Naughton, N. Imam, “Preemptive Resource Management for Dynamically Arriving Tasks in an Oversubscribed Heterogeneous Computing System,” IEEE International Heterogeneity in Computing Workshop (HCW), co-organized with IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2017.

D. Dauwe, E. Jonardi, R. Friese, S. Pasricha, A. A. Maciejewski, D. Bader, H.J. Siegel, “HPC Node Performance and Energy Modeling Under the Uncertainty of Application Co-Location”, Journal of Supercomputing, Vol. 72, No. 12, pp. 4771-4809, Nov. 2016.

M. Oxley, S. Pasricha, T. Maciejewski, H.J. Siegel and P. Burns, “Online Resource Management in Thermal and Energy Constrained Heterogeneous High Performance Computing,” IEEE International Conference on Big Data Intelligence and Computing (DataCom), Aug 2016.

D. Machovec, B. Khemka, S. Pasricha, A. A. Maciejewski, H. Jay Siegel, G. A. Koenig, M. Wright, M. Hilton, R. Rambharos, N. Imam, “Dynamic Resource Management for Parallel Tasks in an Oversubscribed Energy-Constrained Heterogeneous Environment,” International Heterogeneity in Computing Workshop (HCW), co-located with IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2016.

E. Jonardi, M. Oxley, S. Pasricha, H. J. Siegel and T. Maciejewski, “Energy Cost Optimization for Geographically Distributed Heterogeneous Data Centers,” IEEE Workshop on Energy-efficient Networks of Computers (E2NC): from the Chip to the Cloud, Dec 2015.

B. Khemka, R. Friese, S. Pasricha, A. A. Maciejewski, H. J. Siegel, G. A. Koenig, S. Powers, M. Hilton, R. Rambharos, M. Wright, S. Poole, “Comparison of Energy-Constrained Resource Allocation Heuristics Under Different Task Management Environments,” International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), 2015.

M. Oxley, S. Pasricha, A. A. Maciejewski, H. J. Siegel, J. Apodaca, D. Young, L. Briceno, J. Smith, S. Bahirat, B. Khemka, A. Ramirez and Y. Zou, “Makespan and Energy Robust Stochastic Static Resource Allocation of Bags-of-Tasks to a Heterogeneous Computing System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 26, no. 10, pp. 2791-2805, Oct. 2015.

D. Dauwe, E. Jonardi, R. Friese, S. Pasricha, A. A. Maciejewski, D. Bader, H. J. Siegel, “A Methodology for Co-Location Aware Application Performance Modeling in Multicore Computing,” 17th Workshop on Advances in Parallel and Distributed Computational Models (APDCM), May 2015.

B. Khemka, R. Friese, S. Pasricha, A. A. Maciejewski, H. J. Siegel, G. A. Koenig, S. Powers, M. Hilton, R. Rambharos, and S. Poole, “Utility Maximizing Dynamic Resource Management in an Oversubscribed Energy-Constrained Heterogeneous Computing System”, Journal of Sustainable Computing: Informatics and Systems, Volume 5, pp. 14–30, March 2015.

A. M. Al-Qawasmeh, S. Pasricha, A. A. Maciejewski, H. J. Siegel, “Power and Thermal-Aware Workload Allocation in Heterogeneous Data Centers”, IEEE Transactions on Computers, vol. 64, no. 2, pp. 477-491, Feb 2015.

M. Oxley, E. Jonardi, S. Pasricha, A. A. Maciejewski, G. Koenig and H. J. Siegel, “Thermal, Power, and Co-location Aware Resource Allocation in Heterogeneous Computing Systems,” IEEE International Green Computing Conference (IGCC), 2014.

D. Dauwe, R. Friese, S. Pasricha, A. A. Maciejewski, G. A. Koenig, H. J. Siegel, “Modeling the Effects on Power and Performance from Memory Interference of Co-located Applications in Multicore Systems,” International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), 2014.

H. J. Siegel, B. Khemka, R. Friese, S. Pasricha, A. A. Maciejewski, G. A. Koenig, S. Powers, M. Hilton, J. Rambharos, G. Okonski, and S. W. Poole, “Energy-Aware Resource Management for Computing Systems,” 7th International Conference on Contemporary Computing (IC3), Noida, India, Aug. 2014.

B. Khemka, G. A. Koenig, R. Friese, S. Powers, S. Pasricha, A. A. Maciejewski, M. Hilton, R. Rambharos, H. J. Siegel, S. Poole, “Utility Driven Dynamic Resource Management in an Oversubscribed Energy-Constrained Heterogeneous System”, 23rd International Heterogeneity in Computing Workshop (HCW), May 2014.

M. Oxley, S. Pasricha, H. J. Siegel, and A. Maciejewski, “Energy and Deadline Constrained Robust Stochastic Static Resource Allocation”, Workshop on Power and Energy Aspects of Computation (PEAC) held in conjunction with the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Sep. 2013.

R. Friese, T. Brinks, C. Oliver, A. Maciejewski, H. J. Siegel, S. Pasricha, “A Machine-by-Machine Analysis of a Bi-Objective Resource Allocation Problem”, International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Jul. 2013.

D. Young, J. Smith, S. Pasricha, A. Maciejewski and H. J. Siegel, “Heterogeneous Energy and Makespan Constrained DAG Scheduling”, International Workshop on Energy Efficient High Performance Parallel and Distributed Computing (EEHPDC), Jun. 2013.

A. M. Al-Qawasmeh, S. Pasricha, A. A. Maciejewski, and H. J. Siegel, “Thermal-Aware Performance Optimization in Power Constrained Heterogeneous Data Centers”, 21st International Heterogeneity in Computing Workshop (HCW), 2012.

D. Young, J. Apodaca, L. Briceno, J. Smith, S. Pasricha, A. Maciejewski, H. Siegel, S. Bahirat, B. Khemka, A. Ramirez and Y. Zou, “Deadline and Energy Constrained Dynamic Resource Allocation in a Heterogeneous Computing Environment”, Journal of Supercomputing, 2012.

D. Young, J. Apodaca, L. Briceno, J. Smith, S. Pasricha, A. Maciejewski, H. Siegel, S. Bahirat, B. Khemka, A. Ramirez and Y. Zou, “Energy-Constrained Dynamic Resource Allocation in a Heterogeneous Computing Environment”, Fourth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Sep 2011.

J. Apodaca, D. Young, L. Briceno, J. Smith, S. Pasricha, A. Maciejewski, H. Siegel, S. Bahirat, B. Khemka, A. Ramirez and Y. Zou, “Stochastically Robust Static Resource Allocation for Energy Minimization with a Makespan Constraint in a Heterogeneous Computing Environment”, ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), Dec 2011. (Best Paper Award)