Kamil Khan
Ph.D. Preliminary
May 02, 2025, 12:00 pm - 1:45 pm
ECE Conference Room ENGR C101B
Reinforcement Learning for On-chip and Off-chip Adaptive Resource Management for High-Performance Computing
Abstract: Modern computing systems face unprecedented resource management challenges as architectures evolve toward many-core designs, heterogeneous components, and disaggregated resources. Both on-chip networks (NoCs) and off-chip memory/storage systems suffer from inefficiencies when managed using traditional algorithmic approaches such as design-time optimizations and heuristics. Static design-time optimizations cannot adapt to dynamic workloads, while reactive heuristics make myopic decisions without considering long-term consequences or system-wide effects. These limitations result in performance bottlenecks, wasted energy, and underutilized resources across the computing stack.
This dissertation proposes reinforcement learning (RL) as a comprehensive framework for adaptive resource management in high-performance computing systems. Unlike conventional approaches, RL optimizes sequences of decisions to maximize long-term returns rather than greedily optimizing individual actions. The Markov Decision Process (MDP) formalism provides a flexible mathematical foundation that can model diverse resource management problems while capturing their sequential decision-making nature. This flexibility enables application across domains—from buffer management and routing in NoCs to garbage collection in SSDs and memory allocation in disaggregated systems—with consistent optimization principles despite domain-specific challenges.
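To make the MDP framing concrete, the sketch below shows a generic tabular Q-learning agent. It is illustrative only, not code from the dissertation, and the action set is a hypothetical buffer-allocation choice: the update bootstraps on the value of the next state, so each action is chosen for its discounted long-term return rather than its immediate payoff.

    # Illustrative sketch only: a generic tabular Q-learning loop, not the
    # dissertation's code. The agent values each action by its discounted
    # long-term return rather than its one-step reward.
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration
    ACTIONS = [0, 1, 2]                    # hypothetical buffer-allocation choices
    Q = defaultdict(float)                 # Q[(state, action)] -> expected return

    def choose_action(state):
        # Epsilon-greedy: usually exploit the learned long-term values.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        # One-step Q-learning: bootstrapping on the best next-state value is
        # what folds long-term consequences into each local decision.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])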
Effectively applying RL to resource management presents several technical challenges that this research addresses. The high-dimensional state spaces and delayed feedback in computing systems complicate the learning process. Our on-chip policy solutions incorporate broader awareness to learn congestion patterns beyond single paths, carefully crafted scalar reward functions that encode complex system objectives, and mechanisms for effective agent coordination in multi-agent environments. Similarly, our off-chip policy design leverages RL’s inherent planning mechanism to minimize overheads in flash storage and provides a framework for efficient memory allocation and routing in disaggregated memory architectures.
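As one hypothetical example of such a scalar reward (the weights and input signals below are assumptions for illustration, not the reward functions used in this work), several system objectives can be folded into a single learning signal:

    # Hypothetical scalar reward combining multiple objectives into one
    # signal; the weights and inputs are illustrative assumptions only.
    W_LAT, W_ENERGY, W_CONG = 1.0, 0.5, 0.25

    def reward(delta_latency, delta_energy, region_congestion):
        # Negated cost: the agent earns more reward as packet latency,
        # energy use, and regional congestion decrease.
        return -(W_LAT * delta_latency
                 + W_ENERGY * delta_energy
                 + W_CONG * region_congestion)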
The primary contributions of this work include three novel RL frameworks for on-chip resource management: RACE for NoC buffer control (improving latency by 48.9% and reducing energy by 47.1%), Q-RASP for efficient routing (reducing latency by 18.3% and energy by 6.7%), and CAFEEN for adaptive power management (reducing energy consumption by 2.60×–4.37× compared to state-of-the-art approaches). We extend these principles to off-chip domains through RL-based approaches for storage garbage collection optimization and disaggregated memory management. Our methodological innovations include regional awareness techniques that capture spatial congestion patterns, novel reward formulations aligned with system-level objectives, experience sharing mechanisms that accelerate learning, multi-objective optimization of energy and performance through sophisticated reward engineering, and combined routing and memory allocation strategies for end-to-end performance improvement in CXL fabrics. Together, these contributions demonstrate the significant potential of reinforcement learning to address resource management challenges across the computing hierarchy.
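One plausible reading of the experience-sharing idea, sketched below under assumed semantics (the dissertation's actual mechanism may differ), is that neighboring agents blend their value estimates for jointly visited states, so each agent benefits from paths it never traversed itself:

    # Minimal sketch of experience sharing between two agents' Q-tables,
    # assuming "sharing" means averaging estimates on common keys; this is
    # an assumption, not the mechanism implemented in this work.
    def share_experience(q_a, q_b):
        # Blend values on every (state, action) pair both agents have seen.
        for key in q_a.keys() & q_b.keys():
            blended = 0.5 * (q_a[key] + q_b[key])
            q_a[key] = q_b[key] = blended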
Adviser: Dr. Sudeep Pasricha
Co-Adviser: N/A
Non-ECE Member: Dr. Sanjay Rajopadhye
Member 3: Dr. Edwin Chong
Additional Members: Dr. Anura Jayasumana
Publications:
K. Khan, S. Pasricha, and R. G. Kim, "RACE: A Reinforcement Learning Framework for Improved Adaptive Control of NoC Channel Buffers," in GLSVLSI '22, 2022.
K. Khan and S. Pasricha, "A Reinforcement Learning Framework With Region-Awareness and Shared Path Experience for Efficient Routing in Networks-on-Chip," in IEEE Design & Test, 2023.
K. Khan and S. Pasricha, "CAFEEN: A Cooperative Approach for Energy Efficient NoCs with Multi-Agent Reinforcement Learning," submitted to Design Automation Conference (DAC) '24, 2024.
M. Buddhanoy, K. Khan, A. Milenkovic, S. Pasricha, and B. Ray, "Improving Block Management in 3D NAND Flash SSDs with Sub-Block First Write Sequencing," in GLSVLSI '24, 2024.
K. Khan, S. Pasricha, and R. G. Kim, "A Survey of Resource Management for Processing-In-Memory and Near-Memory Processing Architectures," J. Low Power Electron. Appl., 2020.
Program of Study:
ECE-561-001
ECE-580B9-001
ECE-554-001
GRAD-550-001
CS-545-801
ECE-656-001
ECE-581C1-001
ECE-575-L02