Sudeep Pasricha, Researcher Spotlight, ACM SIGDA Newsletter Sep 2018

Hello readers! In this edition of our column, we introduce Prof. Sudeep Pasricha, Monfort Professor and Rockwell-Anderson Professor at Colorado State University. He leads the Embedded Systems and High-Performance Computing (EPiC) Lab. He received his PhD in Computer Science from the University of California at Irvine in 2008. Excerpts from a recent conversation:

1. Can you share with us some of the focus areas your group is working on?

My research lab at Colorado State University is called the Embedded Systems and High Performance Computing (EPiC) Laboratory, and our research focus is broadly in the areas of algorithms and architectures for energy-efficient, fault-resilient, real-time, and secure computing, for platforms spanning Internet of Things (IoT) devices, embedded and cyber-physical systems, mobile computing, and high performance computing (e.g., datacenters, exascale supercomputers). Some specific examples of ongoing projects include: 1) design of silicon photonic chip-scale networks, 2) runtime resource management for manycore 2D/3D chips, 3) 3D DRAM and non-volatile memory design, 4) embedded systems for autonomous vehicles, 5) indoor localization with smartphones, and 6) fault-resilient exascale computing.

2. With the adoption of silicon photonics at both the network level and chip level to meet increasing demand for bandwidth and performance, reliability of photonic subsystems becomes a crucial factor. Wavelength sensitivity induced by process variation and/or operating conditions can affect functional requirements. Could you share your insights on this problem?

Resilience to process and thermal variations is indeed one of the biggest challenges to the viability of silicon photonics at the chip-scale. To better understand this phenomenon and its implications, it is important to first cover some basics. A typical silicon photonic interconnect fabricated in a CMOS-compatible process is much more complex than an electrical wire, and consists of various components: laser power sources that generate optical signals, microring resonator (MR) modulators that convert electrical signals into optical signals, couplers and splitters that distribute optical signals to waveguides, waveguides that route optical signals, and MR filter receivers that detect and drop optical signals onto a photodetector to recover an electrical signal.

One of the most attractive features of photonic interconnects at the chip-scale is their ability to allow multiple (e.g., 32) wavelengths of light to be transmitted simultaneously in a single waveguide, where each wavelength can be considered analogous to a single electrical wire. Such wavelength division multiplexing (WDM) enables multiple data channels per waveguide, providing several orders of magnitude higher bandwidth density than what is possible today with electrical (copper) links. However, process variations (e.g., variations in the dimensions of MRs) and thermal variations (e.g., due to power dissipation when applications execute on the chip) cause an MR device, which is typically designed to couple with a specific wavelength, to no longer couple with it. As you can imagine, this can lead to unpredictable bit-flips and failures during data transmission. Correcting for such variations is possible, via charge injection or thermal tuning at the MR device level, but entails very high power overheads.

Our research lab has pioneered the use of cross-layer optimization techniques to overcome the impact of such variations. Over the past decade, we have characterized and proposed mitigation strategies for various forms of reliability issues in silicon photonic devices due to process and thermal variations, aging, and crosstalk. Our experience has shown that cross-layer techniques that combine coordinated enhancements at multiple levels (device, circuit, architecture, and OS/system) achieve the most bang for the buck when it comes to low-overhead variation management.
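To give a rough sense of the magnitude of the thermal sensitivity described above, the sketch below estimates the first-order resonance drift of an MR as on-chip temperature changes. The thermo-optic coefficient, group index, and channel spacing used here are representative literature values assumed purely for illustration; they are not figures from this conversation or from the EPiC Lab's designs.

```python
# Illustrative first-order estimate of microring resonator (MR) resonance
# drift under thermal variation, and the resulting WDM channel misalignment.
# All constants below are representative assumptions for illustration only.

LAMBDA_0 = 1550.0e-9      # nominal resonance wavelength (m)
DN_DT    = 1.86e-4        # thermo-optic coefficient of silicon (1/K), typical literature value
N_GROUP  = 4.2            # group index of a silicon waveguide, assumed

def resonance_shift_nm(delta_T_kelvin: float) -> float:
    """First-order shift: d(lambda) ~ lambda0 * (dn/dT) * dT / n_g, in nm."""
    return LAMBDA_0 * DN_DT * delta_T_kelvin / N_GROUP * 1e9

def channels_drifted(delta_T_kelvin: float, channel_spacing_nm: float = 0.8) -> float:
    """How many WDM channel spacings the resonance drifts by (0.8 nm ~ 100 GHz grid)."""
    return resonance_shift_nm(delta_T_kelvin) / channel_spacing_nm

if __name__ == "__main__":
    for dT in (5, 10, 20):  # on-chip temperature swings in kelvin
        print(f"dT = {dT:2d} K -> shift = {resonance_shift_nm(dT):.3f} nm "
              f"({channels_drifted(dT):.2f} channel spacings)")
```

Under these assumed values, a temperature swing of only 10-15 K shifts the resonance by roughly a full WDM channel spacing, which is why per-MR tuning, and the power it consumes, becomes such a central design concern.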

3. Power Distribution Network (PDN) requirements can vary significantly based on configuration modes in multi-core chips and programmable chips. Availability of early-phase tools to evaluate mode-specific requirements can help influence product cost and even marketability. Typically, what is the demand range on the PDN across such modes? In your view, is there an optimal range that avoids pessimistic over-design?

Given the huge diversity of application and platform configuration modes that are possible on any multi-core chip, it is difficult to estimate demand and optimal ranges that are meaningful across chip architectures. What is clear, however, is that today’s processors designed at sub-10nm technology nodes have high device densities and fast switching frequencies that cause fluctuations in the supply voltage and ground networks, which can adversely affect the execution of applications running on them. For instance, the peak power supply noise (PSN) in the PDN due to inter-core activity interference can be as high as 80% of the nominal near-threshold supply voltage at 7nm. This PSN is a serious concern as it introduces timing errors, not just in processing cores, but also in on-chip network components (e.g., routers) that switch frequently.

While traditional solutions attempt to address this issue at the circuit and micro-architecture levels, our research has shown that the compute intensity and the distribution of the workload across the cores decide the magnitude of the PSN observed, which motivates attacking this problem at a higher level of abstraction. Our research lab has devised several early design space exploration tools that we believe are very promising for addressing PSN in PDNs, by enabling informed decisions at both design-time and run-time. Interestingly, the real power of these early design tools is that they address not just the PSN issue, but also a host of other key design concerns, such as soft errors, aging, dark silicon budgets, and application QoS demands.
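As a back-of-the-envelope illustration of why workload distribution matters for PSN, the sketch below uses a toy first-order PDN model (IR drop plus L·di/dt droop) to compare a clustered mapping of active cores against a distributed one. The resistance, inductance, ramp time, and per-core currents are arbitrary illustrative assumptions, not parameters of any particular chip or of the EPiC Lab's tools.

```python
# Toy first-order model of power supply noise (PSN) to illustrate why the
# spatial distribution of a workload across cores affects the peak droop.
# All electrical parameters and core currents are illustrative assumptions.

R_LOCAL = 2.0e-3    # effective local PDN resistance per region (ohms), assumed
L_LOCAL = 5.0e-12   # effective local PDN inductance per region (henries), assumed
DT      = 0.2e-9    # time over which cores ramp their current (s), assumed

def peak_droop(core_currents_amps, ramp_fraction=0.8):
    """Worst-case droop in one PDN region: IR drop plus L*di/dt from a
    simultaneous activity ramp of the cores sharing that region."""
    i_total = sum(core_currents_amps)
    di = ramp_fraction * i_total            # simultaneous switching surge
    return i_total * R_LOCAL + L_LOCAL * di / DT

# Same four active cores (2 A each), mapped two different ways across two regions:
clustered   = [peak_droop([2.0, 2.0, 2.0, 2.0]), peak_droop([])]   # all in one region
distributed = [peak_droop([2.0, 2.0]), peak_droop([2.0, 2.0])]     # spread across regions

print(f"clustered mapping:   peak droop = {max(clustered)*1e3:.1f} mV")
print(f"distributed mapping: peak droop = {max(distributed)*1e3:.1f} mV")
```

Under this toy model the clustered mapping sees roughly twice the peak droop of the distributed one, which captures the intuition behind treating workload mapping, alongside circuit-level mitigation, as a knob for managing PSN.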

4. With emerging memory technologies offering different features and capabilities, developing design tools can be a challenge for academic researchers and open-source contributors. Can you share your thoughts on tool development that keeps pace with so many technologies while still offering depth for each one?

It seems like we are in the golden age of memory technologies, with so many emerging options that are no longer just hypothetical concepts. Open-source models and tools that allow for design space exploration of 3D DRAMs, memristors, phase change memories, spin transfer torque memories, etc. will play a vital role in furthering the proliferation of these technologies. The current crop of tools (e.g., NVMain/NVSim, DRAMSim2) is a good starting point, but may not be suitable for all purposes. For instance, in my research lab we have developed several new 3D DRAM architectures and DRAM refresh strategies. Our work with open-source DRAM tools found that they are deficient in modeling several key design aspects, e.g., the PDN, which can have a profound impact on the accuracy of any results. Researchers must therefore be careful not to overreach when adapting these tools for their specific use-cases.

One last thing I would emphasize is that tools for memory design and exploration need to be regularly informed by data from real prototypes. This is a big challenge, as most semiconductor companies are not very open to sharing proprietary information about their memory R&D projects. It might seem counterintuitive from a competitiveness point of view, but making at least some of the technological data from memory prototypes freely available would help these technologies spread more rapidly, which can help create new markets for emerging products.
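To make the scale of one such design concern concrete, the sketch below estimates how much of a DRAM rank's time is consumed by refresh as device density grows. The refresh window, command count, and tRFC values are representative DDR-style numbers assumed for illustration; they are not parameters of the 3D DRAM architectures or tools mentioned above.

```python
# Back-of-the-envelope estimate of how much of a DRAM rank's time is lost to
# refresh, the kind of overhead that refresh-management strategies target.
# Timing values are representative DDR-style numbers chosen for illustration;
# they are assumptions, not parameters of any specific device or tool.

T_REFW_MS  = 64.0       # full-array refresh window (ms), typical spec value
N_REF_CMDS = 8192       # refresh commands per window (so tREFI ~ 7.8 us)

def refresh_overhead(t_rfc_ns: float) -> float:
    """Fraction of time the rank is unavailable due to refresh."""
    t_refi_ns = (T_REFW_MS * 1e6) / N_REF_CMDS   # average interval between refreshes
    return t_rfc_ns / t_refi_ns

# tRFC grows with device density (and with the larger banks of stacked DRAM),
# so the same refresh policy costs progressively more as capacity scales up.
for density, t_rfc in [("4 Gb", 260.0), ("8 Gb", 350.0), ("16 Gb", 550.0)]:
    print(f"{density} device: tRFC = {t_rfc:5.0f} ns -> "
          f"{refresh_overhead(t_rfc)*100:.1f}% of time spent refreshing")
```

Even this simple calculation shows why refresh strategies become increasingly attractive as capacity scales; capturing how such effects interact with aspects like the PDN is exactly where, as noted above, generic open-source models can fall short.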

5. With the emergence of autonomous driving, functional safety requirements for electronic components have become more stringent than ever before, almost to the point of limiting academic research in this area. For industry to gain from academic research, what do you think are the requirements for a successful partnership, particularly in this domain?

It is certainly true that functional safety requirements for electronic components in the automotive domain have become more stringent than ever before. That is not an undesirable development, but it can be challenging for researchers who hope to integrate their ideas into real vehicles and impact actual drive cycles. A partnership with the automotive industry is important for the purposes of deployment, testing, and validation, but it is not a requirement for industry to gain from academic research. In my research lab we are actively researching the design of real-time, secure, and jitter-resilient automotive networks, hardware/software co-design for electronic control units (ECUs), and the design of deep learning-based advanced driver assistance systems (ADAS) for tracking pedestrians, vehicles, traffic lights, lanes, and traffic signs. Much of this research has utilized off-the-shelf automotive components and commodity vehicles for testing and validation, without reliance on industry collaboration.

Undoubtedly, closer industry involvement can take any research to the next level. For instance, a recently concluded project between my research lab and Fiat/Chrysler allowed us to deploy our embedded software optimization techniques in real powertrain controllers in their test vehicles, which we would not be brave enough to do with our own vehicles! At Colorado State University we are also fortunate to have significant infrastructure support for automotive research, which has allowed us to modify the internals of real vehicles as part of GM and DOE’s EcoCAR2 and EcoCAR3 projects over the past decade. This has allowed us to integrate the ADAS and new vehicle control strategies developed in my lab into real vehicles.

Coming back to the question of what such industry-academic partnerships should look like, I would say that first and foremost there needs to be a commitment from universities to provide the substantial infrastructure required for non-trivial research in the automotive domain. This gives automotive companies the confidence to engage with academic partners. Beyond that, it is up to the individual researchers to make a viable case and value proposition for any industry collaboration.