Walter Scott, Jr. College of Engineering

Graduate Exam Abstract

Febin Sunny
Ph.D. Final
Apr 26, 2024, 2:30 pm - 4:30 pm
ECE Conference Room
HARDWARE-SOFTWARE CO-DESIGN OF SILICON PHOTONIC AI ACCELERATORS
Abstract: Over the last decade, the deployment of artificial intelligence (AI) and machine learning (ML) algorithms has expanded dramatically across a wide range of practical applications, encompassing domains such as intelligent consumer devices, automotive technologies, healthcare systems, cybersecurity mechanisms, and natural language processing tasks. This expansion is primarily driven by the advent of sophisticated deep-learning-based AI architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), and Transformer models. As researchers explore deeper models with higher connectivity and more complex architectures, the computing power and memory required to train and deploy them also increase. This growing complexity demands that the underlying hardware platform consistently deliver better performance while satisfying strict power constraints. However, electronic computing platforms face fundamental limits in the post-Moore's Law era due to increased Ohmic losses and capacitance-induced latencies in interconnects, as well as power inefficiencies and reliability concerns that reduce yields and increase costs as semiconductor technology scales.
A potential solution for improving performance-per-watt in AI model processing is to architect application-specific neural network (NN) accelerator platforms. Architecting NN accelerator hardware involves designing high-speed, energy-efficient multiply-and-accumulate (MAC) units, high-bandwidth memory and interconnects, and software-aware hardware pipelining and dataflow mapping approaches. Such customized architectures have been shown to outperform general-purpose architectures for NN training and inference. However, given the increasing inefficiencies of conventional CMOS computing platforms and the slowdown of Dennard scaling in recent years, alternative implementation technologies must be considered for efficient NN acceleration. Silicon photonics is among the most promising of these emerging technologies, currently being explored for high-bandwidth, low-latency chip-scale communication and computation. To achieve reliable, efficient, and high-throughput photonics-based NN acceleration, several challenges must be overcome. Photonic computation, being analog in nature, is prone to noise interference, which can limit the achievable NN model parameter resolution. Compensating for the impact of fabrication-process variations (FPVs), that is, fluctuations in device dimensions and characteristics arising from semiconductor fabrication imperfections, is critical for dependable computation. FPVs cause optical frequency shifts in devices essential for photonic computation and communication, necessitating compensation to prevent crosstalk and data corruption. Furthermore, the latency of correction mechanisms such as thermo-optic tuning, along with the associated thermal crosstalk, adversely affects photonic component throughput and reliability. At the architecture level, efficient NN processing requires optimizing photonic resource use, device layout, and MAC unit design and aggregation, while accounting for the diverse computation requirements of different NN models.
To address these challenges, this thesis proposes a hardware-software co-design framework to enable high-throughput, low-latency, and energy-efficient AI acceleration across various NN models using silicon photonics. The thesis describes cross-layer solutions to achieve application-level goals across various NN model types: (i) at the device level, FPV-resilient microring resonators (MRs) are proposed to ensure higher reliability in photonic computation; (ii) within the photonic computation units, thermal eigenmode decomposition is adapted to cancel out thermal crosstalk and thereby increase computation reliability; (iii) at the electro-optic control level, a combination of thermal and electric tuning is proposed to reduce the latency of operation in photonic circuits; (iv) at the architecture level, wavelength-reuse schemes, vector decomposition mechanisms, and NN-aware MAC unit designs are proposed to increase energy efficiency (e.g., in laser power consumption) and to realize efficient model-aware hardware pipelining; (v) at the network-on-chip (NoC) level, optical-loss-aware approximate communication is proposed to improve the efficiency of data movement.
These cross-layer solutions are combined with enhancements across NN-specific accelerator architectures, resulting in reliable, energy-efficient, and high-throughput accelerator designs. Chapter 2 discusses the ARXON framework, which uses approximate data transfers together with cross-layer optimizations to reduce on-chip photonic network energy consumption for general-purpose and NN-specific accelerators while ensuring application quality of service (QoS). The proposed photonic accelerator architectures CrossLight (Chapter 3) and ROBIN (Chapter 4) show how combining cross-layer techniques can outperform GPUs and electronic NN accelerators for CNNs and binarized CNNs, respectively. SONIC (Chapter 5) addresses the challenge of accelerating CNNs with unstructured sparsity, using directly tuned VCSELs. Because the electrical-optical interface, which relies on digital-to-analog converters, is one of the main contributors to overall power and energy consumption, the HQNNA architecture (Chapter 6) explores combining wavelength-division multiplexing (WDM) and time-division multiplexing (TDM) to dramatically reduce energy consumption at the expense of a minimal loss in throughput. To accelerate GNNs and Transformers with photonics, the GHOST (Chapter 8) and TRON (Chapter 9) architectures are designed with model-specific pipelines and optimizations. Chapter 10 explores the scalability of these accelerator architectures to a multi-chiplet, 2.5D platform. The thesis further describes how to realize reliable phase change material (PCM) based main memory as part of the COMET architecture (Chapter 11), which can function together with photonic NN accelerators. In Chapter 12, we showcase how photonic NN accelerators can be even more aggressively optimized with in-memory computation, minimizing data movement and achieving significant energy and performance improvements.
Our findings in this thesis not only help push the boundaries of photonic computing but also open new avenues for the development of energy-efficient computational technologies. In the final chapter, we conclude the thesis by summarizing our contributions and proposing directions for future research in this promising field.
Adviser: Dr. Sudeep Pasricha
Co-Adviser: Dr. Mahdi Nikdast
Non-ECE Member: Dr. Haonan Chen, ECE
Member 3: Dr. Yashwant Malaiya, CS
Additional Members: N/A
Publications:
1) A. Mirza, F. Sunny, S. Pasricha, M. Nikdast, "Silicon photonic microring resonators: Design optimization under fabrication non-uniformity," IEEE/ACM DATE, 2020
2) F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, "LORAX: Loss-aware approximations for energy-efficient silicon photonic networks-on-chip," ACM GLSVLSI, 2020
3) F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, "ARXON: A Framework for Approximate Communication Over Photonic Networks-on-Chip," IEEE TVLSI, vol. 29, no. 6, 2021
4) F. Sunny, E. Taheri, M. Nikdast, S. Pasricha, "A survey on silicon photonics for deep learning," ACM JETC, vol. 17, no. 4, 2021
5) F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, "ROBIN: A robust optical binary neural network accelerator," ACM TECS, vol. 20, no. 5, 2021
6) A. Mirza, F. Sunny, S. Pasricha, M. Nikdast, "Silicon photonic microring resonators: A comprehensive design-space exploration and optimization under fabrication-process variations," IEEE TCAD, vol. 41, no. 10, 2021
7) F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, "CrossLight: A cross-layer optimized silicon photonic neural network accelerator," IEEE/ACM DAC, 2021
8) F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, "SONIC: A sparse neural network inference accelerator with silicon photonics for energy-efficient deep learning," IEEE/ACM ASP-DAC, 2022
9) V. S. P. Karempudi, F. Sunny, I. G. Thakkar, S. V. R. Chittamuru, M. Nikdast, S. Pasricha, "Photonic networks-on-chip employing multilevel signaling: A cross-layer comparative study," ACM JETC, vol. 18, no. 3, 2022
10) F. Sunny, M. Nikdast, S. Pasricha, "A silicon photonic accelerator for convolutional neural networks with heterogeneous quantization," ACM GLSVLSI, 2022
11) F. Sunny, M. Nikdast, S. Pasricha, "RecLight: A recurrent neural network accelerator with integrated silicon photonics," IEEE ISVLSI, 2022
12) F. Sunny, E. Taheri, M. Nikdast, S. Pasricha, "Machine learning accelerators in 2.5D chiplet platforms with silicon photonics," IEEE/ACM DATE, 2023
13) S. Afifi, F. Sunny, M. Nikdast, S. Pasricha, "TRON: transformer neural network acceleration with non-coherent silicon photonics," ACM GLSVLSI, 2023
14) F. Sunny, M. Nikdast, S. Pasricha, "Cross-Layer design for AI acceleration with non-coherent optical computing," ACM GLSVLSI, 2023
15) A. Balasubramaniam, F. Sunny, S. Pasricha, "R-TOSS: A framework for real-time object detection using semi-structured pruning," IEEE/ACM DAC, 2023
16) S. Afifi, F. Sunny, M. Nikdast, S. Pasricha, "GHOST: A Graph Neural Network Accelerator using Silicon Photonics," ACM TECS, vol. 22, no. 5, 2023
17) F. Sunny, A. Shafiee, B. Charbonnier, M. Nikdast, S. Pasricha, "COMET: A Cross-Layer Optimized Optical Phase Change Main Memory Architecture," IEEE/ACM DATE, 2024 [To appear]
18) S. Afifi, F. Sunny, M. Nikdast, S. Pasricha, "Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics," IEEE/ACM DATE, 2024 [To appear]
19) F. Sunny, E. Taheri, M. Nikdast, S. Pasricha, "Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators," IEEE VTS, 2024 [To appear]
Program of Study:
STAT581A3
ECE561
ECE554
GRAD510
ECE530
ECE656
ECE580B9
ECE580B6