Hardware-software codesign of silicon photonic AI accelerators
Date
2024
Abstract
Machine learning applications have become increasingly prevalent over the past decade across many real-world domains, from smart consumer electronics to automotive, healthcare, cybersecurity, and language processing. This prevalence has been fueled by the emergence of powerful machine learning models, such as Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). As researchers explore deeper models with higher connectivity, the compute and memory required to train and deploy them also grow. This increasing complexity requires the underlying hardware to consistently deliver better performance while meeting strict power constraints. Unfortunately, the limited performance-per-watt of today's computing platforms (general-purpose CPUs, GPUs, and electronic neural network (NN) accelerators) creates significant challenges for the growth of new deep learning and AI applications. These electronic platforms face fundamental limits in the post-Moore's Law era: interconnects suffer from increased ohmic losses and capacitance-induced latencies, and semiconductor-technology scaling brings power inefficiencies and reliability concerns that reduce yields and increase costs. One way to improve performance-per-watt for AI model processing is to explore more efficient hardware NN accelerator platforms. Silicon photonics has shown promise in terms of the energy efficiency and latency achievable for data transfers. Photonic components can also perform computation, e.g., matrix-vector multiplication. Such photonic AI accelerators not only address the fan-in and fan-out problem of linear algebra processors; their operational bandwidth can also approach the photodetection rate (typically hundreds of GHz), which is orders of magnitude higher than that of today's electronic systems operating at clock rates of a few GHz.
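To make photonic matrix-vector multiplication concrete, the following Python sketch models an idealized, noise-free bank of MRs: each input is the optical power carried on one wavelength, each MR attenuates its wavelength by a weight, and a photodetector sums the weighted power. The function names, the clipping of weights to [0, 1], and the use of a balanced photodetector for signed weights are illustrative assumptions, not the architecture proposed in this thesis.

```python
import numpy as np

def mr_bank_dot(x, w):
    """Toy dot product on one hypothetical MR bank.

    Assumptions (illustration only):
    - x[i] in [0, 1] is the optical power carried on wavelength i,
    - MR i sets a through-port transmission w[i], clipped to [0, 1]
      because a passive ring can only attenuate,
    - one photodetector sums the weighted power of all wavelengths.
    """
    x = np.asarray(x, dtype=float)
    t = np.clip(np.asarray(w, dtype=float), 0.0, 1.0)
    return float(np.sum(x * t))  # photodetector accumulates every channel

def mr_bank_dot_signed(x, w):
    # Signed weights via an assumed balanced photodetector: the positive and
    # negative parts of w are applied on two arms and the photocurrents subtract.
    w = np.asarray(w, dtype=float)
    return mr_bank_dot(x, np.maximum(w, 0.0)) - mr_bank_dot(x, np.maximum(-w, 0.0))

def mr_matvec(W, x):
    # One bank (and detector pair) per output row; rows can run in parallel.
    return np.array([mr_bank_dot_signed(x, row) for row in W])

# Usage: weights in [-1, 1], inputs encoded as optical powers in [0, 1].
x = [0.2, 0.4, 1.0]
W = [[0.5, -0.25, 1.0],
     [0.0, 1.0, -0.5]]
y = mr_matvec(W, x)
```

In this idealized model the result equals an ordinary matrix-vector product; the noise and limited resolution discussed next are what make the real analog computation harder.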
Silicon photonics can also relieve the data-movement bottleneck through photonic networks-on-chip (PNoCs), which enable ultra-high-bandwidth, low-latency, and energy-efficient communication. However, several challenges must be addressed before photonics can deliver reliable, efficient, and high-throughput communication and computation. Photonic computation is performed in the analog domain, which makes it susceptible to various noise sources and limits the achievable resolution for representing NN model parameters. To increase the reliability of silicon photonic AI accelerators, fabrication-process variation (FPV), i.e., the change in the physical dimensions and characteristics of devices caused by fabrication imperfections, must be addressed. FPVs shift the resonant wavelengths of microring resonators (MRs), the fundamental devices used for photonic computation and communication in our proposed accelerator architectures, and these shifts must be compensated for the MRs to operate correctly. Without this correction, FPVs increase crosstalk and cause data corruption during photonic communication, and they can also introduce errors during photonic computation. FPV compensation is therefore essential for reliable computation in silicon photonic AI accelerators. Even with FPV-resilient silicon photonic devices, the latency of thermo-optic (TO) tuning and the thermal crosstalk it can induce remain significant problems. TO tuning latency, which can reach the microsecond range, reduces the accelerator's overall throughput, while thermal crosstalk undermines its reliable operation. At the architectural level, NN processing must also make efficient use of photonic resources such as wavelengths, which requires NN-model-aware decisions about device deployment and arrangement and about multiply-and-accumulate (MAC) unit design.
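As a rough, first-order illustration of FPV compensation, the sketch below estimates the resonance shift caused by an effective-index error using Δλ ≈ λ·Δn_eff/n_g, then solves a linear thermal-crosstalk model for the heater powers that produce target red-shifts. The group index, index error, and nm/mW coupling values are assumed numbers chosen for illustration, and the plain linear solve is a generic stand-in, not the crosstalk-management techniques developed in this thesis.

```python
import numpy as np

def fpv_resonance_shift_nm(lambda_res_nm, delta_neff, n_group):
    # First-order estimate: d(lambda_res) ~= lambda_res * d(n_eff) / n_g
    return lambda_res_nm * delta_neff / n_group

def heater_powers_mw(required_shift_nm, crosstalk_nm_per_mw):
    """Heater powers that produce the required red-shifts (linear model).

    crosstalk_nm_per_mw[i][j] is the shift of MR i (nm) per mW on heater j
    (hypothetical values): the diagonal is self-heating efficiency, the
    off-diagonal entries are thermal crosstalk between neighbouring MRs.
    """
    K = np.asarray(crosstalk_nm_per_mw, dtype=float)
    p = np.linalg.solve(K, np.asarray(required_shift_nm, dtype=float))
    if np.any(p < 0):
        # Heaters can only add heat (red-shift); a negative power means the
        # targets are unreachable with heating alone under this model.
        raise ValueError("target shifts not reachable with heating alone")
    return p

# Assumed values: 1550 nm resonance, n_g = 4.2, index error of 1e-3.
shift = fpv_resonance_shift_nm(1550.0, 1e-3, 4.2)  # roughly 0.37 nm

# Two neighbouring MRs, each needing a 0.3 nm red-shift.
K = [[0.25, 0.05],
     [0.05, 0.25]]
p = heater_powers_mw([0.3, 0.3], K)
```

Ignoring crosstalk here would set each heater to 0.3 / 0.25 = 1.2 mW, and the neighbouring heater's contribution would overshoot each ring to 0.36 nm; accounting for the coupling brings both heaters down to 1.0 mW.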
To address these challenges, this thesis proposes a hardware-software co-design framework that uses silicon photonics to enable high-throughput, low-latency, and energy-efficient AI acceleration across various neural network models. At the architectural level, we propose wavelength reuse schemes, vector decomposition, and NN-aware MAC unit designs that improve laser power efficiency. Among NN-aware designs, we propose layer-specific acceleration units, photonic batch normalization folding, and fine-grained sparse NN acceleration units. To tackle the reliability challenges introduced by FPV, we perform device-level design-space exploration and optimization to design MRs that are more tolerant to FPVs than state-of-the-art designs. We also adapt thermal eigenmode decomposition and devise several novel techniques to manage thermal and spectral crosstalk, allowing our silicon photonic AI accelerators to reach up to 16-bit parameter resolution per MR, which enables high accuracy for most NN models.
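Two of the mapping steps named above can be sketched in Python under simplifying assumptions: folding inference-time batch normalization into the preceding layer's weights and biases, so that no separate normalization stage is needed, and decomposing a long dot product into chunks that fit a fixed-size MR bank. The parameter `bank_size` and both function names are hypothetical, and each function shows the generic form of the technique rather than the photonic implementation developed in this thesis.

```python
import numpy as np

def fold_batch_norm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch normalization into the preceding linear layer.

    BN(W x + b) = s * (W x + b - mean) + beta with s = gamma / sqrt(var + eps),
    so the folded layer uses W' = s[:, None] * W and b' = s * (b - mean) + beta.
    """
    s = np.asarray(gamma, dtype=float) / np.sqrt(np.asarray(var, dtype=float) + eps)
    W_folded = np.asarray(W, dtype=float) * s[:, None]
    b_folded = s * (np.asarray(b, dtype=float) - mean) + beta
    return W_folded, b_folded

def decomposed_dot(x, w, bank_size):
    """Split a length-N dot product into chunks of at most `bank_size`
    elements (hypothetical number of MRs/wavelengths per bank), evaluate
    each chunk separately, and accumulate the partial sums."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    partials = [float(np.dot(x[i:i + bank_size], w[i:i + bank_size]))
                for i in range(0, len(x), bank_size)]
    return sum(partials), len(partials)

# Usage: a 10-element dot product on a 4-MR bank needs 3 passes.
value, passes = decomposed_dot(np.arange(10), np.ones(10), bank_size=4)
```

Folding is exact in arithmetic terms, but the folded weights can fall outside the range a device can represent, so a physical mapping would likely still need to rescale them.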
Subject
computer architecture
inference acceleration
silicon photonics
hardware-software codesign
artificial intelligence
machine learning