RELAX: cross-layer resource management for reliable NoC-based 2D and 3D manycore architectures in the dark silicon era
Date
2019
Authors
Raparti, Venkata Yaswanth, author
Pasricha, Sudeep, advisor
Jayasumana, Anura, committee member
Bohm, Willem, committee member
Kim, Ryan, committee member
Abstract
Emerging 2D and 3D chip-multiprocessors (CMPs) face numerous challenges from technology scaling that affect their reliability, power dissipation, performance, and security. With growing application parallelism and increasing core counts, traditional resource management frameworks and critical on-chip components such as networks-on-chip (NoCs) and memory controllers (MCs) do not scale efficiently to this new and complex CMP design space. Several phenomena affect CMP reliability. For instance, device-level phenomena such as bias temperature instability (BTI) and electromigration (EM) cause permanent aging faults in the CMOS logic and memory cells of the computing cores and NoC routers of CMPs. At the same time, alpha particle strikes (soft errors) and power supply noise (PSN) cause transient faults across CMP components. There have been several attempts to address these challenges at the circuit and microarchitectural levels, such as guard-banding and over-provisioning of CMP resources. However, with the increasing architectural complexity of today's CMPs, mechanisms that address these challenges at the circuit and microarchitectural levels alone incur large power and performance overheads. Hence, there is a need for a system-level solution that uses control knobs from different layers and manages CMP reliability at runtime, efficiently minimizing the adverse effects of these failure mechanisms while meeting performance and power constraints.

The network-on-chip (NoC) has become the de facto communication fabric in CMP architectures. Different NoC topologies and architectures are tailored to different CMP platforms based on their communication demands. The most widely used topology is the 2D/3D mesh-based NoC with a deadlock-free, turn-model-based routing scheme, as it has been shown to scale well with increasing core counts.
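For reference, the sketch below shows what a deadlock-free turn-model routing scheme looks like on a 2D mesh, using the classic west-first turn model as the example. It is an illustration under stated assumptions, not the routing scheme proposed in this thesis: the (x, y) coordinate convention, the port names, and the helper names `west_first_candidates` and `route` are all hypothetical. West-first forbids the two turns that end in the West direction (North to West and South to West), so a packet takes all of its westward hops first and may then choose adaptively among the remaining productive directions.

```python
from typing import Callable, List, Tuple

# Assumed convention: x grows eastward, y grows northward.
Coord = Tuple[int, int]

# Hypothetical port names mapped to the (dx, dy) step each one takes.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def west_first_candidates(cur: Coord, dst: Coord) -> List[str]:
    """Return the minimal output ports allowed by the west-first turn model."""
    cx, cy = cur
    dx, dy = dst
    if (cx, cy) == (dx, dy):
        return ["LOCAL"]  # eject at the destination router
    if dx < cx:
        return ["WEST"]  # westward hops must come first, deterministically
    ports: List[str] = []
    if dx > cx:
        ports.append("EAST")
    if dy > cy:
        ports.append("NORTH")
    elif dy < cy:
        ports.append("SOUTH")
    return ports  # adaptive choice; West is never needed again


def route(src: Coord, dst: Coord,
          pick: Callable[[List[str]], str] = lambda ports: ports[0]) -> List[Coord]:
    """Walk hop by hop from src to dst, letting `pick` choose among legal ports."""
    path = [src]
    cur = src
    while cur != dst:
        ox, oy = STEP[pick(west_first_candidates(cur, dst))]
        cur = (cur[0] + ox, cur[1] + oy)
        path.append(cur)
    return path
```

For example, `route((3, 0), (0, 2))` completes all three westward hops before turning north and returns `[(3, 0), (2, 0), (1, 0), (0, 0), (0, 1), (0, 2)]`. Because no packet ever turns into the West direction, no cycle of channel dependencies can form, which is what makes the scheme deadlock-free.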
However, given the unprecedented reliability and security challenges faced by CMPs designed at nanometer-scale technology nodes, basic turn-model routing has proven inefficient at providing seamless communication between cores and other on-chip components. This demands more reliable NoC solutions for 2D and 3D CMPs.

Another critical design criterion is NoC throughput and power consumption in CMPs with integrated manycore accelerators. Manycore accelerator platforms operate on thousands of threads, with hundreds of thread blocks executing several kernels simultaneously. The volume of core-to-memory data generated in accelerators is much higher than in a traditional CPU. This leads to congestion at the memory controllers, which demands a high-bandwidth NoC with high power and area overheads; such a NoC does not scale as the number of cores in the accelerator increases. High volumes of read reply data in manycore accelerator platforms necessitate intelligent memory scheduling along with a low-latency NoC to resolve the memory bottleneck. Mechanisms that overcome these challenges require complex architectures across the CMP interconnection fabric that are designed and integrated at various locations around the globe. Unfortunately, such globalized fabrication makes CMP processors vulnerable to security threats from hardware Trojans that may be inserted in third-party IP (3PIP) NoCs.

We address these issues by designing a cross-layer resource management framework called RELAX that enhances the performance and security of NoC-based 2D and 3D CMPs, while meeting a diverse set of platform constraints related to CMP lifetime, dark silicon power, fault tolerance, thermal limits, and real-time application performance. At the OS level, we have developed several techniques, such as a lifetime-aware application mapping heuristic, adaptive application degree of parallelism (DoP), slack-aware checkpointing, and aging-aware NoC path allocation.
At the system level, we propose dynamic voltage scheduling (DVS) and a low-power checkpointing mechanism to meet dark silicon power and application deadline constraints. At the architectural level, we introduce several novel upgrades to the architectures of NoC routers, memory controllers (MCs), and network interfaces (NIs) that improve the performance of NoC-based CMPs while minimizing power dissipation and mitigating security threats from hardware Trojans.
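To make the system-level idea of meeting both a dark silicon power budget and application deadlines with voltage/frequency selection concrete, here is a minimal Python sketch. It is not the DVS policy developed in this thesis; the V/f table, the `Task` fields, the `assign_vf_levels` helper, and the simple dynamic power model P = C_eff * V^2 * f are all assumptions chosen for illustration. Each task is assumed to run alone on its own core, concurrently with the others.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical discrete operating points as (voltage in V, frequency in GHz).
VF_LEVELS: List[Tuple[float, float]] = [(0.7, 1.0), (0.85, 1.5), (1.0, 2.0)]


@dataclass
class Task:
    name: str
    gcycles: float   # work in billions of cycles, so runtime (s) = gcycles / f_GHz
    deadline: float  # seconds
    c_eff: float     # hypothetical effective capacitance term, W / (V^2 * GHz)


def dynamic_power(task: Task, v: float, f: float) -> float:
    """Simple CMOS dynamic power model: P = C_eff * V^2 * f."""
    return task.c_eff * v * v * f


def assign_vf_levels(tasks: List[Task],
                     power_budget: float) -> Optional[Dict[str, Tuple[float, float]]]:
    """Pick, per task, the lowest-power V/f level that still meets its deadline.

    Chip power is the sum of per-task power (one task per core, all running
    concurrently). Returns None if any deadline is unreachable or the summed
    power exceeds the budget.
    """
    plan: Dict[str, Tuple[float, float]] = {}
    total_power = 0.0
    for task in tasks:
        feasible = [(v, f) for v, f in VF_LEVELS if task.gcycles / f <= task.deadline]
        if not feasible:
            return None  # even the fastest level misses this deadline
        v, f = min(feasible, key=lambda vf: dynamic_power(task, vf[0], vf[1]))
        plan[task.name] = (v, f)
        total_power += dynamic_power(task, v, f)
    return plan if total_power <= power_budget else None
```

For example, with `tasks = [Task("A", 3.0, 2.0, 1.0), Task("B", 1.0, 2.0, 1.0)]`, a budget of 2.0 W yields `{"A": (0.85, 1.5), "B": (0.7, 1.0)}`, while a budget of 1.5 W yields `None`. Under these assumptions, minimizing each task's power independently also minimizes the total, so a `None` caused by the budget means no level assignment from the table fits.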
Subject
GPGPU
network on chip
resource management
manycore processors
dark silicon
reliability