Browsing by Author "Pasricha, Sudeep, author"

Open Access
Cross-layer design for AI acceleration with non-coherent optical computing
(Colorado State University. Libraries, 2023-06-05) Sunny, Febin, author; Nikdast, Mahdi, author; Pasricha, Sudeep, author; ACM, publisher
Emerging AI applications such as ChatGPT, graph convolutional networks, and other deep neural networks require massive computational resources for training and inference. Contemporary computing platforms such as CPUs, GPUs, and TPUs are struggling to keep up with the demands of these AI applications. Non-coherent optical computing represents a promising approach for light-speed acceleration of AI workloads. In this paper, we show how cross-layer design can overcome challenges in non-coherent optical computing platforms. We describe approaches for optical device engineering, tuning circuit enhancements, and architectural innovations to adapt optical computing to a variety of AI workloads. We also discuss techniques for hardware/software co-design that can intelligently map and adapt AI software to improve performance on non-coherent platforms.
Open Access
Design space exploration for PCM-based photonic memory
(Colorado State University. Libraries, 2023-06-05) Shafiee, Amin, author; Charbonnier, Benoit, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
The integration of silicon photonics (SiPh) and phase change materials (PCMs) has created a unique opportunity to realize adaptable and reconfigurable photonic systems. In particular, the nonvolatile programmability in PCMs has made them a promising candidate for implementing optical memory systems. In this paper, we describe the design of an optical memory cell based on PCMs while exploring the design space of the cell in terms of PCM material choice (e.g., GST, GSST, Sb2Se3), cell bit capacity, latency, and power consumption. Leveraging this design-space exploration for the design of efficient optical memory cells, we present the design and implementation of an optical memory array and explore its scalability and power consumption when using different optical memory cells. We also identify performance bottlenecks that need to be alleviated to further scale optical memory arrays with competitive latency and energy consumption, compared to their electronic counterparts.
Open Access
Ethics in computing education: challenges and experience with embedded ethics
(Colorado State University. Libraries, 2023-06-05) Pasricha, Sudeep, author; ACM, publisher
The next generation of computer engineers and scientists must be proficient not just in the technical knowledge required to analyze, optimize, and create computing systems, but also in the skills required to make ethical decisions during design. Teaching computer ethics in computing curricula is therefore becoming an important requirement with significant ramifications for our increasingly connected and computing-reliant society. In this paper, we reflect on the many challenges and questions with effectively integrating ethics into modern computing curricula. We describe a case study of integrating ethics modules into the computer engineering curricula at Colorado State University.
Open Access
FedHIL: heterogeneity resilient federated learning for robust indoor localization with mobile devices
(Colorado State University. Libraries, 2023-09-09) Gufran, Danish, author; Pasricha, Sudeep, author; ACM, publisher
Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62× better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.
Open Access
GHOST: a graph neural network accelerator using silicon photonics
(Colorado State University. Libraries, 2023-09-09) Afifi, Salma, author; Sunny, Febin, author; Shafiee, Amin, author; Nikdast, Mahdi, author; Pasricha, Sudeep, author; ACM, publisher
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2× better throughput and 3.8× better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.
Open Access
Improving block management in 3D NAND flash SSDs with sub-block first write sequencing
(Colorado State University. Libraries, 2024-06-12) Buddhanoy, Matchima, author; Khan, Kamil, author; Milenkovic, Aleksandar, author; Pasricha, Sudeep, author; Ray, Biswajit, author; ACM, publisher
Continual vertical scaling in 3D NAND flash solid-state drives (SSDs) results in larger memory blocks, causing performance degradation due to big-block management issues. Pages within a 3D NAND flash block are traditionally written using layer first write sequencing. This paper introduces and explores the benefits of an alternative sub-block first write sequence. This method, when coupled with sub-block erase operations, promises to alleviate the big-block problem. Our evaluation on a commercial 32-layer 3D NAND flash SSD chip shows that though the proposed method increases the raw bit error rate (RBER), it remains below the threshold that can be corrected by error correction codes (ECCs). Simulation analysis further shows that our proposed method reduces garbage collection overhead, resulting in 36.0% lower response time and 9.6% reduction in additional writes due to garbage collection compared to traditional 3D NAND flash SSDs.
Open Access
Invited paper: Bridging EDA and silicon photonics design: enabling robust-by-design photonic integrated circuits
(Colorado State University. Libraries, 2025-03-04) Ghanaatian, Zahra, author; Mirza, Asif, author; Shafiee, Amin, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
Silicon photonic devices are essential components of integrated optical communication systems and emerging photonic processors. However, their performance is notably impacted by fabrication-process variations (FPVs), which primarily stem from optical lithography imperfections. The impact of FPVs can accumulate and deteriorate the system-level performance through, for example, increasing system power consumption, accumulating crosstalk noise, and degrading signal integrity in photonic systems. In this paper, we discuss the promise of variation-aware design-space exploration and optimization to enhance photonic device robustness under different FPVs while considering two silicon photonic devices used widely in different applications, namely Microring Resonators (MRRs) and Mach-Zehnder Interferometers (MZIs). In addition, we consider a system-level case study of an MZI-based coherent neural network, where we show how our proposed variation-aware design optimization at the device level helps improve the network accuracy by up to 88% under FPVs.
Open Access
Life-after-death: exploring thermal annealing conditions to enhance 3D NAND SSD endurance
(Colorado State University. Libraries, 2024-07-08) Buddhanoy, Matchima, author; Pasricha, Sudeep, author; Ray, Biswajit, author; ACM, publisher
In this paper, we evaluate thermal annealing effects on the endurance of commercial off-the-shelf (COTS) 3D NAND flash memory beyond its end-of-life. We systematically evaluate the effects of anneal duration, anneal temperature, and state of the memory cells during annealing on the endurance enhancement. Interestingly, we find that endurance enhancement critically depends on the state of flash memory cells during annealing, with programmed cells showing significantly larger improvements than erased cells. Our experimental evaluation indicates that the post-cycle data retention property of an annealed chip significantly improves after thermal annealing, resulting in ∼30% endurance recovery. Our results have significant implications for the future wear-leveling algorithms of SSD-based storage systems.
Open Access
SCRIPT: a multi-objective routing framework for securing chiplet systems against distributed DoS attacks
(Colorado State University. Libraries, 2024-06-12) Taheri, Ebadollah, author; Aghanoury, Pooya, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; Sehatbakhsh, Nader, author; ACM, publisher
Heterogeneous 2.5D integration enables seamless integration of chiplets, hence reducing design time and costs. Concerns arise when dealing with untrustworthy chiplets, emphasizing the need for dependable Network-on-Interposer (NoI). This paper introduces SCRIPT, a secure routing framework to mitigate Distributed Denial-of-Service (DDoS) attacks in chiplet systems. SCRIPT obscures predictable paths exploited by attackers, disrupting orchestrated attacks. SCRIPT considers chiplet trust and criticality and employs a multi-objective optimization technique to enhance NoI performance and reliability. Evaluations show that SCRIPT enhances NoI security by at least 64% against DDoS attacks.
Open Access
SHIELD: sustainable hybrid evolutionary learning framework for carbon, wastewater, and energy-aware data center management
(Colorado State University. Libraries, 2024-05-09) Qi, Sirui, author; Milojicic, Dejan, author; Bash, Cullen, author; Pasricha, Sudeep, author; ACM, publisher
Today's cloud data centers are often distributed geographically to provide robust data services. But these geo-distributed data centers (GDDCs) have a significant associated environmental impact due to their increasing carbon emissions and water usage, which needs to be curtailed. Moreover, the energy costs of operating these data centers continue to rise. This paper proposes a novel framework to co-optimize carbon emissions, water footprint, and energy costs of GDDCs, using a hybrid workload management framework called SHIELD that integrates machine learning guided local search with a decomposition-based evolutionary algorithm. Our framework considers geographical factors and time-based differences in power generation/use, costs, and environmental impacts to intelligently manage workload distribution across GDDCs and data center operation. Experimental results show that SHIELD can realize 34.4× speedup and 2.1× improvement in Pareto Hypervolume while reducing the carbon footprint by up to 3.7×, water footprint by up to 1.8×, and energy costs by up to 1.3×, and achieving a cumulative improvement across all objectives (carbon, water, cost) of up to 4.8× compared to the state-of-the-art.
Open Access
TRINE: a tree-based silicon photonic interposer network for energy-efficient 2.5D machine learning acceleration
(Colorado State University. Libraries, 2023-10-28) Taheri, Ebadollah, author; Mahdian, Mohammad Amin, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
2.5D chiplet systems have showcased low manufacturing costs and modular designs for machine learning (ML) acceleration. Nevertheless, communication challenges arise from chiplet interconnectivity and high-bandwidth demands among chiplets. To address these challenges, we present TRINE, a novel tree-based silicon photonic interposer network for energy-efficient ML acceleration. Leveraging silicon photonics and broadband optical switching, TRINE enables efficient inter-chiplet communication with reduced latency and improved energy efficiency. Considering several ML workloads, our simulation results demonstrate significant improvements in average energy efficiency of 61.7% and 40% when comparing TRINE with two recently proposed silicon photonic interposer networks. By overcoming communication limitations in 2.5D ML accelerators, this work is a promising step towards advancing 2.5D photonic-based ML accelerator design.
Open Access
TRON: transformer neural network acceleration with non-coherent silicon photonics
(Colorado State University. Libraries, 2023-06-05) Afifi, Salma, author; Sunny, Febin, author; Nikdast, Mahdi, author; Pasricha, Sudeep, author; ACM, publisher
Transformer neural networks are rapidly being integrated into state-of-the-art solutions for natural language processing (NLP) and computer vision. However, the complex structure of these models creates challenges for accelerating their execution on conventional electronic platforms. We propose TRON, the first silicon photonic hardware neural network accelerator for transformer-based models such as BERT and Vision Transformers. Our analysis demonstrates that TRON exhibits at least 14× better throughput and 8× better energy efficiency in comparison to state-of-the-art transformer accelerators.