Publications
Permanent URI for this collection: https://hdl.handle.net/10217/100413
Recent Submissions
Item Open Access
Page-overwrite data sanitization in 3D NAND flash: challenges, feasibility, and the PULSE solution
(Colorado State University. Libraries, 2025-10-01) Buddhanoy, Matchima, author; Milenkovic, Aleksandar, author; Pasricha, Sudeep, author; Ray, Biswajit, author; ACM, publisher
Instant data deletion (or sanitization) in NAND flash devices is essential for achieving data privacy, but it remains challenging due to the mismatch between erase and write granularities, which leads to high overhead and accelerated wear. While page-overwrite-based instant data sanitization has proven effective for 2D NAND, its applicability to 3D NAND is limited due to the unique sub-block architecture. In this study, we experimentally evaluate page-overwrite-based sanitization on commercial 3D NAND flash memory chips and uncover significant threshold voltage disturbances in erased cells on adjacent pages within the same layer but across different sub-blocks. Our key findings reveal that page-overwrite sanitization increases the median raw bit error rate (RBER) beyond correction limits (exceeding 0.93%) in Floating-Gate (FG) Single-Level Cell (SLC) technology, whereas Charge-Trap (CT) SLC 3D NAND flash memories exhibit higher robustness. In Triple-Level Cell (TLC) 3D NAND, page-overwrite sanitization proves impractical, with the median RBER of ∼13% for FG and ∼5% for CT devices. To overcome these challenges, we propose PULSE, a low-disturbance sanitization technique that balances sanitization efficiency (ηsan) and data integrity (RBER). Experimental results show that PULSE eliminates RBER increases in SLC devices and reduces the median RBER to below 0.57% for FG and 0.79% for CT in fresh TLC blocks, demonstrating its practical viability for 3D NAND flash sanitization.

Item Open Access
GATE: graph attention neural networks with real-time edge construction for robust indoor localization using mobile embedded devices
(Colorado State University. 
Libraries, 2025-10-01) Gufran, Danish, author; Pasricha, Sudeep, author; ACM, publisher
Accurate indoor localization is crucial for enabling spatial context in smart environments and navigation systems. Wi-Fi Received Signal Strength (RSS) fingerprinting is a widely used indoor localization approach due to its compatibility with mobile embedded devices. Deep Learning (DL) models improve accuracy in localization tasks by learning RSS variations across locations, but they assume fingerprint vectors exist in a Euclidean space, failing to incorporate spatial relationships and the non-uniform distribution of real-world RSS noise. This results in poor generalization across heterogeneous mobile devices, where variations in hardware and signal processing distort RSS readings. Graph Neural Networks (GNNs) can improve upon conventional DL models by encoding indoor locations as nodes and modeling their spatial and signal relationships as edges. However, GNNs struggle with non-Euclidean noise distributions and suffer from the GNN blind spot problem, leading to degraded accuracy in environments with dense access points (APs). To address these challenges, we propose GATE, a novel framework that constructs an adaptive graph representation of fingerprint vectors while preserving an indoor state-space topology, modeling the non-Euclidean structure of RSS noise to mitigate environmental noise and address device heterogeneity. GATE introduces (1) a novel Attention Hyperspace Vector (AHV) for enhanced message passing, (2) a novel Multi-Dimensional Hyperspace Vector (MDHV) to mitigate the GNN blind spot, and (3) a new Real-Time Edge Construction (RTEC) approach for dynamic graph adaptation. 
Extensive real-world evaluations across multiple indoor spaces with varying path lengths, AP densities, and heterogeneous devices demonstrate that GATE achieves 1.6× to 4.72× lower mean localization errors and 1.85× to 4.57× lower worst-case errors compared with state-of-the-art indoor localization frameworks.

Item Open Access
ASTRA: a stochastic transformer neural network accelerator with silicon photonics
(Colorado State University. Libraries, 2025-09-07) Afifi, Salma, author; Alo, Oluwaseun, author; Thakkar, Ishan, author; Pasricha, Sudeep, author; ACM, publisher
Transformers have emerged as a dominant architecture in deep learning, demonstrating unparalleled success across a wide range of applications, including natural language processing (NLP), computer vision (CV), and scientific computing. By leveraging the self-attention mechanism, transformers achieve superior performance over traditional models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs). However, these performance gains come at a cost: high computational complexity and substantial memory requirements, making transformers particularly challenging to deploy efficiently on conventional hardware. To address the increasingly intensive computational demands of attention-based transformers, there is growing interest in developing efficient and high-speed hardware accelerators. Silicon photonics has emerged as a promising alternative to digital electronics, offering high-bandwidth and low-latency computation while improving overall computational and energy efficiency. This work introduces ASTRA, the first optical hardware accelerator that leverages stochastic computing principles for transformer neural networks. ASTRA incorporates novel full-range optical stochastic multipliers and stochastic-analog compute-capable optical-to-electrical transducer units to efficiently handle both static and dynamic tensor computations in attention-based models. 
Through detailed performance analysis, we demonstrate that ASTRA achieves at least 7.6× speedup and 1.3× lower energy consumption compared to state-of-the-art transformer accelerators.

Item Open Access
HPC digital twins for evaluating scheduling policies, incentive structures and their impact on power and cooling
(Colorado State University. Libraries, 2025-11-16) Maiterth, Matthias, author; Brewer, Wesley H., author; Kuruvella, Jaya S., author; Dey, Arunavo, author; Islam, Tanzima Z., author; Kabir, Rashadul, author; Menear, Kevin, author; Duplyakin, Dmitry, author; Patki, Tapasya, author; Jones, Terry, author; Wang, Feiyi, author; ACM, publisher
Schedulers are critical for optimal resource utilization in high-performance computing. Traditional methods to evaluate schedulers are limited to post-deployment analysis, or simulators, which do not model associated infrastructure. In this work, we present the first-of-its-kind integration of scheduling and digital twins in HPC. This enables what-if studies to understand the impact of parameter configurations and scheduling decisions on the physical assets, even before deployment, or regarding changes not easily realizable in production. We (1) provide the first digital twin framework extended with scheduling capabilities, (2) integrate various top-tier HPC systems given their publicly available datasets, and (3) implement extensions to integrate external scheduling simulators. Finally, we show how to (4) implement and evaluate incentive structures, as well as (5) evaluate machine-learning-based scheduling, in such a novel digital-twin-based meta-framework to prototype scheduling. Our work enables what-if scenarios of HPC systems to evaluate sustainability, and the impact on the simulated system.

Item Open Access
Event-driven spatiotemporal processing-in-sensor with phase change memory-based optical acceleration
(Colorado State University. 
Libraries, 2025-06-30) Morsali, Mehrdad, author; Najafi, Deniz, author; Shafiee, Amin, author; Tabrizchi, Sepehr, author; Mercati, Pietro, author; Imani, Mohsen, author; Roohi, Arman, author; Khoshavi, Navid, author; Nikdast, Mahdi, author; Angizi, Shaahin, author; ACM, publisher
This work introduces a novel hybrid electronic-optical processing-in-sensor architecture designed for low-cost, real-time frame processing at the edge. The proposed system enables event detection and integrates a TinyLSTM-based temporal inference model to analyze multiple frames in real time, extracting meaningful spatiotemporal features that trigger an address actuator for region-of-interest selection. By selectively reading out only relevant pixel regions, the architecture significantly reduces data transfer overhead and power consumption. Additionally, it harnesses the efficiency of silicon photonic (SiPh) devices to enable adaptive frame compression techniques and perform convolution operations through intrinsic, conversion-free multiply-accumulate computations. Device-to-architecture simulation results demonstrate an 11.2× improvement in performance compared to the state-of-the-art SiPh accelerator achieving 37 KFPS/W. This marks a significant advancement in processing-in-sensor technology, enhancing both computational efficiency and energy savings for edge AI applications.

Item Open Access
Sustainable carbon-aware and water-efficient LLM scheduling in geo-distributed cloud datacenters
(Colorado State University. Libraries, 2025-06-29) Moore, Hayden, author; Qi, Sirui, author; Hogade, Ninad, author; Milojicic, Dejan, author; Bash, Cullen, author; Pasricha, Sudeep, author; ACM, publisher
In recent years, Large Language Models (LLMs) such as ChatGPT, Copilot, and Gemini have been widely adopted in different areas. As the use of LLMs continues to grow, many efforts have focused on reducing the massive training overheads of these models. 
But it is the environmental impact of handling user requests to LLMs that is increasingly becoming a concern. Recent studies estimate that the costs of operating LLMs in their inference phase can exceed training costs by 25× per year. As LLMs are queried incessantly, the cumulative carbon footprint for the operational phase has been shown to far exceed the footprint during the training phase. Further, estimates indicate that 500 ml of fresh water is expended for every 20-50 requests to LLMs during inference. To address these important sustainability issues with LLMs, we propose a novel framework called SLIT to co-optimize LLM quality of service (time-to-first token), carbon emissions, water usage, and energy costs. The framework utilizes a machine learning (ML) based metaheuristic to enhance the sustainability of LLM hosting across geo-distributed cloud datacenters. Such a framework will become increasingly vital as LLMs proliferate.

Item Open Access
A light-speed large language model accelerator with optical stochastic computing
(Colorado State University. Libraries, 2025-06-29) Afifi, Salma, author; Alo, Oluwaseun, author; Thakkar, Ishan, author; Pasricha, Sudeep, author; ACM, publisher
To address the increasingly intensive computational demands of attention-based large language models (LLMs), there is a growing interest in developing energy-efficient and high-speed hardware accelerators. To that end, photonics is being considered as an alternative technology to digital electronics. This work introduces a novel optical hardware accelerator that leverages stochastic computing principles for LLMs. Our proposed accelerator incorporates full-range optical stochastic multipliers and stochastic-analog compute-capable optical-to-electrical transducer units to efficiently handle static and dynamic tensor computations in attention-based models. 
Our analysis shows that our accelerator exhibits at least 7.6× speedup and 1.3× lower energy compared to state-of-the-art LLM hardware accelerators.

Item Open Access
Shedding light on LLMs: harnessing photonic neural networks for accelerating LLMs
(Colorado State University. Libraries, 2025-04-09) Afifi, Salma, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
Large language models (LLMs) are foundational to the advancement of state-of-the-art natural language processing (NLP) and computer vision applications. However, their intricate architectures and the complexity of their underlying neural networks present significant challenges for efficient acceleration on conventional electronic platforms. Silicon photonics offers a compelling alternative. In this paper, we describe our recent efforts on developing a novel hardware accelerator that leverages silicon photonics to accelerate transformer neural networks integral to LLMs. Our evaluation demonstrates that the proposed accelerator delivers up to 14× higher throughput and 8× greater energy efficiency compared to leading-edge LLM hardware accelerators, including CPUs, GPUs, and TPUs.

Item Open Access
Invited paper: Bridging EDA and silicon photonics design: enabling robust-by-design photonic integrated circuits
(Colorado State University. Libraries, 2025-03-04) Ghanaatian, Zahra, author; Mirza, Asif, author; Shafiee, Amin, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
Silicon photonic devices are essential components of integrated optical communication systems and emerging photonic processors. However, their performance is notably impacted by fabrication-process variations (FPVs), which primarily stem from optical lithography imperfections. The impact of FPVs can accumulate and deteriorate the system-level performance through, for example, increased system power consumption, accumulated crosstalk noise, and degraded signal integrity in photonic systems. 
In this paper, we discuss the promise of variation-aware design-space exploration and optimization to enhance photonic device robustness under different FPVs while considering two silicon photonic devices used widely in different applications, namely Microring Resonators (MRRs) and Mach-Zehnder Interferometers (MZIs). In addition, we consider a system-level case study of an MZI-based coherent neural network, where we show how our proposed variation-aware design optimization at the device level helps improve the network accuracy by up to 88% under FPVs.

Item Open Access
Lightator: an optical near-sensor accelerator with compressive acquisition enabling versatile image processing
(Colorado State University. Libraries, 2024-06-23) Morsali, Mehrdad, author; Reidy, Brendan, author; Najafi, Deniz, author; Tabrizchi, Sepehr, author; Imani, Mohsen, author; Nikdast, Mahdi, author; Roohi, Arman, author; Zand, Ramtin, author; Angizi, Shaahin, author; ACM, publisher
This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will substantially diminish the energy consumption and latency of conversion, transmission, and processing within the established cloud-centric architecture as well as recently designed edge accelerators. Our device-to-architecture simulation results show that with favorable accuracy, Lightator achieves 84.4 Kilo FPS/W and reduces power consumption by a factor of ~24× and 73× on average compared with existing photonic accelerators and a GPU baseline.

Item Open Access
SCRIPT: a multi-objective routing framework for securing chiplet systems against distributed DoS attacks
(Colorado State University. 
Libraries, 2024-06-12) Taheri, Ebadollah, author; Aghanoury, Pooya, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; Sehatbakhsh, Nader, author; ACM, publisher
Heterogeneous 2.5D integration enables seamless integration of chiplets, hence reducing design time and costs. Concerns arise when dealing with untrustworthy chiplets, emphasizing the need for dependable Network-on-Interposer (NoI). This paper introduces SCRIPT, a secure routing framework to mitigate Distributed Denial-of-Service (DDoS) attacks in chiplet systems. SCRIPT obscures predictable paths exploited by attackers, disrupting orchestrated attacks. SCRIPT considers chiplet trust and criticality and employs a multi-objective optimization technique to enhance NoI performance and reliability. Evaluations show that SCRIPT enhances NoI security by at least 64% against DDoS attacks.

Item Open Access
Life-after-death: exploring thermal annealing conditions to enhance 3D NAND SSD endurance
(Colorado State University. Libraries, 2024-07-08) Buddhanoy, Matchima, author; Pasricha, Sudeep, author; Ray, Biswajit, author; ACM, publisher
In this paper, we evaluate thermal annealing effects on the endurance of commercial off-the-shelf (COTS) 3D NAND flash memory beyond its end-of-life. We systematically evaluate the effects of anneal duration, anneal temperature, and state of the memory cells during annealing on the endurance enhancement. Interestingly, we find that endurance enhancement critically depends on the state of flash memory cells during annealing, with programmed cells showing significantly larger improvements than erased cells. Our experimental evaluation indicates that the post-cycle data retention property of an annealed chip significantly improves after thermal annealing, resulting in ∼30% endurance recovery. 
Our results have significant implications for the future wear-leveling algorithms of SSD-based storage systems.

Item Open Access
SerIOS: enhancing hardware security in integrated optoelectronic systems
(Colorado State University. Libraries, 2024-06-21) Göhring de Magalhães, Felipe, author; Nikdast, Mahdi, author; Nicolescu, Gabriela, author; ACM, publisher
Silicon photonics (SiPh) has different applications, from enabling fast and high-bandwidth communication for high-performance computing systems to realizing energy-efficient optical computation for AI hardware accelerators. However, integrating SiPh with electronic sub-systems can introduce new security vulnerabilities that cannot be adequately addressed using existing hardware security solutions for electronic systems. This paper introduces SerIOS, the first framework aimed at enhancing hardware security in optoelectronic systems by leveraging the unique properties of optical lithography. SerIOS employs cryptographic keys generated based on imperfections in the optical lithography process and an online detection mechanism to detect attacks. Simulation and synthesis results demonstrate SerIOS's effectiveness in detecting and preventing attacks, with a small area footprint of less than 15% and a 100% detection rate across various attack scenarios and optoelectronic architectures, including photonic AI accelerators.

Item Open Access
RISA: round-robin intra-rack friendly scheduling algorithm for disaggregated datacenters
(Colorado State University. Libraries, 2023-11-12) Kabir, Rashadul, author; Kim, Ryan G., author; Nikdast, Mahdi, author; ACM, publisher
Recent trends see a move away from a fixed-resource server-centric datacenter model to a more adaptable "disaggregated" datacenter model. These disaggregated datacenters can then dynamically group resources to the specific requirements of an incoming workload, thereby improving efficiency. 
To properly utilize these disaggregated datacenters, workload allocation techniques must examine the current state of the datacenter and choose resources that not only optimize the current workload request, but also future ones. Since disaggregated datacenters are severely bottlenecked by the available network resources, our work proposes a heuristic-based approach called RISA, which significantly reduces the network usage of workload allocations in disaggregated datacenters. Compared to the state-of-the-art, RISA reduces the power consumption for optical components by 33% and reduces the average CPU-RAM round-trip latency by 50%. Additionally, RISA significantly outperforms the state-of-the-art in terms of execution time.

Item Open Access
SHIELD: sustainable hybrid evolutionary learning framework for carbon, wastewater, and energy-aware data center management
(Colorado State University. Libraries, 2024-05-09) Qi, Sirui, author; Milojicic, Dejan, author; Bash, Cullen, author; Pasricha, Sudeep, author; ACM, publisher
Today's cloud data centers are often distributed geographically to provide robust data services. But these geo-distributed data centers (GDDCs) have a significant associated environmental impact due to their increasing carbon emissions and water usage, which needs to be curtailed. Moreover, the energy costs of operating these data centers continue to rise. This paper proposes a novel framework to co-optimize carbon emissions, water footprint, and energy costs of GDDCs, using a hybrid workload management framework called SHIELD that integrates machine learning guided local search with a decomposition-based evolutionary algorithm. Our framework considers geographical factors and time-based differences in power generation/use, costs, and environmental impacts to intelligently manage workload distribution across GDDCs and data center operation. 
Experimental results show that SHIELD can realize 34.4× speedup and 2.1× improvement in Pareto Hypervolume while reducing the carbon footprint by up to 3.7×, water footprint by up to 1.8×, energy costs by up to 1.3×, and a cumulative improvement across all objectives (carbon, water, cost) of up to 4.8× compared to the state-of-the-art.

Item Open Access
Improving block management in 3D NAND flash SSDs with sub-block first write sequencing
(Colorado State University. Libraries, 2024-06-12) Buddhanoy, Matchima, author; Khan, Kamil, author; Milenkovic, Aleksandar, author; Pasricha, Sudeep, author; Ray, Biswajit, author; ACM, publisher
Continual vertical scaling in 3D NAND flash solid-state drives (SSDs) results in larger memory blocks, causing performance degradation due to big-block management issues. Pages within a 3D NAND flash block are traditionally written using layer first write sequencing. This paper introduces and explores the benefits of an alternative sub-block first write sequence. This method, when coupled with sub-block erase operations, promises to alleviate the big-block problem. Our evaluation on a commercial 32-layer 3D NAND flash SSD chip shows that, although the proposed method increases the raw bit error rate (RBER), it remains below the threshold that can be corrected by error correction codes (ECCs). Simulation analysis further shows that our proposed method reduces garbage collection overhead, resulting in 36.0% lower response time and 9.6% reduction in additional writes due to garbage collection compared to traditional 3D NAND flash SSDs.

Item Open Access
Design space exploration for PCM-based photonic memory
(Colorado State University. Libraries, 2023-06-05) Shafiee, Amin, author; Charbonnier, Benoit, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
The integration of silicon photonics (SiPh) and phase change materials (PCMs) has created a unique opportunity to realize adaptable and reconfigurable photonic systems. 
In particular, the nonvolatile programmability in PCMs has made them a promising candidate for implementing optical memory systems. In this paper, we describe the design of an optical memory cell based on PCMs while exploring the design space of the cell in terms of PCM material choice (e.g., GST, GSST, Sb2Se3), cell bit capacity, latency, and power consumption. Leveraging this design-space exploration for the design of efficient optical memory cells, we present the design and implementation of an optical memory array and explore its scalability and power consumption when using different optical memory cells. We also identify performance bottlenecks that need to be alleviated to further scale optical memory arrays with competitive latency and energy consumption, compared to their electronic counterparts.

Item Open Access
TRINE: a tree-based silicon photonic interposer network for energy-efficient 2.5D machine learning acceleration
(Colorado State University. Libraries, 2023-10-28) Taheri, Ebadollah, author; Mahdian, Mohammad Amin, author; Pasricha, Sudeep, author; Nikdast, Mahdi, author; ACM, publisher
2.5D chiplet systems have showcased low manufacturing costs and modular designs for machine learning (ML) acceleration. Nevertheless, communication challenges arise from chiplet interconnectivity and high-bandwidth demands among chiplets. To address these challenges, we present TRINE, a novel tree-based silicon photonic interposer network for energy-efficient ML acceleration. Leveraging silicon photonics and broadband optical switching, TRINE enables efficient inter-chiplet communication with reduced latency and improved energy efficiency. Considering several ML workloads, our simulation results demonstrate significant improvements in the average energy efficiency by 61.7% and 40% when comparing TRINE with two recently proposed silicon photonic interposer networks. 
By overcoming communication limitations in 2.5D ML accelerators, this work is a promising step towards advancing 2.5D photonic-based ML accelerator design.

Item Open Access
TRON: transformer neural network acceleration with non-coherent silicon photonics
(Colorado State University. Libraries, 2023-06-05) Afifi, Salma, author; Sunny, Febin, author; Nikdast, Mahdi, author; Pasricha, Sudeep, author; ACM, publisher
Transformer neural networks are rapidly being integrated into state-of-the-art solutions for natural language processing (NLP) and computer vision. However, the complex structure of these models creates challenges for accelerating their execution on conventional electronic platforms. We propose the first silicon photonic hardware neural network accelerator called TRON for transformer-based models such as BERT and Vision Transformers. Our analysis demonstrates that TRON exhibits at least 14× better throughput and 8× better energy efficiency, in comparison to state-of-the-art transformer accelerators.

Item Open Access
GHOST: a graph neural network accelerator using silicon photonics
(Colorado State University. Libraries, 2023-09-09) Afifi, Salma, author; Sunny, Febin, author; Shafiee, Amin, author; Nikdast, Mahdi, author; Pasricha, Sudeep, author; ACM, publisher
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. 
GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It separately implements the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2× better throughput and 3.8× better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.
