Browsing by Author "Gorbett, Matt, author"
Item Open Access
Sparse binary transformers for multivariate time series modeling (Colorado State University. Libraries, 2023-08-04)
Gorbett, Matt, author; Shirazi, Hossein, author; Ray, Indrakshi, author; ACM, publisher
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, the compression techniques and attention modifications substantially reduce the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating-point operation (FLOP) count, showing up to a 53x reduction in storage size and up to a 10.5x reduction in FLOPs.

Item Open Access
Tiled bit networks: sub-bit neural network compression through reuse of learnable binary vectors (Colorado State University. Libraries, 2024-10-21)
Gorbett, Matt, author; Shirazi, Hossein, author; Ray, Indrakshi, author; ACM, publisher
Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e., tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We apply the approach to both fully-connected and convolutional layers, which make up the bulk of the storage in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) a microcontroller deployment to assess the model's feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.

Item Open Access
Towards fair and efficient distributed intelligence (Colorado State University. Libraries, 2024)
Gorbett, Matt, author; Ray, Indrakshi, advisor; Shirazi, Hossein, committee member; Simske, Steve, committee member; Jayasumana, Anura, committee member
Artificial Intelligence is rapidly advancing the modern technological landscape. Alongside this progress, the ubiquitous presence of computational devices has created unique opportunities to deploy intelligent systems in novel environments. For instance, resource-constrained machines such as IoT devices have the potential to enhance our world through the use of Deep Neural Networks (DNNs). However, modern DNNs suffer from high computational complexity and are often relegated to specialized hardware, a bottleneck which has severely limited their practical use. In this work, we contribute to addressing these issues through the use of neural network compression. We present new findings for both model quantization and pruning, two standard techniques for creating compressed and efficient DNNs. To begin, we examine the efficacy of neural network compression for time series learning, an unstudied modality in model compression literature. We construct a generalized Transformer architecture for multivariate time series which applies both binarization and pruning to model parameters. Our results show that the lightweight models achieve accuracy comparable to that of dense Transformers of the same structure on time series forecasting, classification, and anomaly detection tasks while significantly reducing the computational burden. Next, we propose two novel algorithms for neural network compression: 1) Tiled Bit Networks (TBNs) and 2) Iterative Weight Recycling (IWR). TBNs present a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted models. The method learns binary vectors (i.e., tiles) to populate each layer of a model via tensor aggregation and reshaping operations; during inference, TBNs use just a single tile per model layer. TBNs perform well across a diverse range of architectures (CNNs, MLPs, Transformers) and tasks (classification, segmentation) while achieving up to an 8x reduction in size compared to binary-weighted models. The second algorithm, IWR, generates sparse neural networks from randomly initialized models by identifying important parameters within neural networks for reuse. The approach enables us to prune 80% of ResNet50's parameters while still achieving 70.8% accuracy on ImageNet. Finally, we examine the feasibility of deploying compressed DNNs in practical applications. Specifically, we deploy Sparse Binary Neural Networks (SBNNs), TBNs, and other common compression algorithms on an embedded device for performance assessment, finding a reduction in both peak memory and storage size. By integrating algorithmic and theoretical advancements into a comprehensive end-to-end methodology, this dissertation contributes a new framework for crafting powerful and efficient deep learning models applicable in real-world settings.
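
The tile reuse described in the Tiled Bit Networks abstracts above can be illustrated with a small sketch. This is not the authors' implementation: it only shows how one stored binary vector might be repeated and reshaped to stand in for a full binary weight matrix at inference time. The function names `expand_tile` and `bits_per_weight` are hypothetical, the layer shape and tile values are made up for illustration, and tile learning, any scaling factors, and the aggregation step used during training are omitted.

```python
from typing import List


def expand_tile(tile: List[int], rows: int, cols: int) -> List[List[int]]:
    """Repeat a 1-D binary tile until it fills a rows x cols weight matrix.

    Hypothetical helper: assumes weights are encoded as -1 / +1 and that the
    tile length evenly divides the number of weights in the layer.
    """
    total = rows * cols
    if not tile or total % len(tile) != 0:
        raise ValueError("tile length must evenly divide rows * cols")
    if any(w not in (-1, 1) for w in tile):
        raise ValueError("tile entries must be -1 or +1")
    # Reuse the same tile end to end, then reshape the flat list into rows.
    flat = tile * (total // len(tile))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def bits_per_weight(tile_len: int, rows: int, cols: int) -> float:
    """Stored bits per represented weight (1.0 for a plain binary layer)."""
    return tile_len / (rows * cols)


# Illustrative values only: a 4-bit tile standing in for a 4 x 8 layer.
tile = [1, -1, -1, 1]
weights = expand_tile(tile, rows=4, cols=8)
ratio = bits_per_weight(len(tile), rows=4, cols=8)

for row in weights:
    print(row)
print(f"bits per weight: {ratio}")
```

In this toy configuration, 4 stored bits represent 32 binary weights, which corresponds to an 8x reduction relative to storing one bit per weight. Only the tile needs to be kept in memory; the full matrix is shown here purely to make the reuse pattern visible.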