Tiled bit networks: sub-bit neural network compression through reuse of learnable binary vectors
dc.contributor.author | Gorbett, Matt, author
dc.contributor.author | Shirazi, Hossein, author
dc.contributor.author | Ray, Indrakshi, author
dc.contributor.author | ACM, publisher
dc.date.accessioned | 2024-11-11T19:34:34Z
dc.date.available | 2024-11-11T19:34:34Z
dc.date.issued | 2024-10-21
dc.description.abstract | Binary Neural Networks (BNNs) enable efficient deep learning by reducing storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization that tiles neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e., tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We apply the approach to both fully-connected and convolutional layers, which account for most of the parameters in typical neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations of Tiled Bit Networks: 1) a deployment of the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel that facilitates the reuse of a single tile per layer in memory.
dc.format.medium | born digital
dc.format.medium | articles
dc.identifier.bibliographicCitation | Matt Gorbett, Hossein Shirazi, and Indrakshi Ray. 2024. Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21–25, 2024, Boise, ID, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3627673.3679603
dc.identifier.doi | https://doi.org/10.1145/3627673.3679603
dc.identifier.uri | https://hdl.handle.net/10217/239540
dc.language | English
dc.language.iso | eng
dc.publisher | Colorado State University. Libraries
dc.relation.ispartof | Publications
dc.relation.ispartof | ACM DL Digital Library
dc.rights | © Matt Gorbett, et al. ACM 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CIKM '24, https://dx.doi.org/10.1145/3627673.3679603.
dc.subject | neural network quantization
dc.subject | compression
dc.subject | efficiency
dc.subject | on-device machine learning
dc.subject | edge machine learning
dc.subject | IoT
dc.type | Text |
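
The abstract above describes learning a single binary tile per layer and repeating (tiling) it, via reshaping, to represent the full weight tensor at inference time. The following is a minimal sketch of that idea for a fully-connected layer, assuming a PyTorch-style setup with sign binarization, a straight-through estimator, and a per-layer scale; the class name, tile length, and other details are illustrative assumptions, not the authors' implementation or kernel.

```python
# Hypothetical sketch of a tiled binary linear layer (not the paper's code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiledBinaryLinear(nn.Module):
    """Linear layer whose weight matrix is filled by repeating one learnable binary tile."""

    def __init__(self, in_features: int, out_features: int, tile_len: int):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.tile_len = tile_len
        # Small latent vector; its sign is the binary tile reused across the whole layer.
        self.tile = nn.Parameter(0.01 * torch.randn(tile_len))
        # Per-layer scale, as is common in binary-weight networks (an assumption here).
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Binarize to {-1, +1}; the straight-through estimator passes gradients to the latent tile.
        binary_tile = (torch.sign(self.tile) - self.tile).detach() + self.tile

        # Repeat (tile) the binary vector to cover the full weight tensor, then reshape.
        n_weights = self.out_features * self.in_features
        repeats = math.ceil(n_weights / self.tile_len)
        weight = binary_tile.repeat(repeats)[:n_weights].view(self.out_features, self.in_features)

        return F.linear(x, self.scale * weight)

# Example: one 1024-bit tile (plus a scale) stands in for a 256x512 weight matrix,
# so only tile_len binary values per layer need to be stored.
layer = TiledBinaryLinear(in_features=512, out_features=256, tile_len=1024)
y = layer(torch.randn(8, 512))
```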
Files
Original bundle
- Name: FACF_ACMOA_3627673.3679603.pdf
- Size: 1.53 MB
- Format: Adobe Portable Document Format