Tiled bit networks: sub-bit neural network compression through reuse of learnable binary vectors

Gorbett, Matt, author; Shirazi, Hossein, author; Ray, Indrakshi, author; ACM, publisher

doi:https://doi.org/10.1145/3627673.3679603

Files

FACF_ACMOA_3627673.3679603.pdf (1.53 MB)

Date

2024-10-21

Authors

Gorbett, Matt, author

Shirazi, Hossein, author

Ray, Indrakshi, author

ACM, publisher

Abstract

Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization that tiles neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e., tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We apply the approach to both fully-connected and convolutional layers, which make up the bulk of the space in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) a deployment of the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.
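
The inference-time reuse of a single tile per layer can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `expand_tile` and `bits_per_weight`, the row-major layout, and the example sizes are all assumptions made for illustration, and the training-time aggregation step that learns the tile is not shown. The sketch only demonstrates how one stored binary vector could be repeated and reshaped to fill a layer, and why storing a tile one-eighth the size of the layer corresponds to 0.125 bits per weight, i.e. an 8x reduction relative to a 1-bit-per-weight model.

```python
from typing import List


def expand_tile(tile: List[int], rows: int, cols: int) -> List[List[int]]:
    """Repeat a 1-D tile of +1/-1 values to fill a rows x cols weight matrix.

    Hypothetical helper: the row-major fill order is an assumption.
    """
    total = rows * cols
    if not tile or total % len(tile) != 0:
        raise ValueError("tile length must evenly divide rows * cols")
    if any(v not in (-1, 1) for v in tile):
        raise ValueError("tile entries must be +1 or -1")
    # Reuse the same stored tile as many times as needed to cover the layer.
    flat = tile * (total // len(tile))
    # Reshape the flat sequence into rows of length `cols`.
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def bits_per_weight(tile_len: int, rows: int, cols: int) -> float:
    """Stored bits divided by the number of weights the tile represents."""
    return tile_len / (rows * cols)


# Illustrative sizes only: 4 stored bits stand in for a 4 x 8 layer.
tile = [1, -1, -1, 1]
weights = expand_tile(tile, rows=4, cols=8)
ratio = bits_per_weight(len(tile), rows=4, cols=8)
```

With these example sizes, every row of `weights` is the tile repeated twice, and `ratio` is 4 / 32 = 0.125 bits per weight.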

Subject

neural network quantization

compression

efficiency

on-device machine learning

edge machine learning

IoT

URI

https://hdl.handle.net/10217/239540

Collections

Publications
