Pruning and acceleration of deep neural networks
Date
2020
Authors
Thivagara Sarma, Janarthanan, author
Pouchet, Louis-Noël, advisor
Rajopadhye, Sanjay, committee member
Pasricha, Sudeep, committee member
Anderson, Chuck, committee member
Abstract
Deep neural networks are computationally and memory-intensive applications. Many network pruning and compression solutions have been introduced to deploy inference of large trained models on memory-limited and time-critical systems. We propose a new pruning methodology that assigns a significance rank to the operations in the inference program and, for a given capacity and operation budget, generates only the important operations needed to perform the inference. Our approach shows that, for many classical feed-forward classification networks, we can maintain almost the same accuracy as the original inference while executing less than half of the operations in the original program. We also propose a methodology for an efficient implementation of the output-sparse computation, controllable by a threshold variable.
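The abstract does not spell out the ranking criterion or the sparse-execution scheme, so the following NumPy sketch only illustrates the two ideas it names: ranking a layer's multiply-accumulate operations by a significance score and keeping just the top budgeted fraction, then zeroing output activations below a threshold so downstream computation can skip them. The magnitude-based score, the function names, and the toy sizes are illustrative assumptions, not the thesis's actual method.

import numpy as np

def prune_by_operation_budget(weights, budget_fraction=0.5):
    # Assumed significance score: absolute weight magnitude
    # (a common proxy; the thesis's ranking may differ).
    scores = np.abs(weights)
    # Operation budget: how many MACs we are allowed to execute.
    budget = int(budget_fraction * weights.size)
    # Score cutoff separating kept operations from pruned ones.
    k = weights.size - budget
    cutoff = np.partition(scores.ravel(), k)[k]
    mask = scores >= cutoff
    return weights * mask, mask

def sparse_inference(x, pruned_w, out_threshold=0.0):
    # Matrix-vector product in which pruned operations contribute
    # zero, followed by a threshold-controlled sparse output.
    y = pruned_w @ x
    y[np.abs(y) < out_threshold] = 0.0
    return y

# Toy usage: a 64x128 layer pruned to half of its operations.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
x = rng.standard_normal(128)
pw, mask = prune_by_operation_budget(w, budget_fraction=0.5)
y = sparse_inference(x, pw, out_threshold=0.1)
print(f"kept {mask.mean():.0%} of operations, "
      f"{np.count_nonzero(y)} nonzero outputs")

In this sketch the threshold variable trades accuracy for sparsity exactly as the abstract describes at a high level: raising out_threshold zeroes more activations, so later layers have fewer nonzero inputs to process.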
Subject
compression
pruning
acceleration
SIMD
deep neural networks