Author: Thivagara Sarma, Janarthanan
Advisor: Pouchet, Louis-Noël
Committee members: Rajopadhye, Sanjay; Pasricha, Sudeep; Anderson, Chuck
Date: 2020-06-22
Year: 2020
URI: https://hdl.handle.net/10217/208485

Abstract: Deep neural networks are computationally and memory-intensive applications. Many network pruning and compression solutions have been introduced to deploy inference of large trained models on memory-limited and time-critical systems. We proposed a new pruning methodology that assigns a significance rank to the operations in the inference program and, for a given capacity and operation budget, generates only the important operations needed to perform the inference. Our approach has shown that, for many classical feed-forward classification networks, we can maintain almost the same accuracy as the original inference while executing less than half of the operations in the original program. We also proposed a methodology to improve the efficient implementation of the output-sparse computation, controllable by a threshold variable.

Format: born digital
Type: masters theses; Text
Language: eng
Rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
Keywords: compression; pruning; acceleration; SIMD; deep neural networks
Title: Pruning and acceleration of deep neural networks
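The abstract describes the method only at a high level. The sketch below is a minimal, hypothetical illustration (not the thesis's actual algorithm) of the two ideas it mentions: ranking operations by significance and keeping only those that fit an operation budget, and an output-sparse computation controlled by a threshold. The function names, the magnitude-based significance proxy, and the NumPy setup are assumptions made for illustration.

```python
import numpy as np

def prune_by_significance(weights, op_budget):
    """Keep only the highest-significance weights so that roughly op_budget
    multiply-accumulate operations survive. Significance is approximated here
    by absolute magnitude; the thesis may use a different ranking criterion."""
    flat = np.abs(weights).ravel()
    if op_budget >= flat.size:
        return weights.copy()
    # Cutoff chosen so about op_budget entries survive (ties may keep a few more).
    cutoff = np.partition(flat, flat.size - op_budget)[flat.size - op_budget]
    mask = np.abs(weights) >= cutoff
    return weights * mask

def thresholded_output(x, threshold):
    """Zero out activations below a threshold so downstream layers can skip
    the corresponding operations (output-sparse computation)."""
    return np.where(np.abs(x) >= threshold, x, 0.0)

# Example: a dense layer pruned to half of its original operations.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
W_pruned = prune_by_significance(W, op_budget=W.size // 2)
x = rng.standard_normal(128)
y = thresholded_output(W_pruned @ x, threshold=0.1)
```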