Pruning visual transformers to increase model compression and decrease inference time

Yost, James E., authorWhitley, Darrell, advisorGhosh, Sudipto, committee memberBetten, Anton, committee member2024-05-272024-05-272024https://hdl.handle.net/10217/238378We investigate the efficacy of pruning a visual transformer during training to reduce inference time while maintaining accuracy. Various training techniques were explored, including epoch-based training, fixed-time training, and training to achieve a specific accuracy threshold. Results indicate that pruning from the inception of training offers significant reductions in inference time without sacrificing model accuracy. Different pruning rates were evaluated, demonstrating a trade-off between training speed and model compression. Slower pruning rates allowed for better convergence to higher accuracy levels and more efficient model recovery. Furthermore, we examine the cost of pruning and the recovery time of pruned models. Overall, the findings suggest that early-stage pruning strategies can effectively produce smaller, more efficient models with comparable or improved performance compared to non-pruned counterparts, offering insights into optimizing model efficiency and resource utilization in AI applications.born digitalmasters thesesengCopyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.machine learningtransformerpruningartificial intelligencePruning visual transformers to increase model compression and decrease inference timeText