
Pruning visual transformers to increase model compression and decrease inference time


We investigate the efficacy of pruning a visual transformer during training to reduce inference time while maintaining accuracy. We explore several training regimes: epoch-based training, fixed-time training, and training until a specific accuracy threshold is reached. Results indicate that pruning from the start of training significantly reduces inference time without sacrificing model accuracy. We evaluate different pruning rates and find a trade-off between training speed and model compression: slower pruning rates allow better convergence to higher accuracy and more efficient model recovery. We also examine the cost of pruning and the recovery time of pruned models. Overall, the findings suggest that early-stage pruning can produce smaller, more efficient models whose performance is comparable to or better than that of their non-pruned counterparts, offering insights into optimizing model efficiency and resource utilization in AI applications.
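The idea of a pruning rate applied from the first epoch can be illustrated with a minimal sketch. The abstract does not state which pruning criterion or schedule was used, so everything here is an assumption for illustration: the hypothetical helper `sparsity_schedule` removes a fixed fraction of the remaining weights each epoch, and the hypothetical helper `magnitude_prune` zeroes the smallest-magnitude weights (unstructured magnitude pruning, a common choice, not necessarily the one used in this work).

```python
def sparsity_schedule(rate_per_epoch, num_epochs, max_sparsity=0.9):
    """Cumulative fraction of weights removed after each epoch.

    Assumption: each epoch prunes `rate_per_epoch` of the weights that are
    still present, so the remaining fraction decays geometrically. The cap
    `max_sparsity` is a hypothetical safeguard, not taken from the source.
    """
    schedule = []
    remaining = 1.0
    for _ in range(num_epochs):
        remaining *= 1.0 - rate_per_epoch
        schedule.append(min(1.0 - remaining, max_sparsity))
    return schedule


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    # A slower rate reaches a given sparsity later, leaving more epochs
    # for the model to recover between pruning steps.
    slow = sparsity_schedule(0.05, 10)
    fast = sparsity_schedule(0.20, 10)
    print("slow:", [round(s, 3) for s in slow])
    print("fast:", [round(s, 3) for s in fast])
    print(magnitude_prune([0.1, -2.0, 0.5, 3.0], 0.5))
```

In a real training loop the pruning step would run between optimizer updates on the transformer's weight tensors; the plain Python lists here only stand in for those tensors.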




machine learning
artificial intelligence

