
Novel tensor norm optimization for neural network training acceleration

Abstract

This paper introduces an advanced optimization algorithm designed to enhance the training efficiency of neural networks, with a particular focus on the weight matrices prevalent in large language models. Diverging from prior spectral norm-based approaches, our method leverages the nuclear norm to formulate a novel update rule, yielding a distinct optimization technique called Neon. We provide rigorous theoretical guarantees on its convergence properties, derived through convex optimization and the Karush-Kuhn-Tucker conditions. Evaluations across multilayer perceptrons, convolutional neural networks, and generative models such as NanoGPT demonstrate computational advantages over existing optimizers, including Muon and AdamW. The Frobenius-based Neon variant achieves comparable or superior convergence while incurring a significantly lower per-iteration cost of O(mn) FLOPs, compared with Muon's O(mn · min{m, n}) for m × n matrices. This work contributes to more robust and faster training methodologies for complex AI systems.
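
The FLOP comparison above can be made concrete with a short sketch. The exact Neon update rule is not specified in this record, so the snippet below is only an illustration of the asymptotics under stated assumptions: it assumes, purely for demonstration, that a Frobenius-based step rescales the gradient by its Frobenius norm (an assumption, not the authors' definition), which costs O(mn) FLOPs for an m × n matrix, and contrasts this with the quintic Newton-Schulz orthogonalization used in the publicly available Muon reference code, whose matrix products cost O(mn · min{m, n}) per iteration.

```python
# Illustrative sketch only: the Frobenius-scaled step is an assumed stand-in for
# the Frobenius-based Neon variant; the Newton-Schulz routine follows the
# publicly available Muon reference iteration.
import torch


def frobenius_scaled_step(grad: torch.Tensor, lr: float = 1e-3) -> torch.Tensor:
    # Computing the Frobenius norm and rescaling touches each entry a constant
    # number of times: O(mn) FLOPs for an m x n gradient.
    return -lr * grad / (grad.norm() + 1e-8)


def newton_schulz_orthogonalize(grad: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration (Muon-style). With m <= n after the
    # optional transpose, the X @ X.T and (...) @ X products each cost
    # O(m^2 n) = O(mn * min(m, n)) FLOPs per iteration.
    X = grad / (grad.norm() + 1e-8)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the Muon reference code
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X


if __name__ == "__main__":
    G = torch.randn(1024, 4096)
    print(frobenius_scaled_step(G).shape)        # O(mn) update direction
    print(newton_schulz_orthogonalize(G).shape)  # O(mn * min(m, n)) update direction
```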


Subject

neural network optimization
nuclear norm
low-rank updates
gradient descent
deep learning
