
Mixed-precision S/DGEMM using the TF32 and TF64 frameworks on low-precision AI tensor cores

dc.contributor.authorValero-Lara, Pedro, author
dc.contributor.authorLiu, Frank, author
dc.contributor.authorVetter, Jeffrey S., author
dc.contributor.authorJorquera, Ian, author
dc.contributor.authorACM, publisher
dc.date.accessioned2024-11-11T19:30:35Z
dc.date.available2024-11-11T19:30:35Z
dc.date.issued2023-11-12
dc.description.abstractUsing NVIDIA graphics processing units (GPUs) equipped with Tensor Cores has enabled the significant acceleration of general matrix multiplication (GEMM) for applications in machine learning (ML) and artificial intelligence (AI) and in high-performance computing (HPC) generally. The use of such power-efficient, specialized accelerators can provide a performance increase between 8× and 20×, albeit with a loss in precision. However, a high level of precision is required in many large scientific and HPC applications, and computing in single or double precision is still necessary for many of these applications to maintain accuracy. Fortunately, mixed-precision methods can be employed to maintain a higher level of numerical precision while also taking advantage of the performance increases from computing with lower-precision AI cores. With this in mind, we extend the state of the art by using NVIDIA's new TF32 framework. This new framework not only lifts some constraints of the previous frameworks, such as costly 32-to-16-bit castings, but also provides equivalent precision and performance by using a much simpler approach. We also propose a new framework called TF64 that aims to achieve double-precision arithmetic with low-precision Tensor Cores. Although this framework does not exist yet, we validated the correctness of this idea and achieved the equivalent of 64-bit precision on 32-bit hardware.
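The 64-bit-on-32-bit idea in the abstract can be illustrated with the classic two-term splitting technique: each double-precision operand is split into a high and a low single-precision part, and the matrix product is reassembled from partial products of those parts. The sketch below is an illustration of that general technique only, not the authors' TF64 algorithm; the `float64` casts around the partial products stand in for the Tensor Core's wider internal accumulator, and all names are hypothetical.

```python
import numpy as np

def split_to_floats(x):
    """Split a float64 array into high and low float32 parts so that
    hi + lo recovers roughly 48 significand bits of x."""
    hi = x.astype(np.float32)
    lo = (x - hi.astype(np.float64)).astype(np.float32)
    return hi, lo

def gemm_emulated(a, b):
    """Approximate a float64 GEMM using only float32 operands.

    The three partial products are accumulated in float64, which here
    models the wider accumulator of the hardware; the lo*lo term falls
    below the target precision and is dropped.
    """
    a_hi, a_lo = split_to_floats(a)
    b_hi, b_lo = split_to_floats(b)
    return (a_hi.astype(np.float64) @ b_hi.astype(np.float64)
            + a_hi.astype(np.float64) @ b_lo.astype(np.float64)
            + a_lo.astype(np.float64) @ b_hi.astype(np.float64))
```

Compared with simply casting both matrices to `float32` and multiplying, the split version typically recovers several orders of magnitude of accuracy, even though every operand handed to the (emulated) low-precision unit is representable in 32 bits.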
dc.format.mediumborn digital
dc.format.mediumarticles
dc.identifier.bibliographicCitationPedro Valero-Lara, Frank Liu, Jeffrey S. Vetter, and Ian Jorquera. 2023. Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores. In Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023), November 12–17, 2023, Denver, CO, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3624062.3624084
dc.identifier.doihttps://doi.org/10.1145/3624062.3624084
dc.identifier.urihttps://hdl.handle.net/10217/239514
dc.languageEnglish
dc.language.isoeng
dc.publisherColorado State University. Libraries
dc.relation.ispartofPublications
dc.relation.ispartofACM DL Digital Library
dc.rights© Pedro Valero-Lara, et al. ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SC-W 2023, https://dx.doi.org/10.1145/3624062.3624084.
dc.subjectmixed precision
dc.subjecttensor core
dc.subjectGEMM
dc.subjectGPUs
dc.titleMixed-precision S/DGEMM using the TF32 and TF64 frameworks on low-precision AI tensor cores
dc.typeText

Files

Original bundle
Name:
FACF_ACMOA_3624062.3624084.pdf
Size:
1.73 MB
Format:
Adobe Portable Document Format

Collections