Presentation
Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference
Description
To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space.
This paper studies efficient MAC implementations of the integer, floating-point, posit, and logarithmic number system (LNS) data formats, as well as the Microscaling (MX) and Vector-Scaled Quantization (VSQ) block data formats.
We evaluate the area, power, and numerical accuracy of >25,000 MAC designs spanning each data format and several key design parameters.
We find that Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8x, 2.2x, and 1.9x TOPS/W improvements over FP16, FP8, and FP4 implementations, respectively.
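For readers unfamiliar with block data formats, the sketch below illustrates the general idea behind an MXINT8-style format: a group of values shares a single power-of-two scale while each element is stored as an INT8 mantissa, so dot products reduce to integer multiply-accumulates plus one scaling step per block. The block size, scale rule, and function names are illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np

def quantize_mx_int8(values, block_size=32):
    """Illustrative MX-style block quantization: each block of `block_size`
    values shares one power-of-two scale and stores INT8 mantissas.
    (Sketch only; the block size and scale rule are assumptions.)"""
    x = np.asarray(values, dtype=np.float32).reshape(-1, block_size)
    # Shared per-block scale: smallest power of two covering the largest magnitude.
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(np.maximum(max_abs, 1e-30) / 127.0))
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def mx_dot(qa, sa, qb, sb):
    """MAC-style dot product: integer multiply-accumulate within each block,
    then a single floating-point scale applied per block pair."""
    acc_int = np.sum(qa.astype(np.int32) * qb.astype(np.int32), axis=1)
    return float(np.sum(acc_int * (sa.squeeze(1) * sb.squeeze(1))))

# Usage: compare the block-quantized dot product against the FP32 reference.
rng = np.random.default_rng(0)
a = rng.standard_normal(128).astype(np.float32)
b = rng.standard_normal(128).astype(np.float32)
qa, sa = quantize_mx_int8(a)
qb, sb = quantize_mx_int8(b)
print(mx_dot(qa, sa, qb, sb), float(a @ b))
```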
Event Type
Research Manuscript
Time
Monday, June 23, 2:45pm - 3:00pm PDT
Location
3001, Level 3
AI
AI3: AI/ML Architecture Design