

OutlierCIM: Outlier-Aware Digital CIM-Based LLM Accelerator with Hybrid-Strategy Quantization and Unified FP-INT Computation
Description
Activation outliers in Large Language Models (LLMs), which exhibit large magnitudes but occur in small quantities, significantly affect model performance and pose challenges for the acceleration of LLMs. To address this bottleneck, researchers have proposed several co-design frameworks that pair outlier-aware algorithms with dedicated hardware. However, these frameworks struggle to balance model accuracy with hardware efficiency when accelerating LLMs at low bit-widths. To this end, we propose OutlierCIM, the first algorithm-hardware co-design framework for a compute-in-memory (CIM) accelerator with an outlier-aware quantization algorithm. The key contributions of OutlierCIM are 1) an outlier-clustered tiling strategy that regulates memory access and reduces inefficient workloads, both of which are introduced by outliers; 2) a hybrid-strategy quantization scheme and a reconfigurable double-bit CIM macro array that overcome the low storage utilization and high latency of outlier-based LLM quantization; and 3) a quantization-factor post-processing strategy and a dedicated quantizer that efficiently unify the multiplication and accumulation of outlier-caused FP-INT workloads. Implemented in a 28nm CMOS technology, OutlierCIM occupies an area of 2.25 mm². Evaluated on comprehensive benchmarks, OutlierCIM achieves up to 4.54× higher energy efficiency and a 3.91× speedup compared with state-of-the-art outlier-aware accelerators.
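The FP-INT split underlying outlier-aware quantization can be sketched as follows. This is a generic illustration of the idea the abstract builds on, not OutlierCIM's actual algorithm: the threshold, function names, and symmetric INT8 scheme here are all assumptions for demonstration.

```python
def outlier_aware_quantize(x, threshold=3.0, bits=8):
    """Split activations into a sparse FP outlier part and a dense
    INT-quantized inlier part, so x ~= dequant(inliers) + outliers.
    Threshold and bit-width are illustrative, not from the paper."""
    # Large-magnitude values (few in number) stay in floating point.
    outliers = [v if abs(v) > threshold else 0.0 for v in x]
    inliers = [0.0 if abs(v) > threshold else v for v in x]

    # Symmetric low-bit quantization of the inlier part only; excluding
    # outliers keeps the scale small and preserves inlier resolution.
    qmax = 2 ** (bits - 1) - 1
    max_inlier = max((abs(v) for v in inliers), default=0.0)
    scale = max_inlier / qmax if max_inlier > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in inliers]
    return q, scale, outliers


def dequantize(q, scale, outliers):
    # Reconstruction: rescaled INT part plus sparse FP outlier correction.
    return [qi * scale + oi for qi, oi in zip(q, outliers)]
```

Because the outlier correction is exact and the inlier scale is set by the (small) inlier maximum, the reconstruction error is bounded by one quantization step; the hardware cost is the mixed FP-INT accumulation that OutlierCIM's unified datapath targets.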
Event Type
Research Manuscript
Time
Wednesday, June 25, 4:45pm - 5:00pm PDT
Location
3001, Level 3
Topics
Design
Tracks
DES2B: In-memory and Near-memory Computing Architectures, Applications and Systems