Presentation
MPE: A Power-Efficient Edge-Device Mamba Processor with Multi-Dimensional Calculation-Compression Scheme
Description
As one of the most representative AI technologies, the Mamba architecture has enabled many advanced models. This paper proposes an energy-efficient Mamba inference processor called the Mamba Processing Element (MPE). First, MPE uses the recurrent framework to find Low-correlation Assignment Pruning Optimization (LAPO) schemes. Second, MPE uses a Spatial Multi-head Attention Similarity (SMAS) mechanism. Third, MPE adopts a Dynamic Parallel Compression Quantization (DPCQ) architecture. Synthesized with 28 nm CMOS tools, the proposed MPE processor has an area of 9.14 mm² and a peak energy efficiency of 93.51 TOPS/W, which is 16.3 times that of the H100 graphics processing unit (GPU).
Event Type
Networking
Work-in-Progress Poster
Time
Sunday, June 22, 6:00pm - 7:00pm PDT
Location
Level 3 Lobby