
A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator
Description
Transformer-based large language models impose significant bandwidth and computation challenges when deployed on edge devices. SRAM-based compute-in-memory (CIM) accelerators offer a promising way to reduce data movement but remain limited by model size. This work develops a ternary weight splitting (TWS) binarization that yields BF16×1-b transformers with competitive accuracy and a significantly smaller model size than their full-precision counterparts. A fully digital SRAM-based CIM accelerator is then designed around a bit-parallel SRAM macro embedded in a highly efficient group vector systolic architecture, which can store one column of the BERT-Tiny model with stationary systolic data reuse. Implemented in a 28 nm technology, the design requires only 2 KB of SRAM in an area of 2 mm². It achieves a throughput of 6.55 TOPS at a total power of 419.74 mW, yielding the highest reported area efficiency of 3.3 TOPS/mm² and normalized energy efficiency of 20.98 TOPS/W for the BERT-Tiny model.
Event Type
Networking
Work-in-Progress Poster
Time
Monday, June 23, 6:00pm - 7:00pm PDT
Location
Level 2 Lobby