Close

Presentation

YOCO: A Hybrid In-Memory Computing Architecture with 8-bit Sub-PetaOps/W In-Situ Multiply Arithmetic for Large-Scale AI
DescriptionIn this paper, we further explore the potential of analog in-memory computing (AiMC) and introduce an innovative artificial intelligence (AI) accelerator architecture named YOCO, featuring three key proposals: (1) YOCO proposes an innovative 8-bit in-situ multiply arithmetic (IMA) achieving 123.8 TOPS/W energy-efficiency and 34.9 TOPS throughput through efficient charge-domain computation and time-domain accumulation mechanism. (2) YOCO employs a hybrid ReRAM-SRAM memory structure to balance computational efficiency and storage density. (3) YOCO tailors an IMC-friendly attention computing flow with an efficient pipe-line to accelerate the inference of transformer-based AI models. Compared to three SOTA baselines, YOCO on average improves energy efficiency by up to 3.9×~19.9× and through-put by up to 6.8×~33.6× across 10 CNN/transformer models.