Presentation
Enhancing Parallelism and Energy-Efficiency in SOT-MRAM based CIM Architecture for On-Chip Learning
Description
Computation-In-Memory (CIM) architectures have emerged as energy-efficient solutions for Artificial Intelligence (AI) applications, enabling data processing within memory arrays and reducing the bottleneck associated with data transfer. The rapid advancement of AI demands real-time on-chip learning, but implementing this with CIM architectures poses significant challenges, such as limited parallelism and energy-efficiency during training and inference. In this paper, we propose a novel CIM architecture specifically designed for on-chip learning applications, which capitalizes on the unique properties of Spin-Orbit Torque (SOT) technology to enhance both parallelism and energy-efficiency in computation. The proposed architecture incorporates a bulk-write mechanism for SOT-cell-based arrays, enabling efficient weight updates during on-chip training. Additionally, we develop a scheme to process vector elements concurrently for vector-matrix multiplications during inference. To achieve this, we design multi-port bit-cell access capabilities along with their associated control mechanisms. Simulation results show a 5.82× reduction in latency and a 3.20× improvement in energy-efficiency compared to a standard SOT-MRAM-based CIM baseline, with negligible overhead.
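The parallelism claims above can be made concrete with a small behavioral model. The Python sketch below is an illustrative assumption, not the authors' design: the names (Crossbar, bulk_write, vmm_serial, vmm_parallel, ports) are hypothetical. It models a binary crossbar in which a bulk write updates an entire row per cycle, and in which multi-port access senses several rows concurrently during a vector-matrix multiply, cutting cycle count relative to one-row-per-cycle operation.

```python
import numpy as np

# Hypothetical behavioral model of an SOT-MRAM crossbar (illustrative only,
# not the paper's implementation). Weights are binary cell states; a
# vector-matrix multiply activates rows and sums column currents.

class Crossbar:
    def __init__(self, rows: int, cols: int):
        self.W = np.zeros((rows, cols), dtype=np.int8)  # cell states (0/1)

    def bulk_write(self, W_new: np.ndarray) -> int:
        """Row-parallel update: all cells in a row switch in one write cycle,
        so refreshing the array costs `rows` cycles rather than
        rows * cols single-cell writes."""
        self.W[:] = W_new
        return self.W.shape[0]  # write cycles consumed

    def vmm_serial(self, x: np.ndarray):
        """Baseline: one input element (one row) activated per cycle."""
        acc = np.zeros(self.W.shape[1], dtype=np.int32)
        for i, xi in enumerate(x):
            acc += xi * self.W[i]  # column current summation
        return acc, len(x)         # result, cycles

    def vmm_parallel(self, x: np.ndarray, ports: int = 4):
        """Multi-port access: `ports` rows sensed concurrently per cycle."""
        acc = np.zeros(self.W.shape[1], dtype=np.int32)
        cycles = 0
        for i in range(0, len(x), ports):
            acc += x[i:i + ports] @ self.W[i:i + ports]
            cycles += 1
        return acc, cycles

xbar = Crossbar(64, 64)
xbar.bulk_write(np.random.randint(0, 2, (64, 64)))
x = np.random.randint(0, 2, 64)
y_ser, c_ser = xbar.vmm_serial(x)
y_par, c_par = xbar.vmm_parallel(x, ports=4)
assert np.array_equal(y_ser, y_par)
print(f"serial: {c_ser} cycles, parallel: {c_par} cycles")  # 64 vs 16
```

With four hypothetical access ports the model needs a quarter of the baseline cycles for the same result, which is the kind of latency reduction the abstract attributes to concurrent vector-element processing; the actual 5.82× figure comes from the paper's circuit-level simulations, not from this sketch.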
Event Type: Research Manuscript
Time: Monday, June 23, 4:15pm - 4:30pm PDT
Location: 3001, Level 3
Track: Design
Session: DES5: Emerging Device and Interconnect Technologies