Presentation

PoP-ECC: Robust and Flexible Error Correction against Multi-Bit Upsets in DNN Accelerators
Description

Deep Neural Networks (DNNs) in safety-critical systems require high reliability. Many systems deploy Error Correction Codes (ECCs) to protect DNNs from memory errors. However, continued process scaling increases both the severity and the frequency of memory errors, necessitating strong protection against Multi-Bit Upsets (MBUs). This paper proposes Parities of Parities ECC (PoP-ECC), a novel two-tier memory protection scheme designed to provide robust, efficient, and flexible protection against MBUs. PoP-ECC first generates Virtual Parities (VPs) and then uses them to compute second-level parities called Parities of Parities (PPs). This two-level ECC structure allows error correction to adapt dynamically to varying error patterns, ensuring system reliability with minimal memory overhead. Our evaluation demonstrates that PoP-ECC tolerates significantly higher MBU ratios than state-of-the-art solutions, with negligible delay, area, and power overhead.
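
The two-tier idea can be illustrated with a toy sketch. This is not the PoP-ECC construction, which the abstract does not specify. It assumes that a VP is an interleaved parity computed per word and never stored, and that a PP is the XOR of the VPs across a block of words. The function names, the 16-bit word width, and the 4-lane interleaving are all hypothetical, and the sketch only detects errors; it does not correct them.

```python
from functools import reduce
from operator import xor


def virtual_parity(word: int, lanes: int = 4, width: int = 16) -> int:
    """First tier (assumed): interleaved parity of one word.

    Bit j of the result is the parity of word bits j, j+lanes, j+2*lanes, ...
    The result is "virtual" in this sketch because it is never stored.
    """
    vp = 0
    for bit in range(width):
        if (word >> bit) & 1:
            vp ^= 1 << (bit % lanes)
    return vp


def parity_of_parities(words, lanes: int = 4, width: int = 16) -> int:
    """Second tier (assumed): XOR of the virtual parities of all words in a block."""
    return reduce(xor, (virtual_parity(w, lanes, width) for w in words), 0)


def is_consistent(words, stored_pp: int, lanes: int = 4, width: int = 16) -> bool:
    """Recompute the second-level parity and compare it with the stored value."""
    return parity_of_parities(words, lanes, width) == stored_pp


# Hypothetical block of four 16-bit weights.
weights = [0x1234, 0xBEEF, 0x0F0F, 0x8001]
stored_pp = parity_of_parities(weights)

# Simulate a 2-bit upset: flip adjacent bits 1 and 2 of the second word.
corrupted = list(weights)
corrupted[1] ^= 0b0110

print(is_consistent(weights, stored_pp))
print(is_consistent(corrupted, stored_pp))
```

In this toy, only one 4-bit value is stored per block, and any burst of up to four adjacent flipped bits within a single word changes distinct lanes of that word's virtual parity, so the stored value no longer matches. How the actual scheme generates VPs and uses PPs for correction is described in the paper, not here.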