Presentation

HADA: Leveraging Multi-Source Data to Train Large Language Models for Hardware Security Assertion Generation
Description

Hardware security verification in modern electronic systems has become a significant bottleneck due to increasing design complexity and stringent time-to-market constraints. Assertion-Based Verification (ABV) is a recognized solution to this challenge; however, traditional assertion generation relies on engineers' expertise and manual effort. Formal verification and assertion generation methods are further limited by modeling complexity and low tolerance for design variations. While Large Language Models (LLMs) have emerged as promising automated tools, existing LLM-based approaches often depend on complex prompt engineering, requiring experienced engineers to construct and validate prompts. A further challenge lies in identifying effective methods for constructing synthetic training datasets that improve LLM quality while minimizing token biases. To address these issues, we introduce HADA (Hardware Assertion through Data Augmentation), a novel framework that fine-tunes a general-purpose LLM by leveraging its ability to integrate knowledge from multiple data sources. We combine assertions generated through formal verification, hardware security knowledge from databases such as the CWE (Common Weakness Enumeration), and version control data from hardware design iterations to construct a comprehensive hardware security assertion dataset. Our results demonstrate that integrating multi-source data significantly enhances the effectiveness of hardware security verification, with each source compensating for the limitations of the others.
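To make the multi-source dataset construction concrete, the sketch below shows one plausible way to merge the three source types named in the abstract into instruction-tuning records. All field names, example contents, and the record schema are assumptions for illustration, not the authors' actual pipeline.

```python
import json

# Toy records standing in for the three HADA data sources
# (contents and schema are hypothetical).
formal_assertions = [
    {"design": "fsm_lock",
     "assertion": "assert property (@(posedge clk) state == LOCKED |-> !grant);"},
]
cwe_entries = [
    {"id": "CWE-1245",
     "summary": "Improper finite state machine (FSM) in hardware logic."},
]
commit_pairs = [
    {"before": "if (key_ok) grant <= 1;",
     "after": "if (key_ok && !locked) grant <= 1;"},
]

def build_examples():
    """Merge the three sources into instruction/output records for fine-tuning."""
    examples = []
    # Formal-verification-derived assertions become generation targets.
    for a in formal_assertions:
        examples.append({
            "instruction": f"Write a security assertion for design {a['design']}.",
            "output": a["assertion"],
        })
    # CWE knowledge becomes explanatory examples.
    for c in cwe_entries:
        examples.append({
            "instruction": f"Describe hardware weakness {c['id']}.",
            "output": c["summary"],
        })
    # Version-control diffs become before/after repair examples.
    for p in commit_pairs:
        examples.append({
            "instruction": "Given this insecure RTL, produce a hardened version:\n"
                           + p["before"],
            "output": p["after"],
        })
    return examples

if __name__ == "__main__":
    # Emit one JSON record per line (JSONL), a common fine-tuning format.
    for ex in build_examples():
        print(json.dumps(ex))
```

In practice, the value of such a merge is that each source supplies what the others lack: formal tools contribute precise assertions, CWE contributes security semantics, and commit history contributes realistic bug-to-fix pairs.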