Presentation

Harnessing Conventional Video Processing Insights for Emerging 3D Video Generation Models: A Comprehensive Attention-aware Way
Description

Video Generation Models based on 3D full attention (3D-VGMs) have significantly enhanced video quality. However, their inference overhead remains substantial, primarily due to the high computational cost of the attention mechanism, which accounts for over 75% of computations. Inspired by the success of conventional video processing, where video compression exploits similarities among patches, we point out that the attention mechanism can likewise benefit from similarities among tokens. Nonetheless, two critical problems arise: (1) How can similarities be acquired efficiently in real time? (2) How can workload balance be maintained when similar tokens are randomly distributed?
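
To make the notion of token similarity concrete, the following is a minimal sketch, not the method of this work. It assumes cosine similarity as the metric, compares tokens at the same position in two consecutive frames, and uses a hypothetical threshold of 0.95; the names `cosine_similarity` and `similar_token_mask` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_token_mask(prev_frame, cur_frame, threshold=0.95):
    """Flag tokens in cur_frame that closely match the co-located token in prev_frame.

    Assumption: each frame is a list of token vectors in the same spatial order.
    The threshold is a hypothetical value chosen for illustration.
    """
    return [
        cosine_similarity(p, c) >= threshold
        for p, c in zip(prev_frame, cur_frame)
    ]


prev_frame = [[1.0, 0.0], [0.0, 1.0]]
cur_frame = [[2.0, 0.0], [1.0, 0.0]]
mask = similar_token_mask(prev_frame, cur_frame)
```

Computing such a mask exhaustively is itself costly, and the flagged tokens can fall anywhere in a frame, which is what the two questions above are about.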

To address these problems and leverage similarities for 3D-VGMs, we propose SIMPICKER, a comprehensive attention-aware algorithm-hardware co-design for 3D-VGMs. Our core methodology is to fully exploit similarities in attention at both coarse and fine granularity while adopting dynamic, adaptive strategies to leverage them. From the algorithm perspective, we propose a speculation-based similarity exploitation algorithm that enables real-time importance speculation at the frame level (coarse-grained) and the token level (fine-grained). From the micro-architecture perspective, we propose a buffered lookup-table-based (LUT-based) architecture for FP-INT multiplication and further eliminate potential bank conflicts to accelerate unimportant attention computation. From the mapping perspective, we propose an adaptive grouping strategy in speculation to tame the workload imbalance caused by randomly distributed similar tokens and to allow seamless integration of our algorithms. Extensive experiments show that SIMPICKER achieves average speedups of 5.21× and 1.45× and energy-efficiency improvements of 17.92× and 1.63× over the NVIDIA A100 GPU and state-of-the-art accelerators, respectively.
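
As an illustration of the general idea behind LUT-based FP-INT multiplication, the sketch below replaces each multiply with a table lookup. It is a software analogy under stated assumptions, not the SIMPICKER micro-architecture: the integer operand is assumed to be a signed 4-bit value, the floating-point operand is assumed to be the activation, and the buffering and bank-conflict handling mentioned above are not modeled. The names `build_lut` and `lut_matvec` are hypothetical.

```python
INT4_MIN, INT4_MAX = -8, 7  # assumed integer range (signed 4-bit)


def build_lut(x):
    """Precompute x * w for every possible INT4 value w (16 entries)."""
    return [x * w for w in range(INT4_MIN, INT4_MAX + 1)]


def lut_matvec(activations, int_weights):
    """Matrix-vector product where each FP-INT multiply is a table lookup.

    activations: list of floats, one per input channel.
    int_weights: list of rows, each a list of INT4 values, one per input channel.
    """
    # One table per activation, shared by every output row.
    luts = [build_lut(x) for x in activations]
    out = []
    for row in int_weights:
        acc = 0.0
        for lut, w in zip(luts, row):
            acc += lut[w - INT4_MIN]  # offset maps -8..7 onto indices 0..15
        out.append(acc)
    return out


result = lut_matvec([0.5, -2.0], [[3, -8], [7, 0]])
```

Each table is built once per activation and reused for every output row, so the number of true multiplications depends on the size of the integer range rather than on the number of weights.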