Presentation
RADiT: Redundancy-Aware Diffusion Transformer Acceleration Leveraging Timestep Similarity
Description
Diffusion Transformers (DiTs) have demonstrated unprecedented performance across various generative tasks including image and video generation.
However, the large number of computations in the inference process and the iterative sampling steps of DiT models result in high computational cost, leading to substantial latency and energy consumption.
To address these issues, we propose a redundancy-aware DiT (RADiT), a novel software-hardware co-optimization accelerator for DiTs that minimizes redundant operations in the iterative sampling stages.
We identify data redundancy by evaluating blockwise input features and skip redundant computations by reusing results from consecutive timesteps.
Furthermore, to minimize accuracy degradation and maximize computational efficiency, the Dynamic Threshold Scaling Module (DTSM) and Compress and Compare Unit (CCU) are employed in the redundancy detection process.
This approach enables DiTs to achieve up to 1.8x and 1.7x faster speeds for image and video generation, respectively, without compromising quality, along with 41% and 45.5% reductions in energy consumption.
Our RADiT scheme improves throughput by 1.67x and 1.76x for image and video generation tasks, respectively, while maintaining output quality and significantly reducing energy consumption.
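The reuse idea described above (compare a block's input features across timesteps and skip the computation when they are similar) can be illustrated with a minimal software sketch. This is not the RADiT hardware design: the `ReusingBlock` wrapper, the `relative_change` metric, and the fixed `threshold` value are all assumptions made for illustration, and the DTSM and CCU are not modeled here.

```python
from typing import Callable, List, Optional

Vector = List[float]


def relative_change(current: Vector, previous: Vector) -> float:
    """L1 distance between two inputs, relative to the L1 norm of the previous one."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    scale = sum(abs(p) for p in previous) or 1.0  # avoid dividing by zero
    return diff / scale


class ReusingBlock:
    """Hypothetical wrapper: reuses a block's cached output when its input barely changes."""

    def __init__(self, compute: Callable[[Vector], Vector], threshold: float) -> None:
        self.compute = compute
        self.threshold = threshold  # assumed fixed; the abstract describes dynamic scaling
        self.cached_input: Optional[Vector] = None
        self.cached_output: Optional[Vector] = None
        self.skipped = 0

    def __call__(self, x: Vector) -> Vector:
        if (
            self.cached_input is not None
            and self.cached_output is not None
            and relative_change(x, self.cached_input) < self.threshold
        ):
            # Input is close to the last computed one: skip the block and reuse its result.
            self.skipped += 1
            return self.cached_output
        # Input changed too much: recompute and refresh the cache.
        self.cached_input = list(x)
        self.cached_output = self.compute(x)
        return self.cached_output


# Toy "block" that doubles its input, run over three consecutive timesteps.
block = ReusingBlock(lambda v: [2.0 * a for a in v], threshold=0.05)
inputs = [[1.0, 1.0], [1.01, 1.0], [2.0, 2.0]]
outputs = [block(x) for x in inputs]
```

In this toy run the second timestep's input differs from the first by well under the threshold, so the block is skipped once and the cached output is returned. The sketch compares against the last input that was actually computed rather than the immediately preceding one, which is one possible way to limit accumulated drift; the abstract does not say which comparison RADiT uses.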
Event Type
Research Manuscript
Time
Tuesday, June 24, 4:15pm - 4:30pm PDT
Location
3000, Level 3
AI
AI3: AI/ML Architecture Design