Presentation

A High-Precision and Low-Cost Approximate Transform Accelerator for Video Coding
Description

The introduction of multiple transform types into the Versatile Video Coding (VVC) standard has yielded notable coding gains but also imposed a considerable computational burden. Circuits for the different transform types are typically implemented separately because the transforms are treated as independent, which leads to substantial hardware overhead. To address this, we explore the relationship between the Discrete Cosine Transform Type-2 (DCT2) and Discrete Sine Transform Type-7 (DST7) matrices and reveal a prominent diagonal aggregation phenomenon in the elements of the transfer matrix that maps one to the other. Based on this insight, we apply the least-squares method to optimize the sparsity of the transfer matrix, achieving a high-precision, low-cost approximate conversion from DCT2 to DST7. Furthermore, we optimize the DCT2 computation with a matrix decomposition approach that allows a lightweight shift-adder unit to efficiently generate all required product terms across the supported sizes. Leveraging these algorithmic optimizations, we implement a highly reusable and area-efficient approximate transform accelerator that supports sizes from 4 to 32 points and accommodates three transform types in VVC. Experimental results demonstrate that the proposed accelerator reduces circuit resource consumption by more than 44% with a BD-BR loss of only 0.57%, while sustaining throughput of up to 8K@57 fps.
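The DCT2-to-DST7 relationship described above can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes the textbook orthonormal floating-point definitions of DCT2 and DST7 (rather than VVC's scaled integer matrices), it defines the transfer matrix as T = DST7 · DCT2ᵀ so that DST7 = T · DCT2, and it measures how much of T's energy lies within a band around the main diagonal. All function names are hypothetical, and the abstract does not say whether the paper applies any row reordering or sign changes before looking at the diagonal.

```python
import math


def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix; each row is one basis function."""
    scale = math.sqrt(2.0 / n)
    rows = []
    for k in range(n):
        w = math.sqrt(0.5) if k == 0 else 1.0  # extra scaling of the DC row
        rows.append([w * scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal n-point DST-VII matrix; each row is one basis function."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transfer_matrix(n):
    """T with DST7 = T @ DCT2; valid because the DCT2 matrix is orthonormal."""
    return matmul(dst7_matrix(n), transpose(dct2_matrix(n)))


def band_energy_fraction(t, half_width):
    """Share of squared-element energy with |row - col| <= half_width."""
    size = len(t)
    total = sum(v * v for row in t for v in row)
    band = sum(t[i][j] ** 2 for i in range(size) for j in range(size)
               if abs(i - j) <= half_width)
    return band / total


def truncate_to_band(t, half_width):
    """Zero every element outside the band (a simple sparsification)."""
    return [[v if abs(i - j) <= half_width else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(t)]


if __name__ == "__main__":
    # Print the energy captured by bands of half-width 0, 1 and 2.
    for n in (4, 8, 16, 32):
        t = transfer_matrix(n)
        print(n, [round(band_energy_fraction(t, b), 4) for b in (0, 1, 2)])
```

Under these assumptions, keeping only the in-band entries of T is already the Frobenius-norm least-squares choice for that fixed sparsity pattern, because multiplying by an orthonormal DCT2 matrix preserves the norm of the error. The least-squares optimization in the paper presumably works under hardware constraints such as integer coefficients, which this sketch does not model.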