Presentation
HeteroSVD: Efficient SVD Accelerator on Versal ACAP with Algorithm-Hardware Co-Design
Description
Singular value decomposition (SVD) is a matrix factorization technique widely used in areas such as signal processing and recommendation systems. In general, the time complexity of SVD algorithms is cubic in the problem size, which makes it difficult for them to meet stringent real-time performance requirements. Existing FPGA and GPU solutions fall short of jointly optimizing latency, throughput, and power consumption. To address this, the paper proposes HeteroSVD, a heterogeneous reconfigurable accelerator for SVD computation on the Versal ACAP platform. HeteroSVD introduces a system-level SVD decomposition mechanism and an algorithm-hardware co-design method that jointly optimizes SVD ordering and AI engine (AIE)-centric dataflow and placement on Versal. Furthermore, to improve the quality of results (QoR) and facilitate micro-architecture selection, the authors introduce an automatic optimization framework that performs accurate performance modeling and fast design space exploration. Experimental results demonstrate that HeteroSVD reduces latency by 1.98× over existing FPGA accelerators and outperforms GPU solutions by up to 7.22× in latency, 1.77× in throughput, and 13.18× in energy efficiency.
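To make "SVD ordering" and the cubic cost concrete, here is a minimal sketch of a textbook one-sided (Hestenes) Jacobi SVD. This is an illustration only: the description above does not state which SVD algorithm HeteroSVD uses, and the function name `one_sided_jacobi_svd`, its parameters, and the cyclic pair ordering are assumptions chosen for this example. In this family of methods, "ordering" is the sequence in which column pairs are rotated. Each sweep visits n(n-1)/2 pairs at O(m) work per pair, which is consistent with cubic growth for square inputs.

```python
import math


def one_sided_jacobi_svd(a, max_sweeps=30, tol=1e-12):
    """Return (u, sigma, v) such that a ~= u * diag(sigma) * v^T.

    Illustrative sketch: `a` is a list of m rows with n entries each,
    with m >= n and full column rank assumed. Singular values are
    returned unsorted.
    """
    n = len(a[0])
    u = [[float(x) for x in row] for row in a]  # working copy of a
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # identity

    def dot(mat, p, q):
        # Inner product of columns p and q of mat.
        return sum(row[p] * row[q] for row in mat)

    def rotate(mat, p, q, c, s):
        # Apply a plane rotation to columns p and q of mat, in place.
        for row in mat:
            xp, xq = row[p], row[q]
            row[p] = c * xp - s * xq
            row[q] = s * xp + c * xq

    for _ in range(max_sweeps):
        rotated = False
        # Cyclic ordering: visit every column pair (p, q) once per sweep.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha, beta, gamma = dot(u, p, p), dot(u, q, q), dot(u, p, q)
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta)
                )
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                rotate(u, p, q, c, s)  # orthogonalize the two columns
                rotate(v, p, q, c, s)  # accumulate the same rotation
        if not rotated:
            break  # a full sweep made no rotation: converged

    # Column norms of the rotated matrix are the singular values.
    sigma = [math.sqrt(dot(u, j, j)) for j in range(n)]
    u = [[row[j] / sigma[j] for j in range(n)] for row in u]
    return u, sigma, v
```

Calling `one_sided_jacobi_svd([[3.0, 0.0], [4.0, 5.0]])` returns factors whose product reconstructs the input. Rotations on disjoint column pairs are independent of one another, which is why the choice of ordering matters when such a method is mapped onto parallel hardware.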
Event Type
Research Manuscript
Time
Tuesday, June 24, 3:30pm - 3:45pm PDT
Location
3002, Level 3