Presentation
Reconfigurable Vector Floating Point Accelerator on FPGAs
Description
We propose an architecture for accelerating floating-point operations through a novel reconfigurable vector floating-point design. It supports multiple precisions, including IEEE 754 single (SP-32) and double (DP-64) precision, TensorFloat (TF-32), bfloat16 (BF-16), and custom configurations such as quarter precision (QP-8). The architecture also introduces vector lane reconfiguration, enabling efficient parallelization through packing and unpacking techniques. Each vector lane can be adjusted at runtime on an FPGA, providing flexibility to support different unrolling factors for loop optimization. The design is implemented on an AMD-Xilinx ZCU104 FPGA and integrates DSPs, optimizing LUT usage, power consumption, and performance. For example, SP-32 with DSPs achieves a 31.6% reduction in LUT usage, a 9.3% increase in operating frequency, and 24.7% lower power consumption; we therefore recommend DSP usage at higher bit precisions. This flexibility in configuring precision levels enables efficient utilization of FPGA resources and an energy-efficient design, especially for higher-precision operations. Incorporating this design into AI/ML workflows, signal processing, and scientific computing accelerates performance, balancing throughput, power efficiency, and computational complexity.
Event Type
Engineering Presentation
Time
Wednesday, June 25, 11:45am - 12:00pm PDT
Location
2010, Level 2
