Presentation
DRAFT: Decoupling Backpropagation from Pre-trained Backbone for Efficient Transformer Fine-Tuning on Edge
Description
Transformers have recently demonstrated outstanding performance across diverse applications, and fine-tuning is required to adapt them to downstream tasks. However, fine-tuning remains challenging due to the substantial computational cost and storage overhead of backpropagation (BP). Existing fine-tuning techniques require BP through the massive pre-trained backbone weights to compute input gradients, incurring significant compute overhead and memory footprint on resource-constrained edge devices. To address this challenge, this work proposes DRAFT, an algorithm-hardware co-design framework for efficient Transformer fine-tuning that decouples BP from the backbone weights, thereby reducing BP overhead. The framework employs Feedback Decoupling Approximation (FDA), an efficient fine-tuning algorithm that decouples BP into two low-complexity pathways: a trainable adapter pathway and a sparse ternary Bypass Network (BPN) pathway. The two pathways work collaboratively to approximate the conventional BP process. Further, a DRAFT accelerator is proposed, featuring a reconfigurable design with lightweight sparse gather networks and dynamic workflows that fully exploit the sparsity and data parallelism inherent in FDA. Experimental results demonstrate that DRAFT achieves an average speedup of 4.9× and an energy-efficiency improvement of 4.2× over baseline fine-tuning methods across multiple fine-tuning tasks with negligible accuracy loss.
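The abstract describes the pathway split only at a high level. As a reading aid, the sketch below shows one plausible way such decoupling could be realized in PyTorch; it is not the paper's implementation, and all names (DecoupledLinear, ternarize, the 0.9 sparsity level) are hypothetical assumptions. The forward pass uses the frozen backbone weight as usual, while the backward pass routes the input gradient through a sparse ternary bypass matrix, with a small trainable adapter providing an exact-gradient pathway alongside it.

```python
import torch


def ternarize(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Hypothetical helper: keep the top (1 - sparsity) fraction of entries
    by magnitude and quantize them to {-s, 0, +s} with a single scale s."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    # Threshold = the (numel - k + 1)-th smallest |w|, i.e. the k-th largest.
    thresh = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    mask = weight.abs() >= thresh
    scale = weight[mask].abs().mean()
    return torch.sign(weight) * mask * scale


class DecoupledLinear(torch.autograd.Function):
    """Forward uses the frozen backbone weight W; backward routes the
    input gradient through the sparse ternary bypass instead of W."""

    @staticmethod
    def forward(ctx, x, weight, bypass):
        ctx.save_for_backward(bypass)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        (bypass,) = ctx.saved_tensors
        # A cheap sparse-ternary matmul replaces the dense W in the backward
        # pass; the frozen backbone receives no weight gradient at all.
        return grad_out @ bypass, None, None


# Toy usage: the trainable adapter keeps an exact gradient path, while the
# backbone's input gradient flows through the ternary bypass.
W = torch.randn(64, 64)            # frozen backbone weight
bpn = ternarize(W)                 # sparse ternary bypass, built once
adapter = torch.nn.Linear(64, 64)  # small trainable pathway
x = torch.randn(8, 64, requires_grad=True)
y = DecoupledLinear.apply(x, W, bpn) + adapter(x)
y.sum().backward()                 # BP never touches the dense W
```

Under this reading, the benefit is that the dominant backward-pass cost, the matmul against the dense backbone weight, is replaced by a sparse ternary product (sign flips and additions), which is what a reconfigurable accelerator with sparse gather networks could exploit.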
Event Type
Research Manuscript
Time
Monday, June 23, 1:45pm - 2:00pm PDT
Location
3001, Level 3
AI
AI3: AI/ML Architecture Design


