Presentation
DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs
Description
Recent parameter-efficient fine-tuning (PEFT) methods reduce trainable parameters while maintaining model performance, with Low-Rank Adaptation (LoRA) as a prominent approach. However, jointly optimizing accuracy and efficiency remains challenging. The Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction framework (DuQTTA) addresses the need for efficient fine-tuning of Large Language Models (LLMs) by employing Tensor-Train decomposition and dual-stage quantization to minimize model size and memory use. Additionally, an adaptive optimization strategy and a decoupled magnitude-direction update mechanism improve fine-tuning precision. DuQTTA consistently outperforms LoRA when fine-tuning LLaMA2-7B, LLaMA2-13B, and LLaMA3-8B models across various tasks, while achieving a several-fold improvement in compression rate over LoRA.
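The abstract only names the techniques involved; the PyTorch snippet below is a minimal, illustrative sketch of how a tensor-train-style low-rank adapter with a decoupled magnitude term and a simple weight quantization stage could be wired together. The class name TTAdapterLinear, the factor shapes, and the per-tensor int8 quantization are assumptions made for illustration and are not taken from the DuQTTA paper.

```python
import torch
import torch.nn as nn


class TTAdapterLinear(nn.Module):
    """Illustrative sketch: a frozen linear layer augmented with a
    tensor-train-style low-rank update and a decoupled magnitude vector.
    The factor shapes, quantization scheme, and update rule are assumptions,
    not the DuQTTA formulation."""

    def __init__(self, base_weight: torch.Tensor, tt_rank: int = 8):
        super().__init__()
        out_features, in_features = base_weight.shape

        # Quantization stand-in: store the frozen base weight in int8 with a
        # per-tensor scale (a real dual-stage scheme would be finer-grained).
        scale = base_weight.abs().max() / 127.0
        self.register_buffer("w_q", torch.round(base_weight / scale).to(torch.int8))
        self.register_buffer("w_scale", scale)

        # Trainable tensor-train-style factors for the weight update:
        # delta_W ~ A @ G @ B with a small intermediate rank.
        self.A = nn.Parameter(torch.zeros(out_features, tt_rank))
        self.G = nn.Parameter(torch.randn(tt_rank, tt_rank) * 0.01)
        self.B = nn.Parameter(torch.randn(tt_rank, in_features) * 0.01)

        # Decoupled magnitude: one trainable scale per output row,
        # initialized from the row norms of the base weight.
        self.magnitude = nn.Parameter(base_weight.norm(dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize the frozen base weight and add the low-rank TT update.
        w_base = self.w_q.float() * self.w_scale
        delta_w = self.A @ self.G @ self.B
        w = w_base + delta_w

        # Direction comes from the normalized merged weight;
        # magnitude is learned separately (the decoupled update).
        direction = w / (w.norm(dim=1, keepdim=True) + 1e-8)
        return x @ (self.magnitude * direction).t()


if __name__ == "__main__":
    base = torch.randn(64, 128)          # stand-in for a pretrained weight
    layer = TTAdapterLinear(base, tt_rank=4)
    out = layer(torch.randn(2, 128))     # batch of 2 inputs
    print(out.shape)                     # torch.Size([2, 64])
```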
Event Type
Research Manuscript
Time
Monday, June 23, 1:45pm - 2:00pm PDT
Location
3000, Level 3
AI
AI1: AI/ML Algorithms