Presentation
SwiftMax: Reducing Training Time for Learnable Softmax Alternative in Customized Acceleration
Description
Softmax is a critical yet memory-intensive operation in the Self-Attention mechanism of Transformers. Previous methods fixed Softmax parameters to minimize computation but required retraining the entire model. We propose SwiftMax, a learnable Softmax alternative that reduces training time by employing layer-wise replacement and fine-tuning of pre-trained models. SwiftMax reduces training time by up to 2,250× compared to retraining, while maintaining up to 97% of the original model's accuracy on most NLP tasks. Our approach achieves up to 23× performance improvement during inference on the AMD ACAP platform, alleviating the Softmax bottleneck in Self-Attention and enabling efficient hardware deployment without extensive retraining.
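The abstract does not specify SwiftMax's functional form or fine-tuning recipe. As a minimal sketch of the general layer-wise replacement idea, the PyTorch code below swaps each Softmax in a pre-trained model for a learnable surrogate and fine-tunes only the new parameters. The surrogate shown here (SwiftMaxApprox, a ReLU-normalized map with a trainable scale and shift) is a hypothetical stand-in, not the paper's formulation, and it assumes the model exposes attention softmax as nn.Softmax submodules.

    import torch
    import torch.nn as nn

    class SwiftMaxApprox(nn.Module):
        """Hypothetical learnable softmax surrogate (NOT the paper's exact
        formulation): ReLU-normalized scores with a trainable scale and
        shift, chosen here only as a hardware-friendly illustration."""
        def __init__(self, eps: float = 1e-6):
            super().__init__()
            self.scale = nn.Parameter(torch.ones(1))
            self.shift = nn.Parameter(torch.zeros(1))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Non-negative scores normalized along the attention axis.
            z = torch.relu(self.scale * x + self.shift)
            return z / (z.sum(dim=-1, keepdim=True) + self.eps)

    def replace_softmax(model: nn.Module) -> list:
        """Swap every nn.Softmax submodule for the learnable surrogate,
        returning the new parameters so fine-tuning can touch only them."""
        new_params = []
        for module in model.modules():
            for name, child in module.named_children():
                if isinstance(child, nn.Softmax):
                    surrogate = SwiftMaxApprox()
                    setattr(module, name, surrogate)
                    new_params += list(surrogate.parameters())
        return new_params

    # Layer-wise recipe (sketch): freeze the pre-trained weights, then
    # briefly fine-tune only the replacement parameters instead of
    # retraining the whole model.
    # model = ...  # a pre-trained Transformer
    # for p in model.parameters():
    #     p.requires_grad_(False)
    # params = replace_softmax(model)
    # optimizer = torch.optim.Adam(params, lr=1e-3)

Freezing the backbone and training only the small surrogate parameter set is what would make the fine-tuning pass far cheaper than full retraining, which is the effect the abstract's training-time reduction refers to.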
Event Type
Networking
Work-in-Progress Poster
Time
Sunday, June 22, 6:00pm - 7:00pm PDT
Location
Level 3 Lobby