Presentation
BirdMoE: Reducing Communication Costs for Mixture-of-Experts Training Using Load-Aware Bi-random Quantization
Description: In this paper, we propose BirdMoE, a load-aware communication compression technique with Bi-random quantization for MoE training. Specifically, BirdMoE employs a lightweight random quantization scheme with an expectation-invariance property to efficiently map floating-point intermediate results to integers while maintaining MoE training quality. In addition, BirdMoE uses a mixed-precision strategy to balance communication loads across expert nodes, significantly improving all-to-all communication efficiency in the MoE training system.
Experiments on typical MoE training tasks demonstrate that BirdMoE achieves 3.98x-10.44x higher total communication compression ratios and 1.18x-5.27x training speedups compared with state-of-the-art compression techniques, while maintaining MoE training quality.
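The abstract gives no implementation details; the sketch below is a minimal, hypothetical illustration of an expectation-invariant (unbiased) stochastic quantizer of the general kind the description alludes to, written with NumPy. The function names, 8-bit width, and per-tensor scaling are assumptions for illustration only, not BirdMoE's actual API or its load-aware mixed-precision scheme.

```python
# Hypothetical sketch: expectation-invariant stochastic quantization (not BirdMoE's code).
import numpy as np

def stochastic_quantize(x: np.ndarray, num_bits: int = 8):
    """Map float values to low-bit integers with stochastic rounding so that
    E[dequantize(quantize(x))] == x (expectation invariance)."""
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    normalized = (x - lo) / scale                # values now lie in [0, qmax]
    floor = np.floor(normalized)
    prob_up = normalized - floor                 # fractional part in [0, 1)
    # Round up with probability equal to the fractional part -> unbiased rounding.
    q = (floor + (np.random.random(x.shape) < prob_up)).astype(np.uint8)
    return q, scale, lo

def dequantize(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

# Averaging many quantize/dequantize trials recovers the original values,
# illustrating the expectation-invariance property.
x = np.random.randn(4).astype(np.float32)
recon = np.mean([dequantize(*stochastic_quantize(x)) for _ in range(2000)], axis=0)
print(x)
print(recon)
```

Because values are rounded up with probability equal to their fractional remainder, the dequantized result matches the original in expectation, which is what allows compressed all-to-all traffic without biasing the aggregated statistics.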
Event Type: Research Manuscript
Time: Tuesday, June 24, 2:45pm - 3:00pm PDT
Location: 3000, Level 3
AI
AI1: AI/ML Algorithms