Text Details
We validate the performance of Loss-Free Balancing on MoE models with up to 3B parameters trained on up to 200B tokens. Experimental results show that Loss-Free Balancing achieves both better performance and better load balance compared with traditional auxiliary-loss-controlled load balancing strategies.
Loss-Free Balancing (other), by AUXILIARY-LOSS-FREE LOAD BALANCING STRATEGY FOR MIXTURE-OF-EXPERTS
| Language: | English |
|---|---|
| This text has been typed | 17 times |
| Avg. speed: | 87 WPM |
| Avg. accuracy: | 96.9% |