Text Details
By dynamically updating the bias of each expert according to its recent load, Loss-Free Balancing can consistently maintain a balanced distribution of expert load. In addition, since Loss-Free Balancing does not produce any interference gradients, it also elevates the upper bound of model performance gained from MoE training.
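The passage describes the balancing mechanism only in words. The sketch below illustrates one way the bias update could look. It assumes the sign-based rule from the paper: after each batch, every expert's bias moves by a fixed step toward the mean load. Underloaded experts are nudged up and overloaded experts down. The bias is added to the gating scores only when selecting the top-k experts, while the gate weights still come from the raw scores, so no gradient passes through the bias. The function names (`route_top_k`, `update_bias`, `balance_step`) and the `rate` parameter are hypothetical and chosen for illustration.

```python
def route_top_k(scores, bias, k):
    """Pick the top-k experts for one token using biased scores.

    The bias only affects which experts are selected; the returned gate
    weights are the original (unbiased) scores of the chosen experts.
    """
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
    gates = {i: scores[i] for i in chosen}
    return chosen, gates


def update_bias(bias, loads, rate):
    """Move each bias one fixed step toward balance (assumed sign-based rule).

    Experts below the mean load get +rate, experts above it get -rate,
    and experts exactly at the mean are left unchanged.
    """
    mean_load = sum(loads) / len(loads)
    new_bias = []
    for b, c in zip(bias, loads):
        err = mean_load - c
        sign = (err > 0) - (err < 0)  # -1, 0 or +1
        new_bias.append(b + rate * sign)
    return new_bias


def balance_step(batch_scores, bias, k, rate):
    """Route one batch with the current bias, count expert loads, then update the bias."""
    loads = [0] * len(bias)
    for scores in batch_scores:
        chosen, _ = route_top_k(scores, bias, k)
        for i in chosen:
            loads[i] += 1
    return update_bias(bias, loads, rate), loads


if __name__ == "__main__":
    # Four identical tokens all prefer expert 0, so its bias drops and the others rise.
    new_bias, loads = balance_step([[4.0, 3.0, 2.0, 1.0]] * 4, [0.0] * 4, k=1, rate=0.5)
    print(loads, new_bias)
```

Because the correction happens outside the loss, the router's training gradients are unaffected. This is the property the passage credits for avoiding interference gradients.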
—
Loss-Free Balancing
(other)
from "Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts"
Language: English
This text has been typed 4 times:
Avg. speed: 60 WPM
Avg. accuracy: 96.2%