
Qwen GSPO: Sequence-Level RL Stabilizes Large-Scale Language Model Training
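GSPO (Group Sequence Policy Optimization) moves importance weighting and clipping in RL training from the token level to the sequence level, which stabilizes Mixture-of-Experts (MoE) models and eliminates the infrastructure-heavy Routing Replay workaround.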
Source: Qwen

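To make "sequence-level optimization" concrete, here is a minimal sketch of a sequence-level clipped objective: one length-normalized importance ratio per response, clipped as a whole, rather than one ratio per token. It is an illustration under stated assumptions, not Qwen's implementation. The function names, the default `eps`, and the use of a population standard deviation for group-normalized advantages are all hypothetical choices made for this example.

```python
import math
from statistics import mean, pstdev


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level importance ratio.

    Equals exp(mean_t(log pi_new - log pi_old)), i.e. the full-sequence
    likelihood ratio raised to the power 1/len(sequence).
    """
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation; a zero-variance group
    yields all-zero advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sequence_level_objective(new_logps, old_logps, rewards, eps=0.2):
    """Clipped surrogate objective with one ratio per response.

    new_logps / old_logps: per-token log-probabilities, one list per response.
    eps: clipping range (illustrative placeholder, not a recommended value).
    """
    advantages = group_advantages(rewards)
    terms = []
    for new, old, adv in zip(new_logps, old_logps, advantages):
        s = sequence_ratio(new, old)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # The whole response is clipped or kept together.
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)
```

Because the ratio is computed once per response, a few tokens with extreme probability shifts cannot dominate the update on their own; they only move the averaged log-ratio of the sequence they belong to.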