

Paper title: Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper: https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf
Project: https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero
Hugging Face: https://huggingface.co/Open-Reasoner-Zero










As Figure 6 shows, response length keeps increasing throughout training with no sign of saturation, similar to the behavior observed in DeepSeek-R1-Zero.


