DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL
DeepScaleR shows how valuable RL can be for making small models that are extremely good in a narrow area.
"""
TL;DR
RL magic is in the air! We introduce DeepScaleR-1.5B-Preview, a language model fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using simple reinforcement learning (RL). It achieves an impressive 43.1% Pass@1 accuracy on AIME 2024 (+14.3% improvement over the base model), surpassing the performance of OpenAI’s o1-preview with just 1.5B parameters. We open-sourced our dataset, code, and training logs for everyone to progress on scaling intelligence with RL.
"""