DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL
DeepScaleR shows how valuable RL can be for making small models that are extremely good in a narrow area.
"""
TL;DR
RL magic is in the air! We introduce DeepScaleR-1.5B-Preview, a language model fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using simple reinforcement learning (RL). It achieves an impressive 43.1% Pass@1 accuracy on AIME 2024 (+14.3% improvement over the base model), surpassing the performance of OpenAI’s o1-preview with just 1.5B parameters. We open-sourced our dataset, code, and training logs for everyone to progress on scaling intelligence with RL.
"""