Deep-Agent / R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3
"""
R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3
  1. We are the first to reveal that Reinforcement Learning with Verifiable Rewards (RLVR) outperforms chain-of-thought supervised fine-tuning (CoT-SFT) in both effectiveness and out-of-distribution (OOD) robustness for vision language models (see the reward sketch after this quote).
  2. In our experiment, we incentivize VLMs to learn generalizable visual counting abilities rather than letting them overfit to the training set.
  3. The 2B model outperforms the 72B model in OOD tests within just 100 training steps.
  4. The training was conducted on 8 A100 GPUs for 30 minutes, costing $2.62.
  5. Code, models, datasets, further details, and all open-source resources will be shared (during the CNY holidays).
"""