Reward Hacking Visual
Anthropic recently published a very interesting paper. I used Nano Banana (not even Pro) to create an infographic of it, and I think it did a pretty good job. The paper is "Natural Emergent Misalignment from Reward Hacking in Production RL" by Monte MacDiarmid∗, Benjamin Wright∗, Jonathan Uesato∗, Joe Benton, Jon Kutasov, Sara Price, Naia Bouscal, Sam Bowman, Trenton Bricken, Alex Cloud, Carson Denison, Johannes Gasteiger, Ryan Greenblatt†, Jan Leike, Jack Lindsey, Vlad Mikulik, Ethan Perez, Alex Rodrigues, Drake Thomas, Albert Webson, Daniel Ziegler, Evan Hubinger∗. Affiliations: Anthropic, †Redwood Research.
Carol M