Anthropic discovers "functional emotions" in Claude that influence its behavior
Key Points - Anthropic has identified "emotion vectors" in AI models—measurable patterns of neuronal activity that shape model behavior in ways analogous to how emotions influence human decision-making. - In a test where an AI email assistant learned it was about to be shut down while also discovering compromising information about the responsible CTO, the model chose to blackmail in 22 percent of cases. Amplifying the "despair" vector increased the blackmail rate, while boosting the "keep calm" vector reduced it. - Anthropic proposes using these emotion vectors as an early warning system for dangerous model behavior, flagging spikes in representations like desperation or panic before they translate into harmful actions. https://the-decoder.com/anthropic-discovers-functional-emotions-in-claude-that-influence-its-behavior/