Anthropic just released a new constitution for Claude that moves away from a list of simple rules and toward helping the model understand the "why" behind its behavior.
- Moves from the 2023 set of standalone principles to a holistic document that explains the context and reasoning behind Claude's values.
- Prioritizes four core properties, in order: safety first, then ethics, compliance, and helpfulness.
- Includes hard constraints against high-stakes risks, like providing uplift for bioweapon attacks.
- Uses synthetic training data created by Claude itself to help future versions learn these new values.
- Released under a Creative Commons CC0 license, so any of us can reuse the text in our own automation projects (quick sketch after this list).
- Aims to treat users as intelligent adults, balancing honesty with compassion in difficult tradeoffs.
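
Since the text is CC0, nothing stops you from dropping it straight into your own prompts. Here's a minimal sketch of that idea using the Anthropic Python SDK; the local file name `constitution.md`, the model id, and the example system prompt are all just placeholder assumptions, not anything Anthropic prescribes:

```python
import anthropic

# Load a local copy of the CC0-licensed constitution text
# (assumed to be saved as constitution.md in the working directory).
with open("constitution.md", encoding="utf-8") as f:
    constitution = f.read()

# The client reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model id; swap in whatever you use
    max_tokens=1024,
    # Prepend your own instructions, then the constitution text.
    system="You are a support-ticket triage assistant.\n\n" + constitution,
    messages=[{"role": "user", "content": "Summarize today's open tickets."}],
)
print(response.content[0].text)
```

Whether pasting the full document into a system prompt actually changes behavior much is an open question, since the values are supposed to be trained in, but it's a cheap experiment to run.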
How do you guys feel about AI models having their own "values" like this, and do you think it makes them more or less useful for your workflows?
Claude's Constitution:
Read more: