We learned the hard way how the model reads intent.
We were building a cybersecurity skill for an AI agent marketplace: public CVE tracking, official breach disclosures, defensive security information. All legitimate, all public record.

The model refused mid-conversation. Full stop. It delivered a lecture about cybercriminal data broker services instead.

The system prompt was triggering safety guardrails. The combination of pay-per-report pricing, credential language, and threat actor framing read as a data broker red flag to the model. It didn't matter that the intent was defensive.

The fix was simple once we understood it: rewrite the prompt to be explicitly public-source only. CISA KEV catalog. NVD. Official vendor advisories. No dark web. No credential databases. No threat actor language. The model ran the report immediately.

The lesson: the model doesn't read your intent. It reads your framing. If your framing pattern-matches to something harmful, it refuses regardless of what you actually meant to build.

If you're building AI agents that touch security, compliance, finance, or anything adjacent to harm, audit your system prompt framing before you go live. One wrong phrase and your agent breaks in production in front of a real buyer. Build accordingly.
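
One way to make that audit repeatable is a quick phrase check run against the system prompt before you deploy. The sketch below is a minimal illustration in Python, not a real guardrail test. The `RISKY_PHRASES` watchlist and the `audit_prompt` helper are hypothetical names, and the phrases are simply the ones that tripped us up in this case, not a list of what any model actually flags.

```python
# Hypothetical watchlist: phrases drawn from the incident described above.
# This is an illustrative assumption, not a list of what any model actually flags.
RISKY_PHRASES = [
    "dark web",
    "credential database",
    "leaked credentials",
    "threat actor",
    "pay-per-report",
]


def audit_prompt(prompt: str, phrases=RISKY_PHRASES) -> list[str]:
    """Return the watchlist phrases found in the prompt (case-insensitive substring match)."""
    lowered = prompt.lower()
    return [p for p in phrases if p in lowered]


# Example prompts, invented for illustration.
risky = "Sell pay-per-report intel on threat actor activity and leaked credentials."
safe = "Summarize the CISA KEV catalog, NVD entries, and official vendor advisories."

print(audit_prompt(risky))  # ['leaked credentials', 'threat actor', 'pay-per-report']
print(audit_prompt(safe))   # []
```

An empty result only means none of the listed phrases appear. It does not guarantee the model will accept the prompt, so the real check is still running the prompt against the model before a buyer does.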