Help: My AI Visual Agent is ignoring strict rules in bulk image generation.

Hey guys, running into a wall with my automated YouTube pipeline. I'm generating 15-20 minute historical storytelling videos using AI agents to write the script and image prompts, which are then passed to Remotion for rendering.

The Issue: My prompt-writing agent seems to be overloaded. I feed it a massive script in one go and ask it to follow very strict rules for visual consistency (specific historical eras, a minimum of 70 words per image prompt, and strict character locking).

Instead of following the rules, it takes the path of least resistance:

It outputs incredibly short prompts (30 words).
It hallucinates generic AI stock styles instead of the specific historical aesthetic I need.
It completely loses the visual context of the story halfway through the video.

My Current Fix: I'm breaking the agent down into "Micro-Agents." Instead of one prompt doing everything, I use code to chop the script into small chunks, force strict JSON schemas on the output, and inject the rules at the code level (JavaScript) rather than relying on the LLM to remember them.
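To make that concrete, here is a rough sketch of what I mean by handling the rules in code. Everything in it is a placeholder for illustration: the `RULES` values, the function names (`chunkScript`, `assemblePrompt`, `validatePrompt`), and the sentence-based chunking are assumptions, not my actual pipeline. The idea is that the LLM only describes the scene for one small chunk, and the code appends the era and the locked character descriptions and checks the word count afterward.

```javascript
// Hypothetical rule set; the values are placeholders, not real project rules.
const RULES = {
  minWords: 70,
  era: "Ancient Rome, 1st century BC",
  characters: {
    Caesar: "lean man in his fifties, laurel wreath, crimson toga",
  },
};

// Split the script into chunks of at most `maxSentences` sentences each.
function chunkScript(script, maxSentences = 3) {
  const sentences = script
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks = [];
  for (let i = 0; i < sentences.length; i += maxSentences) {
    chunks.push(sentences.slice(i, i + maxSentences).join(" "));
  }
  return chunks;
}

// Count whitespace-separated words.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Build the final image prompt: the LLM's scene description plus the
// rule text, which is appended by code so the model cannot forget it.
function assemblePrompt(sceneDescription, chunk, rules) {
  const locked = Object.entries(rules.characters)
    .filter(([name]) => chunk.includes(name))
    .map(([name, desc]) => `${name}: ${desc}.`);
  return [
    sceneDescription.trim(),
    ...locked,
    `Setting and style: ${rules.era}.`,
  ].join(" ");
}

// Return a list of rule violations; an empty list means the prompt passes.
function validatePrompt(prompt, chunk, rules) {
  const errors = [];
  const words = countWords(prompt);
  if (words < rules.minWords) {
    errors.push(`too short: ${words} words, need ${rules.minWords}`);
  }
  if (!prompt.includes(rules.era)) {
    errors.push("missing era");
  }
  for (const [name, desc] of Object.entries(rules.characters)) {
    if (chunk.includes(name) && !prompt.includes(desc)) {
      errors.push(`missing locked description for ${name}`);
    }
  }
  return errors;
}
```

Any prompt that comes back with a non-empty error list can be sent back to the LLM for that one chunk only, instead of regenerating the whole video's prompts.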

Has anyone else built a high-volume image generation pipeline? How do you get your LLMs to comply fully with strict stylistic rules without context decay?
