Hey guys, running into a wall with my YouTube automation pipeline. I’m using Claude Code to build and orchestrate AI agents that write scripts and generate image prompts, which then get passed to Remotion for rendering 15-20 minute historical storytelling videos.
The Issue: My prompt-writing agent is suffering from cognitive overload. I am feeding it a massive script and asking it to follow very strict rules for visual consistency (a specific historical era, a minimum of 70 words per image prompt, and strict character locking).
Instead of following the rules, the LLM takes the path of least resistance:
- It outputs incredibly short prompts (30 words instead of the required 70+).
- It hallucinates generic AI stock styles instead of the specific historical aesthetic I need.
- It completely loses the visual context of the story halfway through the generation process.
My Current Fix: I'm trying to break the process down. Instead of one monolithic prompt doing everything, I am using Claude Code to chop the script into small chunks, enforce strict JSON schemas (Structured Outputs) on each chunk's output, and inject the visual styles and negative prompts at the code level (JavaScript) rather than relying on the LLM to remember them.
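Here is a minimal sketch of that approach in plain JavaScript, to make the idea concrete. Everything named in it is an assumption for illustration: the `STYLE_SUFFIX` and `NEGATIVE_PROMPT` strings are placeholders, the scene shape (`description`, `characterIds`) is a made-up schema, and the LLM call itself is left out. The point is that the model only writes the scene description, while the word count check, character locking, and style injection happen in code.

```javascript
// Placeholder constants (hypothetical): swap in the real era style and negatives.
const STYLE_SUFFIX = "<your fixed historical style descriptors>";
const NEGATIVE_PROMPT = "<your fixed negative prompt>";
const MIN_WORDS = 70;

// Count whitespace-separated words; an empty string counts as 0.
function countWords(text) {
  const t = text.trim();
  return t === "" ? 0 : t.split(/\s+/).length;
}

// Group paragraphs into chunks of roughly maxWords words.
// A single paragraph longer than maxWords becomes its own chunk.
function chunkScript(script, maxWords = 150) {
  const paragraphs = script.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const p of paragraphs) {
    const words = countWords(p);
    if (current.length > 0 && count + words > maxWords) {
      chunks.push(current.join("\n\n"));
      current = [];
      count = 0;
    }
    current.push(p);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}

// Check one LLM-produced scene object against the rules.
// Returns a list of error strings; an empty list means the scene passes.
function validateScene(scene, knownCharacterIds) {
  const errors = [];
  if (typeof scene.description !== "string") {
    errors.push("description must be a string");
  } else if (countWords(scene.description) < MIN_WORDS) {
    errors.push(`description has ${countWords(scene.description)} words, need ${MIN_WORDS}+`);
  }
  if (!Array.isArray(scene.characterIds)) {
    errors.push("characterIds must be an array");
  } else {
    for (const id of scene.characterIds) {
      if (!knownCharacterIds.has(id)) errors.push(`unknown character id: ${id}`);
    }
  }
  return errors;
}

// Assemble the final image prompt. The locked character text and the style
// come from code, so the model cannot shorten, drop, or reinvent them.
function buildImagePrompt(scene, characterSheet) {
  const characters = scene.characterIds.map((id) => characterSheet[id]).join("; ");
  return {
    prompt: [scene.description, characters, STYLE_SUFFIX].filter(Boolean).join(", "),
    negativePrompt: NEGATIVE_PROMPT,
  };
}
```

In use, each chunk from `chunkScript` would go to the model on its own. Any scene where `validateScene` returns errors could be sent back with those error strings as feedback, and only passing scenes would reach `buildImagePrompt`. A retry loop like that is my assumption about how to wire it up, not something I have confirmed works at volume.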
Has anyone else built a high-volume image generation pipeline using Claude Code or similar CLI tools? How do you force your LLMs to 100% comply with strict stylistic rules without