Sher Hassan

Hey guys, running into a wall with my YouTube automation pipeline. I’m using Claude Code to build and orchestrate AI agents that write scripts and generate image prompts, which then get passed to Remotion for rendering 15-20 minute historical storytelling videos.

The Issue: My prompt-writing agent is suffering from cognitive overload. I am feeding it a massive script and asking it to follow very strict rules for visual consistency (specific historical eras, 70+ word count minimums, and strict character locking). Instead of following the rules, the LLM takes the path of least resistance:

1. It outputs incredibly short prompts (30 words instead of the required 70+).
2. It hallucinates generic AI stock styles instead of the specific historical aesthetic I need.
3. It completely loses the visual context of the story halfway through the generation process.

My Current Fix: I'm trying to break the process down. Instead of one monolithic prompt doing everything, I am using Claude Code to chop the script into small chunks, force strict JSON schemas (Structured Outputs), and inject the visual styles and negative prompts at the code level (JavaScript) rather than relying on the LLM to remember them.

Has anyone else built a high-volume image generation pipeline using Claude Code or similar CLI tools? How do you force your LLMs to 100% comply with strict stylistic rules without

11d

❓ | questions

Help: My AI Visual Agent is ignoring strict rules in bulk image generation.

Hey guys, running into a wall with my YouTube automation pipeline. I’m generating 15-20 minute historical storytelling videos using AI agents to write the script and image prompts, which then get passed to Remotion for rendering.

The Issue: My prompt-writing agent is suffering from cognitive overload. I am feeding it a massive script and asking it to follow very strict rules for visual consistency (specific historical eras, a 70+ word minimum per prompt, and strict character locking). Instead of following the rules, it takes the path of least resistance:

1. It outputs incredibly short prompts (30 words).
2. It hallucinates generic AI stock styles instead of the specific historical aesthetic I need.
3. It completely loses the visual context of the story halfway through the video.

My Current Fix: I'm breaking the agent down into "Micro-Agents." Instead of one prompt doing everything, I am using code to chop the script into small chunks, enforce strict JSON schemas, and inject the rules at the code level (JavaScript) rather than relying on the LLM to remember them.

Has anyone else built a high-volume image generation pipeline? How do you get your LLMs to comply 100% with strict stylistic rules without context decay?
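For anyone following along, here is a rough sketch of what the chunk-and-inject idea could look like in plain JavaScript. It is only an illustration under assumptions, not the actual pipeline: every name in it (`chunkScript`, `validateScene`, `buildImagePrompt`, `generateWithRetry`, the style strings, the `sultan` character) is hypothetical, and `callModel` stands in for whatever function actually calls the LLM. It also assumes the 70-word minimum applies to the scene description the model writes for each chunk. The model only writes that description, while the style, negative prompt, and character descriptions are attached by code, and the word count is checked by code with a retry.

```javascript
// Hypothetical sketch: all names and style strings below are placeholders.
const MIN_WORDS = 70;

// Fixed text owned by code, so the model never has to "remember" it.
const STYLE_SUFFIX = "oil painting, 15th-century Ottoman setting, muted earth tones";
const NEGATIVE_PROMPT = "modern clothing, stock photo look, text, watermark";
const CHARACTERS = {
  sultan: "a bearded man in his forties wearing a red kaftan and a white turban",
};

// Split the script into chunks of at most maxSentences sentences each.
function chunkScript(script, maxSentences = 3) {
  const sentences = (script.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks = [];
  for (let i = 0; i < sentences.length; i += maxSentences) {
    chunks.push(sentences.slice(i, i + maxSentences).join(" "));
  }
  return chunks;
}

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Check one raw model response. Assumed shape: {"scene": string, "characters": string[]}.
function validateScene(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["response is not valid JSON"] };
  }
  if (data === null || typeof data !== "object") {
    return { ok: false, errors: ["response is not a JSON object"] };
  }
  const errors = [];
  if (typeof data.scene !== "string") {
    errors.push("missing string field 'scene'");
  } else if (countWords(data.scene) < MIN_WORDS) {
    errors.push(`scene has ${countWords(data.scene)} words, need at least ${MIN_WORDS}`);
  }
  if (!Array.isArray(data.characters)) {
    errors.push("missing array field 'characters'");
  } else {
    for (const id of data.characters) {
      if (!(id in CHARACTERS)) errors.push(`unknown character id '${id}'`);
    }
  }
  return { ok: errors.length === 0, errors, data };
}

// Assemble the final image prompt: locked character text + scene + locked style.
function buildImagePrompt(data) {
  const cast = data.characters.map((id) => CHARACTERS[id]).join("; ");
  const prompt = [cast, data.scene, STYLE_SUFFIX].filter(Boolean).join(". ");
  return { prompt, negativePrompt: NEGATIVE_PROMPT };
}

// Retry a chunk until the response validates. callModel(chunk, feedback) is
// supplied by the caller and must resolve to the raw response string.
async function generateWithRetry(chunk, callModel, maxAttempts = 3) {
  let feedback = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = validateScene(await callModel(chunk, feedback));
    if (result.ok) return buildImagePrompt(result.data);
    feedback = result.errors; // hand the concrete failures back to the model
  }
  throw new Error(`chunk failed validation after ${maxAttempts} attempts`);
}
```

The point of this layout is that a short or off-schema response gets rejected and retried with the specific errors, instead of silently ending up in the render queue, and the style text cannot drift because the model never writes it.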
