GSD 1.0 vs. PAUL Experiment Writeup

(Sorry for the novel; there is a lot to unpack here.)

Over the past 3–4 weeks, I’ve shifted from GSD 1.0 to PAUL. I was running into issues with GSD around the 18–20 phase / large milestone mark. It consistently hit its memory wall, then began hallucinating and poisoning its own context.

Typically, it would derail in three ways:

It would completely hallucinate, loop endlessly, and ignore my prompts—just churning tokens.
It would claim it completed something that it hadn’t.
It would introduce design patterns I never asked for.

After watching @Charles Dove's presentation, where he framed GSD as more of a sprint tool and PAUL as more of a marathon tool, I decided to fully replace GSD with PAUL. This wasn’t trivial. It took about 6–8 hours to extract, aggregate, and consolidate GSD history, then feed that into PAUL to bring it up to speed. Once that was done, I resumed feature development. Around the same 20–22 phase mark, I started seeing PAUL behave similarly.

Key Realization

What I learned at that point was this: Claude (and specifically CLAUDE.md) has a qualitative boundary for how much context it can handle effectively before performance degrades. You’ll hear people reference a “~200 line guideline,” but that’s based on anecdotal experience—not anything officially documented by Anthropic. Still, there is clearly a real constraint here. Once I understood this, I reviewed both my CLAUDE.md and PAUL’s STATE.md.

CLAUDE.md should stay lean and focused—this is your map.
PAUL’s STATE.md tracks session state and updates continuously—which is powerful, but also subject to the same limits.

When PAUL suddenly decided I needed Redis (completely unprompted) and one of my Terraform applies actually deployed Redis, I knew something was seriously wrong. I checked STATE.md: ~500 lines. I had Claude summarize and optimize it, reducing it to ~150 lines. After that, things started stabilizing again.
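For anyone who wants an early warning instead of finding out via a surprise Redis deploy, here is a minimal sketch of a line-count check. It assumes the anecdotal ~200 line guideline mentioned above (not an official Anthropic limit) and that CLAUDE.md and STATE.md sit in the directory you run it from. Those paths and the function names are my own placeholders, not anything PAUL ships with.

```python
from pathlib import Path

# Assumption: ~200 lines is the anecdotal guideline, not a documented limit.
LINE_BUDGET = 200


def over_budget(files, budget=LINE_BUDGET):
    """Return {name: line_count} for each text whose line count exceeds budget."""
    flagged = {}
    for name, text in files.items():
        line_count = len(text.splitlines())
        if line_count > budget:
            flagged[name] = line_count
    return flagged


def check_paths(paths, budget=LINE_BUDGET):
    """Read the files that exist and report the ones that are over budget."""
    files = {}
    for path in paths:
        p = Path(path)
        if p.is_file():  # silently skip files that are not there
            files[str(p)] = p.read_text(encoding="utf-8")
    return over_budget(files, budget)


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever your project keeps these files.
    for name, count in check_paths(["CLAUDE.md", "STATE.md"]).items():
        print(f"{name}: {count} lines (budget {LINE_BUDGET}), consider pruning")
```

Line count is only a rough proxy for context load, but it is cheap to run before each session, and it would have flagged my ~500 line STATE.md long before things went sideways.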

Root Cause

Around the same time, I learned about the “novel vs. map” problem. Was this partly due to my overproduction of PRDs and research that weren’t well-structured or pruned? Yes—100%. But there’s also a second factor: how Claude handles session resets and context recovery.

Bad Habit I Built

With GSD 1.0, I developed a habit of using handoff files. I even built a custom “checkpoint” skill (essentially /paul:handoff). When I switched to PAUL, handoff was built in—no setup required.

Because I learned Claude through these abstractions, my native understanding of Claude was basically zero.

To manage context bloat, I was running:

/paul:handoff
/clear
/paul:resume

…probably 100 times per day (not exaggerating). I became a button pusher.

The Real Problem

This created a bad habit: I never learned how native Claude memory actually works. @roman-colman-7247 refers to this as managing your “trunk.” The idea is to actively prune context over time and use tools like /rewind to maintain a clean, efficient working state. I’m not good at this yet. I’m still learning. I still rely on handoffs because I don’t trust myself not to lose context.

Where I Am Now

PAUL is extremely powerful. I highly recommend it for long-running, complex projects, especially when combined with CARL, which dynamically loads skills based on domain keywords. And make sure you create highly efficient spec files. If you are not using DSPy to tune your prompts, you need to start TODAY! Right now, though, I’m forcing myself to go without these tools. It’s painful. I struggle constantly with context loss. But I want to understand the engine before adding turbochargers.

Final Thoughts

Huge shoutout to @Chris Kahler for the work he’s put into these tools. Thank you!!!

Looking forward to your thoughts on this, Chris—and please correct anything I got wrong. Your business partner Chucky told me to write this up 🙂

Cheers,
