Day 3: AIS#7DaysChallenge (Voice DNA extraction from WhatsApp transcripts)
Submitting 2 challenges in one day :)
Day 3 is about skills. Mine is called client-voice-dna. It lives in my .claude/skills/ folder, it's 92 lines of markdown, and it sits in the middle of a chain that ships actual paying-customer content for two of my businesses.
The honest version of this post is: the skill itself isn't the impressive part. The impressive part is the sequence of skills it's wedged between. Day 3 for me is really about what happens when you stop building individual skills and start composing them.
Here is the chain (see attached dashboard):
Voice notes and Reels → Wispr to transcribe → client-voice-dna (this skill) → Prism (Claude agent that drafts copy) → humanizer skill (strips AI tells) → ship.
What client-voice-dna does
It reads a transcript and produces a two-register voice DNA profile for a specific person. Public voice on one side: confident, in their vocabulary, ships as copy. Private voice on the other: hedges, self-deprecation, friend-register fillers, reference only. The output is a markdown file saved at active/<client-slug>/<client>_voice_dna.md.
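For the curious, the profile file is shaped roughly like this. This is a condensed sketch, not the skill's verbatim output: the section names are paraphrased from this post, and only the two registers are shown.

```markdown
# Voice DNA — <client-slug>

## 1. Public voice (ships as copy)
- Register: confident, direct, sells without hedging
- Vocabulary: the client's own recurring words and phrases
- Cadence: sentence length, rhythm, how they open and close
- Signature lines: verbatim phrases safe to reuse in captions and DMs

<!-- middle sections omitted -->

## 7. Private voice (NEVER SHIP)
- Hedges, self-deprecation, friend-register fillers
- Reference only: tells the agent what to strip, never what to draft
```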
How I trigger it
Whenever new client source material arrives. A new Sharpr client sends WhatsApp voice notes, that's a trigger. A CreatorsForge partner records a batch of Reels, that's a trigger. A discovery call transcript drops, that's a trigger. It always runs after the speech-to-text pass and always before any copy gets drafted.
Two real uses, two different businesses
Sharpr Automations is my AI automation agency. Clients hire me to wire automations into their business, and a lot of those automations produce content: captions, DMs, masterclass invites, email sequences. None of that ships unless it sounds like the client themselves wrote it.
My first Sharpr client is a UK-based stretching coach. He records WhatsApp voice notes for me at the end of his day: what he's seeing in his sessions, what's working, what he wants to teach next. I run them through Wispr to transcribe, the skill builds the voice DNA file from the transcript, and from then on every caption, masterclass invite and DM that ships for him is drafted by Prism (the name I gave my Claude agent) against that file. The copy reads like he wrote it, because at the level of vocabulary and cadence and signature lines, he did.
CreatorsForge is a different play. It's a Shadow Operator setup. I partner with Instagram creators who have built engaged audiences but who have no monetisation in place, and I build the entire offer stack (low / mid / high-ticket) under their name on a revenue share. The whole model depends on the digital products sounding exactly like the creator, because their followers know their voice cold.
So before anything else on a CreatorsForge engagement, I run an Apify actor against the creator's Instagram to pull every Reel transcript and every written caption they have ever published. That's the foundation of the voice DNA. This skill turns the lot into a structured profile.
A CreatorsForge partner of mine is a fitness creator who had around 10,000 engaged Instagram followers and no monetisation in place when we started. Same chain. Voice notes plus the Reel transcripts and captions went in. The voice DNA came out. Claude then drafted a complete ebook against that voice DNA file: her low-ticket digital product, launched on my platform. It went through the humanizer skill to strip residual AI tells. Her partner, who reads her content regularly and knows exactly how she writes, could not tell the ebook was AI-generated. That's the bar.
The optimization (the part I changed after watching the agent work)
v0 of this workflow didn't have a two-register rule. I'd take a voice note, transcribe it, hand the transcript to a Claude agent, and ask for "copy in their voice." The agent kept doing the obvious thing: it preserved the textures it saw. Hedges. Self-deprecation. "I'm still new to this." "This old boy is just figuring it out." "Imposter syndrome is real."
Those phrasings are real. The client genuinely said them on voice notes. They just aren't his selling voice. A prospect reading a caption that says "I'm still new to this" doesn't book the masterclass. They scroll past.
So I codified the two-register rule into v1.0.0 of the skill. Public voice = what the client says when they're selling to paying customers. Private voice = how they talk to me when no one's watching. The skill explicitly strips private-register fillers and self-deprecation from anything labeled "public" and cordons the private content off in section 7 of the doc with a NEVER SHIP header. The agent has not shipped a self-deprecating caption since.
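If you're building something similar, here's how a two-register rule like this can be phrased inside a SKILL.md. This is a condensed paraphrase, not my verbatim 92-line file; the frontmatter fields are the standard name/description pair that Claude skills use.

```markdown
---
name: client-voice-dna
description: Build a two-register voice DNA profile from client transcripts.
---

## Two-register rule

- PUBLIC voice = how the client talks when selling to paying customers.
  Confident, in their own vocabulary. The only register that ships.
- PRIVATE voice = how the client talks to me when no one's watching.
  Hedges, self-deprecation, friend-register fillers.
- Strip all private-register phrasing from anything labeled "public".
- Quarantine private material in section 7 under a NEVER SHIP header.
```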
Why composition is the actual point
If I'd built client-voice-dna in isolation, it would just be a markdown file that produces another markdown file. The reason it matters is what sits around it. Wispr feeds it the audio. Prism reads it and drafts. The humanizer skill takes Prism's draft and removes the residual AI tells. Each skill is small. The shortest in the chain is under 100 lines. The combined pipeline does work I used to charge for as a one-off and now run as infrastructure.
That's what I think skills are actually for. Not impressing anyone with size. Building a chain where each link is small enough to debug, version and improve on its own, and the chain together does something a single mega-prompt can't.
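If you like thinking in code, the composition argument can be sketched as plain function composition. Everything below is a placeholder, not the real implementation — the actual links are Wispr, a markdown skill, and two Claude passes — but the shape is the point: each link is small, swappable, and debuggable on its own.

```python
# Sketch of the pipeline as function composition. Every function is a
# stand-in for a real step; only the chaining is the point.

def transcribe(audio: str) -> str:
    # Stand-in for Wispr: voice notes / Reels -> raw transcript.
    return f"transcript({audio})"

def extract_voice_dna(transcript: str) -> str:
    # Stand-in for client-voice-dna: transcript -> two-register profile.
    return f"voice_dna({transcript})"

def draft_copy(voice_dna: str, brief: str) -> str:
    # Stand-in for Prism: profile + brief -> draft in the client's voice.
    return f"draft({brief} in {voice_dna})"

def humanize(draft: str) -> str:
    # Stand-in for the humanizer skill: strip residual AI tells.
    return f"humanized({draft})"

def ship(audio: str, brief: str) -> str:
    # The chain is just composition; any single link can be
    # versioned or replaced without touching the others.
    return humanize(draft_copy(extract_voice_dna(transcribe(audio)), brief))
```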
The attached screenshot shows the chain, the SKILL.md frontmatter, both case studies, and the v0 vs v1.0.0 of the two-register rule.
See you tomorrow for Day 4!
AI Automation Society
skool.com/ai-automation-society