Sam Nepomuceno

New benchmark out of Meta FAIR, Stanford, and Harvard called ProgramBench. The setup: you get a compiled executable plus its docs. Source code stripped. Rebuild the program from scratch in any language you want. Tests check input/output behavior against the original binary. 200 tasks, from small CLI tools up to FFmpeg, SQLite, and the PHP interpreter.

📊 Results across 9 models: Zero tasks fully solved. Opus 4.7 was the best, passing 95% of tests on only 3% of tasks. GPT 5.4, Gemini 3.1 Pro, and Haiku 4.5 hit 0% in that bucket.

The interesting part is section 5. Even the model solutions that "worked" looked nothing like the human reference. Median 1,173 lines vs 3,068 in the original. Flat directories. Fewer functions, each one longer. GPT 5.4 wrote 96% of its final code in a single turn on most tasks and never modified existing files on roughly 40% of runs.

🎯 Why it matters for us: The benchmark separates writing code from designing software. Models can produce syntax all day. They cannot yet decompose a real system into coherent modules, pick the right abstractions, or organize a codebase the way a working engineer would. That gap is what computational orchestration points at. It is also where the durable value lives.

🛠 Try it: Pick an easier task from the repo (the paper flags nnn, fzf, gron, and jq as more tractable). Run it against Claude or your model of choice. Watch where you and the model split. Note the design decisions you make that the model never even raises. Post your runs and attempts to create a harness that would allow the model to do it. Wins, failures, weird outputs, all of it.

📍 Paper and Repo: ProgramBench

I'm building something on top of this right now. More soon.
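For anyone attempting a harness, here is a minimal sketch of the kind of behavioral check described above: run the original binary and a rebuilt candidate on the same inputs and count exact matches. This is not ProgramBench's actual test runner. The names `Case`, `run`, and `pass_rate` are made up for illustration, and comparing only exit code and stdout is an assumption about what "input/output behavior" covers.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Case:
    """One test input (hypothetical structure, not the benchmark's format)."""
    args: list = field(default_factory=list)  # command-line arguments
    stdin: str = ""                           # text piped to standard input


def run(cmd, case, timeout=10.0):
    """Run one program on one case and return (exit code, stdout)."""
    proc = subprocess.run(
        list(cmd) + list(case.args),
        input=case.stdin,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def pass_rate(reference, candidate, cases):
    """Fraction of cases where the candidate matches the reference exactly."""
    if not cases:
        return 0.0
    matches = sum(run(reference, c) == run(candidate, c) for c in cases)
    return matches / len(cases)


if __name__ == "__main__":
    # Stand-ins so the sketch runs anywhere: two tiny Python one-liners
    # play the roles of "original binary" and "rebuilt program".
    upper = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    reference = [sys.executable, "-c", upper]
    candidate = [sys.executable, "-c", upper]
    cases = [Case(stdin="hello\n"), Case(stdin="")]
    print(f"pass rate: {pass_rate(reference, candidate, cases):.0%}")
```

A real harness would likely also need to compare stderr and any files written, and to treat a timeout as a failed case instead of letting `subprocess.TimeoutExpired` propagate.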

New comment 15h ago

Sam Nepomuceno

1 like • 1d

@Jake Van Clief So is Opus 4.7 good when it has the right folder structure? I've been seeing a lot of people say its performance has been worse than 4.6 over the last month or so.

Donald Roy

2d •

🛠️ Show Your Work

Meet Spec Sheet Stu 😎

Feeling like I'm starting to get the hang of this ICM process. This framework is "Spec Sheet Stu," who I created to help kick off projects within our business. Stu is not automating anyone's current work; he is making possible something that simply wasn't before. I've included the file breakdown below if anyone wants to pick it apart and give me pointers.

I work in design, manufacturing, and installation of turnkey automation equipment for manufacturers. It has always been a pain point in our business to transfer knowledge from the principal engineers in sales to the execution team designing and manufacturing the systems against booked work. The senior engineers have a vision when they quote and price work, and it takes many hours of meetings and constant feedback loops with the engineering team to arrive at a completed design once all that knowledge has been transferred.

Stu is the middle man. He first takes in all documentation from the sales team. He then works through a five-phase process with the senior engineers, gathering all the information he needs to generate four documents. These documents create a full picture of what is determined up front and what is left to figure out. He sorts out all ambiguity between contracts and scope documents and leaves you with a clear “this is what we know” and “this is what we still need to figure out”. This saves the sales team days of work in generating kickoff documentation, and the result on the other side far exceeds what was possible before. Once the documentation is set, Stu becomes an advisor on project scope and can answer questions as a first line of defense before anyone has to go back to the sales team.

There are myriad benefits to this. Less back and forth between the senior and project engineers, reduced project hours, fewer roadblocks, more up-front flagging of contract conflicts, the list goes on. These are the kinds of things I think are the most amazing about this technology. Automation of repetitive work is nice, but we can now do things that previously would have needed teams of people, which would have made the work unsustainable.

New comment 2d ago

Sam Nepomuceno

0 likes • 2d

This is awesome. I've been working on something similar!

Matthew Creamer

6d •

💰 Competitions

🏆 WEEKLY COMP #3: THE SPECIALIST 🏆

💰 $325 CASH PRIZE 💰 That's a full year of Premium. Win this and your membership pays for itself.

📋 THE CHALLENGE You just got hired again. Different client this time. Meet Sarah, a freelance copywriter who's drowning in context-switching. 📎 Download the full client brief attached to this post. Short version: She works with three types of clients (SaaS founders, ecommerce brands, local service businesses) and starts from scratch every project. She doesn't need another tool. She needs a system. Your job is to build her a folder-based AI specialist she can drop into any Claude project. The folder IS the deliverable.

🗂️ THIS WEEK YOU LEARN ICM Up until now, comps have been "build a thing." This week you utilize the methodology taught throughout the community. 🧠 Folders as architecture. That's it. That's the whole concept this week. Your specialist is a folder with five things:
- 📄 identity.md (who they are)
- 📐 rules.md (how they respond)
- 💬 examples.md (what good looks like)
- 📚 reference/ (source material)
- 📖 README.md (how to use it)

Drop the folder into a Claude project. Claude becomes the specialist. Reusable. Shareable. Portable.

🎯 PICK YOUR SPECIALIST Don't pick copywriting. That's Sarah's example. Pick something YOU would actually use. A few sparks to get you thinking:
- A salary negotiation coach
- A meal planner that knows your dietary restrictions
- A code reviewer for your stack
- A real estate market analyst for your city
- A technical recruiter screener
- A grant writer for nonprofits in your space

The more specific, the better. "Marketing expert" is not a specialist. "B2B email expert for enterprise SaaS targeting CFOs" is.

💼 WHY THIS ONE LANDS ON YOUR RESUME Real talk. Winning a comp in a Skool community doesn't get you a job by itself. But shipping a working folder-based AI specialist with a clean README and a public repo? That's a portfolio piece.
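If you want a starting point, the five-item folder from the challenge can be scaffolded with a few lines of Python. This is only a sketch: the file and folder names come from the post, but the `scaffold` function and the placeholder text inside each file are assumptions, not part of the official brief.

```python
from pathlib import Path

# File names are the ones listed in the challenge; the starter text in each
# file is a made-up placeholder to be replaced with real content.
FILES = {
    "identity.md": "# Identity\n\nWho this specialist is.\n",
    "rules.md": "# Rules\n\nHow this specialist responds.\n",
    "examples.md": "# Examples\n\nWhat good output looks like.\n",
    "README.md": "# README\n\nHow to use this folder.\n",
}


def scaffold(root: Path) -> list:
    """Create the specialist folder and return the paths that were created."""
    created = []
    reference = root / "reference"  # holds source material
    if not reference.exists():
        reference.mkdir(parents=True)
        created.append(reference)
    for name, text in FILES.items():
        path = root / name
        if not path.exists():  # never overwrite work already in progress
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    for path in scaffold(Path("salary-negotiation-coach")):
        print("created", path)
```

Running it a second time creates nothing, so it is safe to re-run after you have started filling the files in.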

New comment 2h ago

Sam Nepomuceno

4 likes • 3d

Hi @Matthew Creamer and @Jake Van Clief, here is my submission: Shopify SEO Auditor, a folder-based specialist for auditing Shopify stores. Drop it into a Claude project and describe any product page, collection structure, or schema question. It returns a prioritized findings table, Quick Wins you can action today, and Strategic Changes with copy-pasteable JSON-LD code. Built on 16 Shopify-specific failure patterns, complete Liquid-templated schema blocks, and a full technical SEO checklist that vanilla Claude doesn't have. https://drive.google.com/drive/folders/1GtBSgAtQfeQegCFb1PRFbM8_szMSM69B?usp=sharing This is based on my personal experience doing SEO for Shopify ecommerce stores.

Jake Van Clief

4d •

🚨 Help & Troubleshooting

I come asking for help!

Because of the amazing support you all gave in the first round, Wylder (my stepdaughter) made it into the second round! You can vote once a day, and some days count as 2x votes! I would love, love, love it if any of you could support her going to work with some of the best animal rescues in the world by casting at least one free vote. You can vote here! Not AI related, so sorry for that! Wylder | Junior Ranger

New comment 3h ago

Sam Nepomuceno

0 likes • 3d

Will do!

Curtis Hays

6d •

💬 General discussion

Agents are just folders.

There have been some good questions in my previous posts about my agents, so I want to clear up a few things. I call them agents because it helps me think and stay organized. But strip the jargon and here's what's actually happening: a folder with the right files in it tells an AI who it is, what it does, and what good looks like.

Unix figured this out a long time ago. I remember working on mainframes in my early 20s. Files in folders. It wasn't powerful because it was complex. It was powerful because it wasn't. Unix came out in 1969; I was using it from 1998 to 2002.

The "writing room" is a folder. Cash lives in it. His instructions, his guardrails, his examples, his voice reference: all files. The AI reads the folder and knows how to behave. The room gives it context. The files give it structure. I have 15 of these rooms. Duke orchestrates between them. The naming isn't the point. The structure is.

Here's the part most people miss: almost every agent in my system has a human counterpart. A real expert whose domain knowledge shaped the instructions, and who can tell me when the agent gets it wrong. Cash's counterpart knows copywriting. Trace's counterpart knows data. That feedback loop is how the system actually improves.

You're not building AI. You're building infrastructure. Build the foundation. Build the structure. The agents are just what you call it when the rooms start working together.

My lesson: don't copy me, don't copy Jake. We all learn from each other, and then you make it your own. Stick to the fundamentals. Watch Jake's videos; they will rewire your brain and change how you think about AI. There are no shortcuts. You have to build a foundation first.

Links to referenced posts:
https://www.skool.com/quantum-quill-lyceum-1116/visualized-my-agent-team
https://www.skool.com/quantum-quill-lyceum-1116/the-folder-system-became-my-agency
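To make "the AI reads the folder" concrete, here is a rough sketch of what that step can look like if you assemble the context yourself instead of letting a Claude project do it. The `load_room` function and the assumption that every file in a room is Markdown are mine, not a description of how the system above is actually wired.

```python
from pathlib import Path


def load_room(folder: Path) -> str:
    """Concatenate every Markdown file in a room into one context string.

    Each file becomes a section headed by its path relative to the room,
    so the model can tell instructions apart from examples or references.
    """
    sections = []
    for path in sorted(folder.rglob("*.md")):
        label = path.relative_to(folder).as_posix()
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {label}\n\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # "writing-room" is a hypothetical folder name used for illustration.
    print(load_room(Path("writing-room")))
```

The resulting string can be pasted in as a system prompt or project instructions. An empty or missing folder simply yields an empty string.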

New comment 4d ago