And I used the downtime to test some new patterns for memory storage (as an improvement on PMM, but as a separate product that has always been on the backlog for enterprise deployment), mostly because we may be field-testing PMM with a couple of enterprise orgs soon.
Looking back at some of the conversations I have been having with the community here, I realised that I haven't done a great job of explaining what a memory system like mine does. Mainly because it was never intended to be an end-user tool.
Poor Man's Memory (PMM) was created as a plugin that we released as a field test / proof of concept so we could quickly get validation and feedback from early supporters like the wonderful people in this community.
The reality is that it's a tool that seeks (I have my doubts, but we're running research to test our claims) to rival the RAG, vector and graph databases that most LLMs use in their agent harnesses. Systems like these are how chatbots know what you're talking about and how they remember the current conversation (and also how they forget distant ones).
LLM chat apps are designed to take your entire conversation history in each session and send it, plus your most recent prompt, to the model to give the illusion of a conversation. This is why LLMs are accused of being nothing more than an expensive autocomplete (which, to some extent, they are). It's also why you get frequent warnings about compaction or the need to clear context, especially on Claude models with the smaller 200K context window: the longer your conversation grows, the more gets added to the context.
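To make the "illusion of conversation" concrete, here is a minimal sketch of a stateless chat loop. It's a toy under loud assumptions: `ChatSession`, the word-count "tokenizer" and the error raised on overflow are hypothetical stand-ins, not how Claude or any real client counts tokens or compacts.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical stateless chat client: the model only sees what we resend."""

    context_limit: int = 200_000  # tokens, e.g. a 200K-window model
    history: list[tuple[str, str]] = field(default_factory=list)

    @staticmethod
    def count_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer: one token per word.
        return len(text.split())

    def build_prompt(self, user_message: str) -> list[tuple[str, str]]:
        # Every turn resends the ENTIRE history plus the newest prompt.
        return self.history + [("user", user_message)]

    def context_size(self, messages: list[tuple[str, str]]) -> int:
        return sum(self.count_tokens(text) for _, text in messages)

    def send(self, user_message: str, reply: str) -> int:
        messages = self.build_prompt(user_message)
        used = self.context_size(messages)
        if used > self.context_limit:
            # This is the point where a real app warns you to compact or clear.
            raise RuntimeError("context full: compact or clear")
        # In a real app, `reply` would come back from the model API call.
        self.history = messages + [("assistant", reply)]
        return used
```

Each call costs more than the last because nothing is ever dropped from `history`; only compaction or a clear resets it, and that reset is exactly where distant conversations get forgotten.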
RAG is expensive. PMM is... well... poor. It was intended as a tool for people building chatbots and assistants on Claude: a way to give their agents long-term memory so they hallucinate less (and, incidentally, hold better conversations).
RAG, vectors and graphs add a dimension to this static conversation history. You won't see them in end-user tools like Claude Code or Cowork. Obsidian, coremem and, to some extent, even Milla Jovovich's recent memory palace tool use some of these techniques to provide a searchable memory that you can query from your chat. But at their most basic, these tools take your prompt, run a similarity search or look for connected concepts and conversations, and then either stuff the findings into your context window (the better ones) or simply return a search result in your chat window (Obsidian, memory palace). This is where PMM is different: it was designed to be autonomous. It doesn't require users to provide notes or dump material. Its memory collection and retrieval happen in the normal course of conversation with the LLM.
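For contrast, here is a toy version of that basic retrieve-then-stuff loop, as a hedged sketch only. Real systems use learned embeddings and a vector database; here a bag-of-words cosine similarity stands in for both, and `embed`, `retrieve` and `stuff_context` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (missing terms count as 0).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, notes: list[str], k: int = 2) -> list[str]:
    # Similarity search: rank every stored note against the prompt, keep the top k.
    q = embed(query)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]


def stuff_context(query: str, notes: list[str], k: int = 2) -> str:
    # The "better" tools inject the hits into the prompt sent to the model;
    # simpler ones just show them to you as search results.
    snippets = "\n".join(f"- {n}" for n in retrieve(query, notes, k))
    return f"Relevant notes:\n{snippets}\n\nUser: {query}"
```

Note that the search is always triggered by the prompt; the model itself never decides whether it needs to remember something.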
It remembers conversations from multiple sessions ago (outside your current conversation context), including sessions from before many /compacts and /clears. It's memory that survives (if you save religiously) and, more importantly (as we observed), hallucinates less. Gaslighting happens, but it is rare compared to the amount of gaslighting I get from the default Claude memory or RAG systems.
It writes and reads memory as it needs to (that was our original thesis). It doesn't need to run a search or return search results, because it knows it has memory and retrieves what it needs.
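One way to picture "writes and reads memory as it needs to" is a small set of memory tools the model can call on its own initiative. This is a hypothetical sketch, not PMM's actual implementation: `FileMemory`, the tool names `memory_write`, `memory_read` and `memory_topics`, and the one-Markdown-file-per-topic layout are all assumptions. Because the notes live on disk, a fresh instance (think: a new session after /compact or /clear) still sees them.

```python
import json
import re
from pathlib import Path


class FileMemory:
    """Hypothetical folder-of-files memory: one Markdown file per topic."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, topic: str) -> Path:
        # Normalise a topic into a safe filename, e.g. "Deploy Notes" -> "deploy-notes.md".
        slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
        return self.root / f"{slug}.md"

    def write(self, topic: str, note: str) -> str:
        # Append rather than overwrite, so notes from earlier sessions are kept.
        with self._path(topic).open("a", encoding="utf-8") as f:
            f.write(note.rstrip() + "\n")
        return "saved"

    def read(self, topic: str) -> str:
        path = self._path(topic)
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def topics(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.md"))


def handle_tool_call(memory: FileMemory, name: str, arguments: str) -> str:
    # Hypothetical dispatcher: the model chooses when to call these tools,
    # so no user-triggered search is involved.
    args = json.loads(arguments)
    if name == "memory_write":
        return memory.write(args["topic"], args["note"])
    if name == "memory_read":
        return memory.read(args["topic"])
    if name == "memory_topics":
        return "\n".join(memory.topics())
    raise ValueError(f"unknown tool: {name}")
```

Plain files keep the whole thing inspectable and cheap, which is very much in the "poor man's" spirit.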
I have been hesitant to compare PMM to existing retrieval architectures, and that hesitation led to research that accidentally benchmarked us against the "perfect oracle", a standard that even RAG and other retrieval systems struggle to reach.
There is no vector or graph database. By our measurement, its retrieval-to-oracle score on dense literature is 0.78, within 0.08 of the perfect-oracle score of 0.85 (theoretically, the upper limit). RAG, vector and graph stores typically score between 0.55 and 0.65, a gap of 0.20 to 0.30.
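The gap arithmetic, spelled out with the scores quoted in this post (0.85 as the oracle ceiling, 0.78 for PMM, 0.55 to 0.65 for RAG-style stores). `oracle_gap` is just an illustrative helper, and it assumes the gap is simply ceiling minus score.

```python
ORACLE_CEILING = 0.85  # theoretical upper bound quoted in the post


def oracle_gap(score: float, ceiling: float = ORACLE_CEILING) -> float:
    # Illustrative helper: how far a system's score sits below the perfect oracle.
    # Rounded to 2 decimals to hide floating-point noise (0.85 - 0.78 is not exactly 0.07).
    return round(ceiling - score, 2)


pmm_gap = oracle_gap(0.78)                           # PMM
rag_gaps = (oracle_gap(0.65), oracle_gap(0.55))      # best and worst typical RAG-style scores
```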
So PMM is being rewritten (as a separate development, dubbed memcore). I suspect we don't need as much structure as PMM currently enforces. I want to build even greater autonomy into the way LLMs handle our memory layer. Memory should grow (and fade) depending on each user's working style and requirements.
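Nothing about memcore's design is settled, so treat this purely as a hypothetical illustration of "grow and fade": each memory decays exponentially unless it gets used, recalling it reinforces it, and weak memories get pruned. `Memory`, `recall`, `prune`, the half-life model and the 0.1 threshold are all assumptions; a per-user `half_life` is one possible knob for working style.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory record with a strength that fades over time."""

    text: str
    strength: float = 1.0
    last_used: float = 0.0  # time of last recall, in days


def current_strength(m: Memory, now: float, half_life: float) -> float:
    # Fade: strength halves every `half_life` days without use.
    return m.strength * 0.5 ** ((now - m.last_used) / half_life)


def recall(m: Memory, now: float, half_life: float, boost: float = 1.0) -> None:
    # Grow: using a memory tops up its decayed strength and resets its clock.
    m.strength = current_strength(m, now, half_life) + boost
    m.last_used = now


def prune(memories: list[Memory], now: float, half_life: float, threshold: float = 0.1) -> list[Memory]:
    # Drop memories that have faded below the (assumed) threshold.
    return [m for m in memories if current_strength(m, now, half_life) >= threshold]
```

A user who revisits the same project daily would keep those memories strong, while a one-off tangent would quietly drop out; a longer `half_life` suits someone whose work spans weeks.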
Token jail has been a wonderful time for me (despite my theatrics to the contrary).
During this time:
- Implemented a side-loading strategy for third-party models in Claude Code (which fits nicely with the agent harnesses we wrote for the orchestration layer). Because it's all files, a settings.json per agent lets us control which model each agent role runs.
- Really sat with the data and findings from the v1.5 research. It was surprising. We accidentally benchmarked ourselves against the gold standard, the dense retrieval (DR) retrieval-to-oracle gap, instead of against RAG systems themselves (which is what Boris Cherny and Anthropic did 10 months ago). Most RAG systems probably score between 0.50 and 0.65, a gap of 0.20 to 0.35 from the ideal oracle upper bound of 0.85. We came close at 0.78. That's validation for Jake: a folder and file system coming really close to the ceiling while more complicated vector and graph systems trail behind.
- Started refactoring PMM, but will redevelop it as a different memory system (which I will test on new agents). I'm calling it memcore (because I am a troll, and coremem already exists).
Most of the stuff I have on my backlog is thanks to the feedback I get from the conversations in this community.
Thank you.