Looking for advice on a memory benchmark for AI coding agents
Hi everyone, I've been playing with Claude Code and built memory software for AI coding agents (I don't know anything about code or software development, but something came out of the oven). Does anybody know a good benchmark to validate it? Claude suggested LongMemEval and LoCoMo, so we ran LongMemEval and scored somewhere around 80-90%, depending on how you read the results (if the answer is "three" and the model says "3", does that count as correct? I don't know). But after running for 8 hours and spending around $100 on API calls, Claude realized it's the wrong benchmark for this project, and it couldn't recommend an existing one that fits. HELP🤷‍♂️
Temnii Gray
Clief Notes
skool.com/cliefnotes
Jake Van Clief, giving you the Cliff notes on the new AI age.