I wanted to build an Agent Memory System and blundered my way into 92% on LongMemEval

Shane Farkas, via Dev.to

Like most users of AI agents such as Claude Code, I have been frustrated by the agent memory problem. The models have gotten extremely good and no longer lose focus within one long conversation the way they used to, but across sessions memory is spotty: a conversation with an LLM recalls imperfect or irrelevant details from previous chats, and a new Claude Code session feels like Groundhog Day, onboarding a brand-new employee who is smart and talented but knows nothing about my world.

So I started looking into memory systems. I tried a folder of markdown files, Obsidian vaults, and so on, but every AI memory system I tried had the same design: dump text into a vector store, retrieve by cosine similarity, hope for the best. That works fine for "what did we talk about last week?" but falls apart the moment you need real reasoning — when facts contradict each other, when the answer requires connecting information from three different conversations, or wh
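The naive pattern described above — embed each memory, store it, rank by cosine similarity at query time — can be sketched in a few lines. This is a toy illustration, not any particular system's implementation: a bag-of-words counter stands in for a real embedding model, and the memory strings are invented. It shows the contradiction problem directly — both conflicting facts rank highest, and similarity alone gives no signal about which one is current.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Standard cosine similarity over sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical stored memories; the first two contradict each other.
memories = [
    "user prefers dark mode in the editor",
    "user switched to light mode last week",
    "user's project uses Python 3.12",
]

def retrieve(query, k=2):
    # Pure similarity ranking: no recency, no conflict resolution.
    scored = sorted(memories, key=lambda m: cosine(embed(query), embed(m)), reverse=True)
    return scored[:k]

results = retrieve("what editor mode does the user prefer?")
print(results)  # both contradictory mode facts come back at the top
```

The retriever happily returns both the "dark mode" and "light mode" facts, because nothing in cosine similarity encodes time, supersession, or entity identity — exactly the gap that pushes toward a more structured memory design.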

Continue reading on Dev.to


