How I Built a Memory System That Scores 96.2% on LongMemEval (#1 in the World)

By Jordan McCann, via Dev.to Python

Agentmemory V4 answers 481 of 500 questions correctly: 96.20% on the LongMemEval benchmark under real-retrieval conditions, the highest published score on this benchmark by any single-pass system.

I built it alone. No team. No funding. No degree. A mid-range gaming PC with an Intel Core i3-12100F, 16 days of development, roughly $1,000 in API costs, and around 300 million tokens consumed across development.

The previous world record was 95.60%, held by PwC Chronos, a research team that published an arXiv paper. Before them, the leaderboard included Mastra (94.87%), OMEGA (93.2%), Hindsight from Vectorize/Virginia Tech (91.4%), Emergence AI (86%), Supermemory (85.86%), and Zep (71.2%). All of them are funded companies or research labs with teams.

I want to write up exactly how this happened, because the path was not clean. It was systematic and slow, punctuated by a moment where I nearly accepted a completely invalid result.

What LongMemEval Is (and Why It Matters)

LongMemEval (Wu et al., 2024; ICLR 2025) is a
