
Scaling AI Memory: How I Tamed a 120k-Token Prompt with Deterministic GraphRAG
In a past article, I wrote about Synapse, an AI companion I built for my wife. To solve the problem of an LLM forgetting her past, I bypassed standard vector RAG entirely. Instead, I used a Knowledge Graph (via Graphiti and Neo4j) to map her life, compiled the entire graph into text, and injected it straight into Gemini's massive context window. It worked beautifully. Until it didn't.

When you build a prototype, you test it with a few messages. When your wife is the power user, she builds an entire world. By day 21 of her using the app daily for deep sessions, the system hit a wall. Here is the raw data of her input tokens per message over 18 days:

[Chart: input tokens per message over 18 days]

She was sending over 120,000 tokens of system context on every single chat turn. Gemini handled it; modern context windows are incredible. But the reality of production kicked in: my API costs were climbing, Convex bandwidth was getting chewed up storing and moving massive payloads, and latency was increasing. Dumping everything into the…
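To make the failure mode concrete, here is a minimal sketch of the "compile the graph into text and inject it into the prompt" approach. The function names and the triple format are my own illustration; the real app pulls these edges from Graphiti/Neo4j rather than a hardcoded list:

```python
# Sketch only: serializing a knowledge graph into prompt text.
# In the real system, triples come from Neo4j via Graphiti; here we
# use hypothetical hardcoded data to show the shape of the technique.

def graph_to_context(triples):
    """Serialize (subject, relation, object) edges into prompt-ready lines."""
    return "\n".join(f"- {s} {r} {o}" for s, r, o in triples)

def build_system_prompt(triples):
    """Prepend every known fact to the system prompt on each chat turn."""
    header = "Known facts about the user:\n"
    return header + graph_to_context(triples)

# Hypothetical example data
triples = [
    ("User", "WORKS_AS", "a teacher"),
    ("User", "ENJOYS", "hiking"),
]
prompt = build_system_prompt(triples)
print(prompt)
```

The problem is visible in the shape of the code: the prompt grows linearly with the number of edges in the graph, so a power user who "builds an entire world" drags every past fact into every single turn, which is exactly how the payload climbed past 120k tokens.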
Continue reading on Dev.to
