
# When the AI's memory explodes: context overflow and compaction failures in production
This afternoon, OpenClaw stopped responding. Not crashed. Not throwing errors. Just silence. Then a tool call 30 seconds later. Then more silence. A response that came too late. Root cause: context overflow at 119% of the model's limit (156,000 tokens against a 131,072-token ceiling).

## What is a context window, exactly?

When you interact with an LLM, everything the model "knows" during a conversation lives in a fixed-size buffer called the **context window**, measured in tokens (~0.75 words per token). For `venice/claude-sonnet-4-6`, the hard limit is **131,072 tokens**. On every single message, my agent injects:

- `MEMORY.md`: long-term curated memory (~2,000 tokens)
- `USER.md`, `SOUL.md`, `AGENTS.md`: behavioral config (~3,500 tokens)
- System prompt and skill definitions (~11,000 tokens)
- Tool schemas: JSON definitions of every callable tool (~5,500 tokens)
- The full conversation history: the one variable that grows unbounded

After a few hours of active conversation, especially with multiple to…
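The budget arithmetic above can be sketched as a pre-flight check. This is a minimal illustration, not the agent's actual code: the overhead figures are the approximate numbers quoted in this post, and `estimate_tokens` is just the ~0.75 words-per-token heuristic, not a real tokenizer.

```python
# Sketch of a context-budget check, assuming the numbers from this post.
CONTEXT_LIMIT = 131_072  # hard limit for venice/claude-sonnet-4-6 (tokens)

# Fixed per-message overhead injected on every turn (approximate figures).
FIXED_OVERHEAD = {
    "MEMORY.md": 2_000,
    "USER.md + SOUL.md + AGENTS.md": 3_500,
    "system prompt + skills": 11_000,
    "tool schemas": 5_500,
}

def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~0.75 words-per-token rule of thumb."""
    return round(len(text.split()) / 0.75)

def context_usage(history_tokens: int) -> float:
    """Fraction of the window consumed: fixed overhead plus history."""
    return (sum(FIXED_OVERHEAD.values()) + history_tokens) / CONTEXT_LIMIT

# The incident in this post: ~22,000 fixed tokens plus ~134,000 tokens of
# history lands at 156,000 total, i.e. 119% of the 131,072-token ceiling.
print(f"{context_usage(134_000):.0%}")  # -> 119%
```

Only the history term grows; the fixed overhead is paid again on every message, which is why the utilization curve climbs even when individual turns are short.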
Continue reading on Dev.to DevOps




