
# What a Token Audit Actually Finds in Production Agent Systems
I've been running token audits on AI agent systems, and the findings are almost always the same. Not because every team is doing the same thing wrong — but because the inefficiencies are invisible until you look for them. Here's what actually shows up.

## 1. System prompt redundancy (the big one)

The most common finding: teams copy-paste the full system prompt into every message "just to be safe." The intent makes sense — context window continuity, predictable behavior. The cost doesn't. If your system prompt is 800 tokens and you're running 100,000 turns a day, that's 80 million tokens burned on the same 800 words. Every day. On every conversation.

Fixes that work:

- Cache-friendly system prompt placement (Anthropic and Gemini cache the first N tokens if they don't change)
- Separate static context from dynamic context
- Only re-inject on session reset, not on every message

## 2. Tool schemas written for humans, not agents

JSON schemas with full field descriptions, usage examples, type explanations — the
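The cache-friendly placement fix under finding 1 can be sketched as a request builder. This is a minimal illustration, not a drop-in implementation: the `cache_control` marker on a system block is Anthropic's Messages API convention for prompt caching, while the prompt text, model name, and `build_request` helper here are made up for the example. The key idea is that the large static prefix comes first and never changes, so repeated turns hit the cache, and everything per-turn lives in the message body.

```python
# Hypothetical sketch: put the static system prompt in a cacheable prefix
# and keep per-turn context out of it. Field layout follows Anthropic's
# Messages API; the prompt and model name are placeholders.

STATIC_SYSTEM_PROMPT = (
    "You are a support agent for ExampleCo. Follow the refund policy, "
    "escalation rules, and tone guidelines below. [...800 tokens of policy...]"
)

def build_request(user_message: str, dynamic_context: str) -> dict:
    """Static prefix first (cacheable), dynamic content in the messages."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Marks the end of the cacheable prefix: identical tokens up
                # to this point are reused (and billed at the cached rate)
                # on subsequent calls instead of being reprocessed.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            # Dynamic, per-turn context goes here, NOT in the system prompt,
            # so it can change without invalidating the cached prefix.
            {
                "role": "user",
                "content": f"Session context: {dynamic_context}\n\n{user_message}",
            }
        ],
    }

req = build_request("Where is my order?", "customer_tier=gold, locale=en-US")
```

The design choice worth noting: anything that varies per session (user tier, timestamps, retrieved documents) would invalidate the prefix cache if it appeared before the cache marker, which is exactly the copy-paste-everything failure mode the audit keeps finding.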
Continue reading on Dev.to DevOps



