
# What a Token Audit Actually Finds in Production Agent Systems
I've been running token audits on AI agent systems, and the findings are almost always the same. Not because every team is doing the same thing wrong — but because the inefficiencies are invisible until you look for them. Here's what actually shows up.

## 1. System prompt redundancy (the big one)

The most common finding: teams copy-paste the full system prompt into every message "just to be safe." The intent makes sense — context window continuity, predictable behavior. The cost doesn't. If your system prompt is 800 tokens and you're running 100,000 turns a day, that's 80 million tokens burned on the same 800 words. Every day. On every conversation.

Fixes that work:

- Cache-friendly system prompt placement (Anthropic and Gemini cache the first N tokens if they don't change)
- Separate static context from dynamic context
- Only re-inject on session reset, not on every message

## 2. Tool schemas written for humans, not agents

JSON schemas with full field descriptions, usage examples, type explanations — the
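The cache-friendly placement fix under finding 1 can be sketched as a request builder. This is a minimal illustration, not a drop-in implementation: the `cache_control` marker on a system block is Anthropic's Messages API convention for prompt caching, while the prompt text, model name, and `build_request` helper here are made up for the example. The key idea is that the large static prefix comes first and never changes, so repeated turns hit the cache, and everything per-turn lives in the message body.

```python
# Hypothetical sketch: put the static system prompt in a cacheable prefix
# and keep per-turn context out of it. Field layout follows Anthropic's
# Messages API; the prompt and model name are placeholders.

STATIC_SYSTEM_PROMPT = (
    "You are a support agent for ExampleCo. Follow the refund policy, "
    "escalation rules, and tone guidelines below. [...800 tokens of policy...]"
)

def build_request(user_message: str, dynamic_context: str) -> dict:
    """Static prefix first (cacheable), dynamic content in the messages."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Marks the end of the cacheable prefix: identical tokens up
                # to this point are reused (and billed at the cached rate)
                # on subsequent calls instead of being reprocessed.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            # Dynamic, per-turn context goes here, NOT in the system prompt,
            # so it can change without invalidating the cached prefix.
            {
                "role": "user",
                "content": f"Session context: {dynamic_context}\n\n{user_message}",
            }
        ],
    }

req = build_request("Where is my order?", "customer_tier=gold, locale=en-US")
```

The design choice worth noting: anything that varies per session (user tier, timestamps, retrieved documents) would invalidate the prefix cache if it appeared before the cache marker, which is exactly the copy-paste-everything failure mode the audit keeps finding.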
Continue reading on Dev.to DevOps



