
Your AI agent re-sends 80% of your budget every loop
Your ReAct agent runs 15 turns. By turn 10, input_tokens is 87K. You're re-sending the entire conversation history every single iteration. That's not generation cost. That's re-reading cost. And no observability tool shows you the trajectory. We built a metric for it. Then we built a guard that stops the bleed before it kills your budget. Here's the problem, the math, and the fix.

The invisible cost of agent loops

Here's how a typical ReAct agent works:

- Turn 1: system prompt + user query → 1,200 input tokens
- Turn 2: + assistant response + tool result → 3,800 input tokens
- Turn 5: + three more rounds of think/act/observe → 15,000 input tokens
- Turn 10: the entire conversation so far → 87,000 input tokens
- Turn 15: approaching the context limit → 152,000 input tokens

Every turn re-sends everything. The system prompt. The user's question. Every assistant response. Every tool result. The LLM has no memory between calls, so you're paying to "remind" it what happened. On GPT-4o ($2.50/M input tok
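The billing pattern above is easy to simulate. Here's a minimal sketch of the cumulative re-read cost, assuming a simple linear model where each turn appends a fixed ~2,600 tokens of assistant output and tool results (real traces, like the one above, often grow faster than this, so treat it as a floor, not a measurement):

```python
# Sketch: why re-sending history makes agent-loop input cost balloon.
# The per-turn growth rate is an illustrative assumption, not a measurement.

PRICE_PER_M_INPUT = 2.50  # GPT-4o input price, $ per 1M tokens


def loop_cost(turns: int, base: int = 1_200, per_turn: int = 2_600):
    """Each turn re-sends the whole history: the base prompt plus
    everything accumulated so far (assistant replies + tool results)."""
    total_input = 0
    history = base
    for _ in range(turns):
        total_input += history  # this turn's input = the full history so far
        history += per_turn     # the turn appends ~per_turn new tokens
    cost = total_input / 1_000_000 * PRICE_PER_M_INPUT
    return total_input, cost


tokens, dollars = loop_cost(15)
print(f"{tokens:,} input tokens re-sent over 15 turns (~${dollars:.2f})")
# → 291,000 input tokens re-sent over 15 turns (~$0.73)
```

Note the shape: each turn's input is the running sum of everything before it, so total input tokens grow quadratically with turn count even under this conservative linear-history model.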


