
How I built budget enforcement that actually works for AI APIs
I've been using Claude Code daily for 1 year across 30+ projects. When I checked what all those sessions would cost at API rates, the number was over $10,000. Claude Max subscribers have zero visibility into this. No dashboard, no breakdown, no way to know which project or session is burning the most tokens.

So I built two things. An MCP server that shows Claude Code users their costs in real time, no API key needed, reads local session data directly. And an open-source API gateway called LLMKit with actual budget enforcement for teams routing traffic through AI providers.

The budget layer took longer than everything else combined. Database locks, Redis counters, optimistic concurrency: nothing held up under concurrent agent traffic. The gap between "check balance" and "record cost" is where money disappears. Cloudflare Durable Objects turned out to be the answer.

Why every other approach leaks money

Standard flow in most AI proxies:

Request comes in -> Read balance from DB (sees $12 u
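To make the gap between "check balance" and "record cost" concrete, here is a minimal simulation. It is a sketch under stated assumptions, not LLMKit's code and not the Durable Objects API: the class names, the 1000-cent limit and the 400-cent request cost are all made up for illustration, and a single-threaded asyncio loop stands in for a single owner of the counter. The naive version checks, waits for the upstream call, then records. The reserving version checks and records in one uninterrupted step before the call.

```python
import asyncio


class NaiveBudget:
    """Hypothetical check-then-record flow with a gap between the steps."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0

    async def handle(self, cost_cents: int) -> bool:
        # Step 1: read the balance and decide.
        if self.limit_cents - self.spent_cents < cost_cents:
            return False
        # Step 2: stand-in for the upstream model call. Other requests
        # run here and read the same stale balance.
        await asyncio.sleep(0)
        # Step 3: record the cost, after everyone has already been admitted.
        self.spent_cents += cost_cents
        return True


class ReservingBudget:
    """Hypothetical single-owner flow: check and reserve with no gap."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0

    async def handle(self, cost_cents: int) -> bool:
        # No await between the check and the write, so no other request
        # can observe the balance in between.
        if self.spent_cents + cost_cents > self.limit_cents:
            return False
        self.spent_cents += cost_cents
        await asyncio.sleep(0)  # stand-in for the upstream model call
        return True


async def _fire(budget, requests: int, cost_cents: int):
    # Launch all requests concurrently against one budget.
    results = await asyncio.gather(
        *(budget.handle(cost_cents) for _ in range(requests))
    )
    return sum(results), budget.spent_cents


def simulate(budget_cls, limit_cents=1000, requests=10, cost_cents=400):
    """Return (accepted_requests, total_spent_cents) for one burst."""
    return asyncio.run(_fire(budget_cls(limit_cents), requests, cost_cents))


if __name__ == "__main__":
    print("naive:    ", simulate(NaiveBudget))
    print("reserving:", simulate(ReservingBudget))
```

In this toy setup the naive flow admits all ten requests and spends four times the limit, while the reserving flow stops after two. A real gateway would also have to reconcile the reserved estimate against the actual token cost once the response arrives, which this sketch leaves out.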
Continue reading on Dev.to



