
Per-customer cost attribution without a proxy
Most AI cost-tracking solutions force you to route all your LLM traffic through their proxy. Tbh, that's an architectural nightmare waiting to happen. You're adding latency, introducing a single point of failure, and handing a third-party service the keys to your entire prompt stream. If their proxy goes down, your app goes down. If their proxy gets slow, your users think your app is slow. And that's before you get to the compliance headache of sending sensitive customer data through an intermediary just to track API costs.

You don't need a proxy to figure out which customer is burning through your OpenAI budget. You just need proper attribution at the request level, handled asynchronously.

The Problem with LLM Billing

When you look at your billing dashboard on OpenAI or Anthropic, all you see is total tokens used and a massive dollar amount at the end of the month. You don't see that user_123 ran a huge batch extraction job that cost you $40 in API calls, while your other 100 users cost $
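Request-level attribution is simple enough to sketch. The snippet below is a minimal, illustrative example rather than a drop-in implementation: it assumes the official openai Python SDK (v1+), and the model name, per-token prices, and in-process queue are all placeholders for whatever model, pricing, and async pipeline (database writer, message queue, analytics service) you actually use. The point is that every provider response already carries a usage object, so you can tag it with your own customer ID and record the cost off the hot path, with no proxy in sight.

```python
# Minimal sketch: per-customer cost attribution without a proxy.
# Assumptions: openai Python SDK v1+, placeholder prices, stdout as the "sink".
import queue
import threading

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prices in USD per 1K tokens -- check your provider's current pricing.
PROMPT_PRICE_PER_1K = 0.005
COMPLETION_PRICE_PER_1K = 0.015

# Usage records go onto a queue and are written off the request path,
# so attribution adds no user-visible latency.
usage_queue: "queue.Queue[dict]" = queue.Queue()


def usage_writer() -> None:
    """Background worker: drain the queue and persist records.
    Here it just prints; swap in your database or analytics pipeline."""
    while True:
        record = usage_queue.get()
        print("usage:", record)  # e.g. INSERT INTO llm_usage ...
        usage_queue.task_done()


threading.Thread(target=usage_writer, daemon=True).start()


def chat_for_customer(customer_id: str, prompt: str) -> str:
    """Call the model directly, then attribute this request's cost to the customer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    usage = response.usage  # token counts come back on every response
    cost = (
        usage.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + usage.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )
    usage_queue.put(
        {
            "customer_id": customer_id,
            "model": response.model,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "estimated_cost_usd": round(cost, 6),
        }
    )
    return response.choices[0].message.content
```

Because the write happens on a background worker, the attribution costs you nothing on the request path, and the only thing sitting between you and the provider is your own client.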