
The $0 Problem: Why Every Tool Says Your On-Prem Inference is Free
If you run LLMs on your own hardware, every cost-tracking tool in the ecosystem has the same answer for what it costs: $0. OpenCost sees your GPU pods but has no concept of tokens. LiteLLM tracks tokens per user but hardcodes on-prem cost to zero. Langfuse traces requests but only prices cloud APIs. The FinOps Foundation's own working group explicitly says on-premises AI cost is "outside the scope."

Meanwhile, your GPUs cost real money. The H100s draw 700 watts each. Your electricity bill is real. The three-year amortization on $280K of hardware is real. But no tool computes:

true cost per token = (hardware amortization + electricity rate × GPU power draw) / tokens per hour

We built InferCost to fix this.

What InferCost does

InferCost is an open-source Kubernetes operator (Apache 2.0) that computes the true cost of running AI inference on your own hardware. It's a single controller pod: no database, no UI to host. It plugs into the Prometheus and Grafana you already run. You declare your hardw
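The cost formula in the article can be sketched as a small worked example. The $280K price tag, three-year amortization window, and 700 W draw come from the article; the GPU count, $0.10/kWh electricity rate, and throughput figure are illustrative assumptions, not measurements:

```python
# Per-token cost = (hourly amortization + hourly electricity) / tokens per hour.
# Only the $280K price, 3-year window, and 700 W draw come from the article;
# everything else below is an assumed, illustrative input.

HOURS_PER_YEAR = 365 * 24

def cost_per_token(hardware_price_usd: float,
                   amortization_years: float,
                   gpu_count: int,
                   gpu_power_watts: float,
                   electricity_usd_per_kwh: float,
                   tokens_per_hour: float) -> float:
    # Spread the hardware price evenly over the amortization window.
    hourly_amortization = hardware_price_usd / (amortization_years * HOURS_PER_YEAR)
    # Convert total GPU draw to kW and multiply by the electricity rate.
    hourly_electricity = (gpu_count * gpu_power_watts / 1000) * electricity_usd_per_kwh
    return (hourly_amortization + hourly_electricity) / tokens_per_hour

# Hypothetical 4x H100 box at an assumed $0.10/kWh and 14.4M tokens/hour:
usd = cost_per_token(280_000, 3, 4, 700, 0.10, 14_400_000)
print(f"${usd * 1e6:.2f} per million tokens")  # → $0.76 per million tokens
```

Note that amortization dominates electricity here (about $10.65/hour vs. $0.28/hour), which is why keeping utilization high matters far more than power tuning for on-prem cost per token.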


