
I Put a 5MB Rust Binary Between My Code and Every LLM API — It Cut My Bill by 40%
Every developer using LLMs faces the same three problems:

- **Cost blindness:** you cannot answer "how much did I spend today?"
- **No failover:** when OpenAI goes down, your app goes down.
- **Wasted money:** identical prompts hit the API over and over instead of being cached.

I built llmux to fix all three with zero code changes.

## What is llmux?

A single Rust binary (~5MB) that sits between your code and every LLM API. It handles failover, caching, rate limiting, and cost tracking automatically.

```
 Your code (any language)
           |
  http://localhost:4000
           |
       ┌───────┐
       │ llmux │  ← single binary, ~5MB
       └───┬───┘
           │
    ┌──────┼──────┬────────┐
    ▼      ▼      ▼        ▼
 OpenAI  Claude  Gemini  Ollama
```

## Zero Code Changes

You change one environment variable. That is it.

```sh
# Before
export OPENAI_BASE_URL=https://api.openai.com/v1

# After
export OPENAI_BASE_URL=http://localhost:4000/v1
```

Your existing code — Python, TypeScript, Go, whatever — keeps working exactly the same. llmux intercepts the calls and adds superpowers.

## Quick Start

```sh
git clone https://github
```
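Since llmux exposes an OpenAI-compatible endpoint, existing client code only needs a different base URL; the request payload itself is unchanged. A minimal stdlib-only sketch of what a client sends through the proxy (the model name, prompt, and API-key handling here are placeholders, not llmux specifics):

```python
import json
import urllib.request

# Standard OpenAI chat-completions payload; only the host differs.
# "gpt-4o-mini" is just an example model name.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello through llmux"}],
}

req = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",  # llmux, not api.openai.com
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        # How llmux forwards or injects provider keys isn't shown in the
        # excerpt; this header is purely illustrative.
        "Authorization": "Bearer $OPENAI_API_KEY",
    },
)

# urllib.request.urlopen(req) would dispatch it; the proxy then applies
# failover, caching, rate limiting, and cost tracking before forwarding.
```

Because the wire format is identical, the same swap works from any language: point the SDK's base URL at the proxy and nothing else changes.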
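The caching win described above relies on recognizing that two requests are byte-identical. llmux's actual cache keying isn't shown in this excerpt; a typical approach for an LLM proxy is to hash the canonical JSON of the request, sketched here with illustrative names:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Canonicalize the request (sorted keys, no whitespace) so that
    # logically identical prompts always serialize to the same bytes,
    # then hash them into a fixed-size cache key.
    canonical = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

k1 = cache_key("gpt-4o", [{"role": "user", "content": "Hi"}])
k2 = cache_key("gpt-4o", [{"role": "user", "content": "Hi"}])
k3 = cache_key("gpt-4o", [{"role": "user", "content": "Hi!"}])
# k1 == k2 (cache hit), k1 != k3 (different prompt, cache miss)
```

A real proxy would also fold in anything that changes the response (temperature, provider, system prompt) and attach a TTL, but the principle is the same: identical requests never pay for a second completion.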
Continue reading on Dev.to
