
I Put a 5MB Rust Binary Between My Code and Every LLM API — It Cut My Bill by 40%
Every developer using LLMs faces the same three problems:

- **Cost blindness:** you cannot answer "how much did I spend today?"
- **No failover:** when OpenAI goes down, your app goes down.
- **Wasted money:** identical prompts hit the API over and over instead of being cached.

I built llmux to fix all three with zero code changes.

## What is llmux?

A single Rust binary (~5MB) that sits between your code and every LLM API. It handles failover, caching, rate limiting, and cost tracking automatically.

```
 Your code (any language)
           |
  http://localhost:4000
           |
       ┌───────┐
       │ llmux │  ← single binary, ~5MB
       └───┬───┘
           │
    ┌──────┼──────┬────────┐
    ▼      ▼      ▼        ▼
 OpenAI  Claude  Gemini  Ollama
```

## Zero Code Changes

You change one environment variable. That is it.

```sh
# Before
export OPENAI_BASE_URL=https://api.openai.com/v1

# After
export OPENAI_BASE_URL=http://localhost:4000/v1
```

Your existing code — Python, TypeScript, Go, whatever — keeps working exactly the same. llmux intercepts the calls and adds superpowers.

## Quick Start

```sh
git clone https://github
```
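Since llmux exposes an OpenAI-compatible endpoint, existing client code only needs a different base URL; the request payload itself is unchanged. A minimal stdlib-only sketch of what a client sends through the proxy (the model name, prompt, and API-key handling here are placeholders, not llmux specifics):

```python
import json
import urllib.request

# Standard OpenAI chat-completions payload; only the host differs.
# "gpt-4o-mini" is just an example model name.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello through llmux"}],
}

req = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",  # llmux, not api.openai.com
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        # How llmux forwards or injects provider keys isn't shown in the
        # excerpt; this header is purely illustrative.
        "Authorization": "Bearer $OPENAI_API_KEY",
    },
)

# urllib.request.urlopen(req) would dispatch it; the proxy then applies
# failover, caching, rate limiting, and cost tracking before forwarding.
```

Because the wire format is identical, the same swap works from any language: point the SDK's base URL at the proxy and nothing else changes.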
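The caching win described above relies on recognizing that two requests are byte-identical. llmux's actual cache keying isn't shown in this excerpt; a typical approach for an LLM proxy is to hash the canonical JSON of the request, sketched here with illustrative names:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Canonicalize the request (sorted keys, no whitespace) so that
    # logically identical prompts always serialize to the same bytes,
    # then hash them into a fixed-size cache key.
    canonical = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

k1 = cache_key("gpt-4o", [{"role": "user", "content": "Hi"}])
k2 = cache_key("gpt-4o", [{"role": "user", "content": "Hi"}])
k3 = cache_key("gpt-4o", [{"role": "user", "content": "Hi!"}])
# k1 == k2 (cache hit), k1 != k3 (different prompt, cache miss)
```

A real proxy would also fold in anything that changes the response (temperature, provider, system prompt) and attach a TTL, but the principle is the same: identical requests never pay for a second completion.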
Continue reading on Dev.to
