
LLM Router Benchmark: 46 Models, 8 Providers, Sub-1ms Routing
When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way. This is the technical story of how we benchmarked every model on our platform, discovered that speed and intelligence are poorly correlated, and built a production routing system that classifies requests in under 1ms using 14 weighted dimensions with sigmoid confidence calibration.

The Problem: One Gateway, 46 Models, Infinite Wrong Choices

BlockRun is an x402 micropayment gateway. Every LLM request flows through our proxy, is authenticated via an on-chain USDC payment, and is forwarded to the appropriate provider. The payment overhead adds 50-100ms to every request. Our users set model: "auto" and expect us to pick the right model. But "right" means different things for different requests:

- A "what is Python?" query should route to the cheapest, fastest model
- An "implement a B-tree with concurrent insertions" query needs a far more capable model, even at a higher price
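The classifier described above can be sketched as a weighted feature score squashed through a sigmoid. Everything below is an illustrative assumption: the feature names, weights, keyword sets, and the 0.5 threshold are stand-ins, and the real router uses 14 dimensions rather than these four.

```python
import math

# Hypothetical weights for four stand-in dimensions (the production system
# uses 14; these names and values are illustrative, not BlockRun's actual set).
WEIGHTS = {
    "code_keywords": 2.0,      # evidence of a programming task
    "prompt_length": 0.8,      # longer prompts tend to need stronger models
    "reasoning_markers": 1.5,  # words suggesting multi-step reasoning
    "simple_lookup": -2.5,     # "what is X?" style definitional queries
}

CODE_WORDS = {"implement", "function", "class", "concurrent", "b-tree", "algorithm"}
REASONING_WORDS = {"prove", "optimize", "design", "derive"}


def classify(prompt: str) -> tuple[str, float]:
    """Score a prompt on weighted dimensions, calibrate with a sigmoid,
    and pick a model tier. Pure arithmetic on the raw string, so it runs
    in well under a millisecond."""
    words = [w.strip(".,?!").lower() for w in prompt.split()]
    features = {
        "code_keywords": float(sum(w in CODE_WORDS for w in words)),
        "prompt_length": min(len(words) / 50.0, 1.0),
        "reasoning_markers": float(sum(w in REASONING_WORDS for w in words)),
        "simple_lookup": 1.0 if words[:2] == ["what", "is"] else 0.0,
    }
    score = sum(WEIGHTS[k] * v for k, v in features.items())
    confidence = 1.0 / (1.0 + math.exp(-score))  # sigmoid: raw score -> (0, 1)
    tier = "strong" if confidence >= 0.5 else "cheap"
    return tier, confidence
```

With these toy weights, `classify("what is Python?")` lands in the cheap tier (the negative "simple_lookup" weight dominates), while `classify("implement a B-tree with concurrent insertions")` trips the code-keyword dimension and crosses the threshold into the strong tier.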



