
LLM Router Benchmark: 46 Models, 8 Providers, Sub-1ms Routing
When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way. This is the technical story of how we benchmarked every model on our platform, discovered that speed and intelligence are poorly correlated, and built a production routing system that classifies requests in under 1ms using 14 weighted dimensions with sigmoid confidence calibration.

The Problem: One Gateway, 46 Models, Infinite Wrong Choices

BlockRun is an x402 micropayment gateway. Every LLM request flows through our proxy, is authenticated via an on-chain USDC payment, and is forwarded to the appropriate provider. The payment overhead adds 50-100ms to every request. Our users set model: "auto" and expect us to pick the right model. But "right" means different things for different requests:

- A "what is Python?" query should route to the cheapest, fastest model
- An "implement a B-tree with concurrent insertions" query needs a far more capable model, even at a higher price
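The classifier described above can be sketched as a weighted feature score squashed through a sigmoid. Everything below is an illustrative assumption: the feature names, weights, keyword sets, and the 0.5 threshold are stand-ins, and the real router uses 14 dimensions rather than these four.

```python
import math

# Hypothetical weights for four stand-in dimensions (the production system
# uses 14; these names and values are illustrative, not BlockRun's actual set).
WEIGHTS = {
    "code_keywords": 2.0,      # evidence of a programming task
    "prompt_length": 0.8,      # longer prompts tend to need stronger models
    "reasoning_markers": 1.5,  # words suggesting multi-step reasoning
    "simple_lookup": -2.5,     # "what is X?" style definitional queries
}

CODE_WORDS = {"implement", "function", "class", "concurrent", "b-tree", "algorithm"}
REASONING_WORDS = {"prove", "optimize", "design", "derive"}


def classify(prompt: str) -> tuple[str, float]:
    """Score a prompt on weighted dimensions, calibrate with a sigmoid,
    and pick a model tier. Pure arithmetic on the raw string, so it runs
    in well under a millisecond."""
    words = [w.strip(".,?!").lower() for w in prompt.split()]
    features = {
        "code_keywords": float(sum(w in CODE_WORDS for w in words)),
        "prompt_length": min(len(words) / 50.0, 1.0),
        "reasoning_markers": float(sum(w in REASONING_WORDS for w in words)),
        "simple_lookup": 1.0 if words[:2] == ["what", "is"] else 0.0,
    }
    score = sum(WEIGHTS[k] * v for k, v in features.items())
    confidence = 1.0 / (1.0 + math.exp(-score))  # sigmoid: raw score -> (0, 1)
    tier = "strong" if confidence >= 0.5 else "cheap"
    return tier, confidence
```

With these toy weights, `classify("what is Python?")` lands in the cheap tier (the negative "simple_lookup" weight dominates), while `classify("implement a B-tree with concurrent insertions")` trips the code-keyword dimension and crosses the threshold into the strong tier.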



