
How Komilion's Request Routing Actually Works
When I tell developers "it automatically picks the cheapest model," the first question is always: how? Here's the actual architecture.

The core problem

Every AI API call has a cost, and two things determine it: which model handles the request, and how many tokens are involved. Opus 4.6 costs ~15× more per token than Gemini Flash. For a commit message or "what does this function return?", you're paying roughly 15× more than you need to. For a 500-line architectural review, Opus is the right tool.

The routing problem, then, has two parts. First, classify each request quickly enough that the classification overhead doesn't eat into your savings. Second, map it to the right model.

Layer 1: Regex fast-path (<5ms)

The first pass is a regex classifier that runs in a few milliseconds. It looks for explicit signals in the request.

Simple patterns (routed to the frugal tier):
Requests under ~100 tokens with common question patterns
Commit message / changelog requests
Single-line completions
"What does X do?" / "Explain this variable" patterns
Documen
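To make the two ideas above concrete (cost scales with per-token price × tokens, and a cheap regex pass decides the easy cases), here is a minimal sketch. It is not Komilion's implementation. The tier names, the placeholder prices (only the ~15× ratio comes from the article), the specific regexes, and the whitespace-based token estimate are all hypothetical. Returning `None` stands for "no fast-path decision," so the request would fall through to a later layer.

```python
import re
from typing import Optional

# Hypothetical relative prices per 1K tokens. Only the ~15x ratio comes from
# the article; the absolute numbers are placeholders, not real pricing.
PRICE_PER_1K_TOKENS = {"frugal": 1.0, "premium": 15.0}

# The article's "~100 tokens" cutoff for short requests.
SHORT_REQUEST_TOKENS = 100

# Hypothetical patterns modeled on the article's "simple" categories.
COMMIT_OR_CHANGELOG = re.compile(r"\b(commit message|changelog)\b", re.IGNORECASE)
QUESTION_PATTERNS = [
    re.compile(r"^what does \S+ (do|return)\b", re.IGNORECASE),
    re.compile(r"^explain (this|the) variable\b", re.IGNORECASE),
]


def estimate_tokens(text: str) -> int:
    """Crude whitespace-based token estimate; a real router would use a tokenizer."""
    return len(text.split())


def request_cost(tier: str, tokens: int) -> float:
    """Cost in the placeholder units above: tokens times the tier's per-token price."""
    return PRICE_PER_1K_TOKENS[tier] * tokens / 1000


def fast_path_tier(prompt: str) -> Optional[str]:
    """Return "frugal" when a cheap regex signal fires, else None.

    None means "no decision here": the request falls through to whatever
    slower classification layer comes next.
    """
    text = prompt.strip()
    # Commit message / changelog requests route cheap regardless of length.
    if COMMIT_OR_CHANGELOG.search(text):
        return "frugal"
    # Short, single-line requests that match a common question pattern.
    if (
        "\n" not in text
        and estimate_tokens(text) < SHORT_REQUEST_TOKENS
        and any(p.search(text) for p in QUESTION_PATTERNS)
    ):
        return "frugal"
    return None


if __name__ == "__main__":
    for prompt in [
        "Write a commit message for this diff",
        "What does parse_config return?",
        "Review this module's architecture:\nclass Router: ...",
    ]:
        print(repr(prompt[:30]), "->", fast_path_tier(prompt))
    # Same 1,000-token request on each tier: the premium tier costs 15x more.
    print(request_cost("premium", 1000) / request_cost("frugal", 1000))
```

The sketch only ever says "frugal" or "don't know." It never routes a request upward, so a missed pattern costs a slower classification rather than a wrong cheap answer. Single-line completion detection is left out here because it likely depends on request metadata the sketch doesn't model.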
Continue reading on Dev.to


