
HotSwap: Routing LLM Subtasks by Cache Economics
Abstract

Model routing and prompt caching are well-established but separate techniques for reducing LLM API costs. Routing directs simple tasks to cheaper models (40-85% savings); Anthropic's prompt caching cuts input token costs by up to 90% on repeated prefixes. Every existing tool treats these as independent optimizations.

This post proposes HotSwap, a pattern that keeps a persistent cached Claude session as the stateful backbone while offloading read-only exploration turns to a cheaper provider. The motivation is cache economics: cached turns on Anthropic are cheap, so you want to keep complex work there while routing lightweight exploration elsewhere. The mechanism is simpler than you'd expect -- task-type classification (exploration vs. action), not real-time cost calculation. And a self-tuning model selector adapts which cheap model handles exploration based on observed success rates in your specific workload.

To be clear about what's novel and what isn't: multi-model routing exists…
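The two mechanisms the abstract names -- task-type classification and a success-rate-driven model selector -- can be sketched roughly as follows. This is a minimal illustration, not the post's implementation: the keyword heuristic, the epsilon-greedy selection, and all model/function names here are assumptions.

```python
import random
from collections import defaultdict

# Assumed heuristic: read-only verbs mark an "exploration" turn.
EXPLORATION_VERBS = {"read", "list", "search", "summarize", "explain", "find", "grep"}

def classify(task: str) -> str:
    """Task-type classification: exploration (read-only) vs. action (stateful)."""
    words = task.strip().split()
    first = words[0].lower() if words else ""
    return "exploration" if first in EXPLORATION_VERBS else "action"

class SelfTuningSelector:
    """Adapts which cheap model handles exploration from observed success rates
    (sketched here as epsilon-greedy over per-model success counters)."""

    def __init__(self, candidates, epsilon=0.1):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.stats = defaultdict(lambda: [0, 0])  # model -> [successes, attempts]

    def pick(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(self.candidates)  # occasionally explore alternatives
        def rate(model):
            successes, attempts = self.stats[model]
            return successes / attempts if attempts else 1.0  # optimistic for untried
        return max(self.candidates, key=rate)

    def record(self, model: str, success: bool) -> None:
        self.stats[model][0] += int(success)
        self.stats[model][1] += 1

def route(task: str, selector: SelfTuningSelector,
          backbone: str = "claude-cached-session"):
    """Exploration turns go to the cheap model; action turns stay on the
    cached backbone so its prefix cache keeps paying off."""
    if classify(task) == "exploration":
        return ("cheap", selector.pick())
    return ("backbone", backbone)
```

For example, with `epsilon=0` the selector becomes purely greedy: after recording a failure for one candidate and a success for another, `route("read the config", selector)` sends the turn to the successful cheap model, while `route("refactor the auth module", selector)` stays on the backbone session.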
Continue reading on Dev.to



