
Building an AI fallback system: when to use GPT-4o, when to fall back to Haiku, when to skip the LLM entirely
Not every query deserves a frontier model. A user asking "what is your cancellation policy?" does not need GPT-4o to generate the answer. A rules engine or a simple database lookup handles it in 5 milliseconds at zero token cost.

We learned this the hard way. Our first production deployment sent everything through GPT-4o. The quality was great. The bill was $7,200/month for a feature that should have cost $2,000. Worse, 60% of those queries were simple enough that a smaller model (or no model at all) would have produced identical output.

This article covers the three-tier fallback system we built: a rules engine for deterministic queries, a cheap model (Claude Haiku) for simple generation, and a frontier model (GPT-4o) for complex reasoning. Stack: Node.js 20, TypeScript.

The three tiers

Here is the routing logic:

```
Incoming query
          ↓
┌─────────────────────┐
│ Tier 0: Rules       │ → deterministic lookup, no LLM
│ (FAQ, status, data) │   cost: $0, latency: <10ms
└─────────┬───────────┘
          ↓ not matched
┌─────────────────────┐
│ Tier 1: Haiku       │ → simple generation
└─────────┬───────────┘
          ↓ too complex
┌─────────────────────┐
│ Tier 2: GPT-4o      │ → complex reasoning
└─────────────────────┘
```
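The tier selection described above can be sketched in TypeScript. This is a minimal illustration, not the authors' actual code: the FAQ table, the `isSimple` heuristic, and the tier names are assumptions standing in for whatever rules engine and complexity check a real deployment would use.

```typescript
// Hypothetical three-tier router. Tier 0 answers directly from a
// deterministic lookup; Tiers 1 and 2 only decide which model to call.
type Tier = "rules" | "haiku" | "gpt4o";

interface RouteResult {
  tier: Tier;
  answer?: string; // populated only when Tier 0 resolves the query itself
}

// Tier 0: deterministic lookup for known FAQ-style queries (illustrative).
const faqTable: Record<string, string> = {
  "what is your cancellation policy?":
    "You can cancel any time from your account settings.",
};

// Placeholder complexity check; a real system might use a classifier
// or token-count heuristics instead of this toy rule.
function isSimple(query: string): boolean {
  return query.split(/\s+/).length < 20 && !query.includes("compare");
}

export function route(query: string): RouteResult {
  const normalized = query.trim().toLowerCase();

  // Tier 0: rules engine — zero token cost, sub-10ms latency.
  const hit = faqTable[normalized];
  if (hit !== undefined) return { tier: "rules", answer: hit };

  // Tier 1: cheap model (Claude Haiku) for simple generation.
  if (isSimple(normalized)) return { tier: "haiku" };

  // Tier 2: frontier model (GPT-4o) for complex reasoning.
  return { tier: "gpt4o" };
}
```

The key design point is that the router itself never touches an LLM: tier selection is pure, synchronous, and testable, and the actual model calls happen only after a tier has been chosen.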



