
How We Route WhatsApp Messages to N Agents With a Single LLM Call
Most teams building AI-powered messaging systems make the same mistake: they run every inbound message through every agent. Got 5 agents? That's 5 LLM calls per message. Your users send "👍" and you just burned $0.003 classifying a thumbs-up five times.

We needed a better approach. Here's the pipeline we built, and why each layer exists.

The Naive Approach (And Why It Hurts)

The obvious architecture:

Message arrives
→ Agent 1: "Is this for me?" (LLM call)
→ Agent 2: "Is this for me?" (LLM call)
→ Agent 3: "Is this for me?" (LLM call)
→ Agent 4: "Is this for me?" (LLM call)
→ Agent 5: "Is this for me?" (LLM call)

5 LLM calls. 5× the latency. 5× the cost. And 4 of them will say "nah, not for me."

Now imagine your hotel WhatsApp gets 500 messages/day. That's 2,500 LLM calls just for routing. Most of them are "ok", "👍", "thx", and emoji reactions. You're paying OpenAI to classify thumbs-ups.

The Pipeline: Free Before Paid

Our philosophy is simple: filter what you can for free, before you spend.
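
The "free" layer can be as simple as a deterministic pre-filter that drops acknowledgements and emoji-only messages before any model is called. A minimal sketch, assuming a keyword list and emoji regex of our own choosing (the article's actual filter rules aren't shown):

```python
import re

# Illustrative acknowledgement list and emoji-only pattern -- these are
# assumptions, not the article's actual rules.
ACKS = {"ok", "okay", "k", "thx", "thanks", "ty", "yes", "no"}
EMOJI_ONLY = re.compile(r"^[\U0001F300-\U0001FAFF\u2600-\u27BF\s]+$")

def needs_routing(text: str) -> bool:
    """Return True only if the message should reach the LLM router."""
    stripped = text.strip().lower()
    if not stripped:
        return False          # empty message: drop
    if stripped in ACKS:
        return False          # "ok", "thx", ...: drop
    if EMOJI_ONLY.match(stripped):
        return False          # "👍" and friends: drop
    return True               # everything else goes to the router
```

At 500 messages/day with most traffic being acknowledgements, this zero-cost check alone eliminates the bulk of those 2,500 routing calls.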
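
For messages that survive the filter, the N-calls-per-message problem collapses into one call: a single prompt lists every agent and asks the model to name exactly one. A sketch under assumed agent names and a stand-in for whatever chat-completion client you use (none of these identifiers come from the article):

```python
# Hypothetical agent roster for a hotel WhatsApp line.
AGENTS = {
    "booking": "Handles reservations, dates, and availability.",
    "billing": "Handles invoices, payments, and refunds.",
    "concierge": "Handles local tips, amenities, and general questions.",
}

def build_router_prompt(message: str) -> str:
    """One prompt describing all agents -- one LLM call, not N."""
    lines = [
        "You route WhatsApp messages. Reply with one agent name only.",
        "Agents:",
    ]
    for name, desc in AGENTS.items():
        lines.append(f"- {name}: {desc}")
    lines.append(f"Message: {message!r}")
    return "\n".join(lines)

def parse_route(reply: str) -> str:
    """Map the model's reply to a known agent; fall back to a default."""
    candidate = reply.strip().lower()
    return candidate if candidate in AGENTS else "concierge"
```

The key property: cost and latency stay constant as you add agents, since only the prompt grows, not the number of calls.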



