
Code Mode: Batching MCP Tool Calls in a WASM Sandbox to Cut LLM Token Usage by 30-80%
The Problem: One Tool Call Per Turn Is Expensive

If you've worked with LLMs and tool use, you know the pattern. The model decides it needs to call a tool. It emits a tool call. Your system executes it and returns the result. The model reads the result, reasons about it, and decides it needs another tool call. Repeat.

Every round trip burns tokens. The model re-reads the entire conversation history each time. For workflows that touch 5-10 tools — think "look up the customer, check their subscription, fetch recent invoices, calculate usage, draft a summary" — you're paying for the same context window over and over. The token cost adds up fast, and latency compounds with each turn.

The Solution: Let the LLM Write the Orchestration

Code Mode flips the pattern. Instead of one tool call per LLM turn, the model writes a short JavaScript program that orchestrates multiple tool calls in a single execution. The model gets the results all at once and reasons over the complete picture. This is inspir
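A minimal sketch of what such a model-written script might look like. The tool bindings here (`crm.getCustomer`, `billing.getSubscription`, `billing.listInvoices`) are hypothetical stand-ins for whatever MCP tools the sandbox would actually expose; they are mocked locally so the example is self-contained.

```javascript
// Hypothetical tool bindings, mocked so the sketch runs on its own.
// In a real Code Mode sandbox these would proxy to MCP tool calls.
const crm = {
  getCustomer: async (email) => ({ id: "c_42", name: "Ada", email }),
};
const billing = {
  getSubscription: async (customerId) => ({ plan: "pro", seats: 5 }),
  listInvoices: async (customerId) => [
    { amount: 120 },
    { amount: 120 },
    { amount: 135 },
  ],
};

async function run() {
  // One sandboxed execution replaces four separate LLM tool-call turns.
  const customer = await crm.getCustomer("ada@example.com");
  const subscription = await billing.getSubscription(customer.id);
  const invoices = await billing.listInvoices(customer.id);
  const invoiceTotal = invoices.reduce((sum, inv) => sum + inv.amount, 0);

  // Only this compact summary is fed back into the model's context,
  // instead of four full tool results plus re-read history each turn.
  return { customer: customer.name, plan: subscription.plan, invoiceTotal };
}

run().then((summary) => console.log(JSON.stringify(summary)));
```

The intermediate results (full customer record, raw invoice list) never re-enter the model's context; only the final summary does, which is where the token savings come from.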




