
# Xoul - Building a Local AI Agent Platform with Small LLMs: The Walls of Tool Calling and Practical Solutions
This post is a real-world account of developing Xoul, an on-premise local AI agent platform: where small-LLM tool calling hit its limits, and how we worked around each limit at the application layer.

## Background: "Let's Build a Local Agent"

With large models like GPT or Claude, tool calling is near-perfect. But the moment you need to run small local LLMs (Ollama + Llama3/Qwen/Oss under 20B) for on-premise environments or cost reasons, reality hits hard.

Xoul is a personal AI agent platform with this basic flow:

```
User input
  ↓
LLM (local [small] or commercial)
  ↓
Tool Call (JSON)
  ↓
Tool Router → Function execution
  ↓
Result fed back to LLM → Final response
```

Running 30+ tools on this architecture (workflow management, scheduling, Python code execution), we hit three major problems.

## Limitation 1: The LLM Corrupts Parameters

### The Problem

User: "Run the 'Organize My Coin When +-20%' workflow"

The LLM needs to call `run_workflow`. What we actually got:

```json
{ "tool": "run_workflow", "args": { "
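The excerpt cuts off before the corrupted output and the fix appear, but the shape of the problem is clear: the model mangles the `name` argument of `run_workflow`. One common application-layer guard against this (a generic sketch, not necessarily Xoul's actual solution; the tool names and workflow list are illustrative) is for the tool router to validate the parsed call and fuzzy-match near-miss parameter values against known ones before executing anything:

```python
import difflib
import json

# Illustrative registry of workflow names the application already knows.
# Only "Organize My Coin When +-20%" appears in the post itself.
KNOWN_WORKFLOWS = ["Organize My Coin When +-20%", "Daily Report"]

def resolve_workflow_name(requested: str) -> str:
    """Map a possibly-corrupted workflow name from the LLM back to a
    known workflow via fuzzy matching; raise if nothing is close."""
    matches = difflib.get_close_matches(requested, KNOWN_WORKFLOWS, n=1, cutoff=0.6)
    if not matches:
        raise ValueError(f"no workflow resembling {requested!r}")
    return matches[0]

def route_tool_call(raw: str) -> str:
    """Parse the LLM's tool-call JSON and dispatch it, repairing the
    workflow name before execution."""
    call = json.loads(raw)
    if call["tool"] == "run_workflow":
        name = resolve_workflow_name(call["args"]["name"])
        return f"running workflow: {name}"
    raise ValueError(f"unknown tool: {call['tool']}")

# A corrupted call a small model might emit: the name is mangled,
# but fuzzy matching recovers the intended workflow.
print(route_tool_call(
    '{"tool": "run_workflow", "args": {"name": "organize my coin when 20%"}}'
))
```

The key design choice is that the router never trusts the model's argument verbatim: anything that names an application-side resource is resolved against the application's own registry, so a slightly corrupted string still lands on the right workflow instead of failing or running the wrong one.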
*Continue reading on Dev.to*
