
We Gave LLMs 150 Tools: Here's What Broke.
There's a hypothesis that most people building AI agents have encountered but few have measured: the more tools you give an LLM, the worse it gets at picking the right one.

It's intuitive. Connect a few MCP servers to your agent, and suddenly it's choosing from 60, 80, 100+ tools. GitHub tools, GitLab tools, Kubernetes, Slack, Jira, PagerDuty, Terraform, Grafana, all loaded into the context window, all the time. The model has to read every tool definition, understand the distinctions between them, and pick the right one. That's a lot of signal to sift through.

But intuition isn't data. So we built Boundary, an open-source framework for finding where LLM context breaks, and ran the numbers.

The setup

We assembled 150 tool definitions based on real schemas from production agent systems across 16 services: GitHub, GitLab, Jira, Confluence, Kubernetes, AWS, Datadog, Slack, PagerDuty, Okta, Snyk, Grafana, Terraform Cloud, Docker, Linear, and Notion. The tools are synthetic (no-op for bench
Continue reading on Dev.to


