I gave an LLM 248 tools and accuracy dropped to 12%. Here's what fixed it.

Son Seong Joon, via Dev.to Python

So here's a fun one. I was building an LLM agent for a Kubernetes cluster: 248 API endpoints, all exposed as tools. I threw them all into the context and asked the model to "scale my deployment." Accuracy? 12%. The model basically choked on the wall of tool definitions.

The vector search trap

The natural first instinct: use vector search to retrieve only the relevant tools. Embed all the tool descriptions, find the closest matches, done. Except... when a user says "cancel my order and get a refund," vector search returns cancelOrder. But the actual workflow is:

listOrders → getOrder → cancelOrder → processRefund

Vector search gives you one tool. You need the chain.

Graph-based retrieval

I ended up building graph-tool-call, which models tool relationships as a directed graph. Tools have edges like PRECEDES, REQUIRES, and COMPLEMENTARY. When you search, it doesn't just find one match; it traverses the graph and returns the whole workflow. The retrieval combines four signals via weighted Reciprocal Rank
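To make the vector-search trap concrete, here is a minimal sketch of single-tool retrieval. It substitutes bag-of-words cosine similarity for real embeddings, and the tool descriptions are invented for illustration; the point is only that top-1 similarity search surfaces one tool, not the chain.

```python
import math
from collections import Counter

# Hypothetical tool descriptions, stand-ins for real embedded docstrings.
TOOLS = {
    "listOrders":    "list all orders for a customer",
    "getOrder":      "fetch details of a single order",
    "cancelOrder":   "cancel an order and stop fulfillment",
    "processRefund": "issue a refund for a payment",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tool(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k tools whose description best matches the query."""
    q = Counter(query.lower().split())
    ranked = sorted(
        TOOLS,
        key=lambda t: cosine(q, Counter(TOOLS[t].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

# The query needs a four-tool workflow, but top-1 retrieval returns one tool.
retrieve_tool("cancel my order and get a refund")
```

Swapping in real embeddings changes the scores, not the failure mode: similarity alone has no notion of what must run before or after the matched tool.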
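The graph traversal idea can be sketched as follows. This is a hypothetical miniature, not the actual graph-tool-call API: it models only PRECEDES edges and walks them in both directions from the tool that vector search found, assuming the workflow is a simple acyclic chain.

```python
from collections import defaultdict

class ToolGraph:
    """Toy directed graph of tools; only PRECEDES edges are modeled here."""

    def __init__(self):
        self.fwd = defaultdict(list)  # tool -> tools it directly precedes
        self.rev = defaultdict(list)  # tool -> tools that directly precede it

    def add_precedes(self, earlier: str, later: str) -> None:
        self.fwd[earlier].append(later)
        self.rev[later].append(earlier)

    def workflow(self, seed: str) -> list[str]:
        """Recover the full chain around a seed tool (assumes no cycles)."""
        chain = [seed]
        node = seed
        while self.rev[node]:          # walk back to the workflow's start
            node = self.rev[node][0]
            chain.insert(0, node)
        node = seed
        while self.fwd[node]:          # walk forward to its end
            node = self.fwd[node][0]
            chain.append(node)
        return chain

g = ToolGraph()
g.add_precedes("listOrders", "getOrder")
g.add_precedes("getOrder", "cancelOrder")
g.add_precedes("cancelOrder", "processRefund")

# Vector search alone surfaces cancelOrder; the graph recovers the chain.
g.workflow("cancelOrder")
# → ['listOrders', 'getOrder', 'cancelOrder', 'processRefund']
```

A production version would also follow REQUIRES and COMPLEMENTARY edges with per-relation depth limits, but the core move is the same: retrieval returns a subgraph, not a single node.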
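The excerpt cuts off at the weighted Reciprocal Rank fusion step, but the general technique (weighted Reciprocal Rank Fusion) is standard and can be sketched. The signal names, weights, and the constant k=60 below are illustrative assumptions, not the article's actual four-signal configuration.

```python
def weighted_rrf(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[str]:
    """Weighted Reciprocal Rank Fusion: merge several ranked lists into one.

    Each signal contributes weight / (k + rank) for every tool it ranks;
    tools are returned sorted by their total fused score.
    """
    scores: dict[str, float] = {}
    for ranked, w in zip(rankings, weights):
        for rank, tool in enumerate(ranked, start=1):
            scores[tool] = scores.get(tool, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-similarity ranking with a graph-proximity ranking.
fused = weighted_rrf(
    [["cancelOrder", "getOrder"],
     ["listOrders", "cancelOrder", "getOrder"]],
    weights=[0.6, 0.4],
)
```

RRF needs only rank positions, not comparable scores, which is why it is a common way to combine heterogeneous signals like embedding similarity and graph distance.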
