I gave an LLM 248 tools and accuracy dropped to 12%. Here's what fixed it.

Son Seong Joon, via Dev.to Python

So here's a fun one. I was building an LLM agent for a Kubernetes cluster: 248 API endpoints, all exposed as tools. I threw them all into the context and asked the model to "scale my deployment." Accuracy? 12%. The model basically choked on the wall of tool definitions.

The vector search trap

The natural first instinct: use vector search to retrieve only the relevant tools. Embed all the tool descriptions, find the closest matches, done. Except... when a user says "cancel my order and get a refund," vector search returns cancelOrder. But the actual workflow is:

listOrders → getOrder → cancelOrder → processRefund

Vector search gives you one tool. You need the chain.

Graph-based retrieval

I ended up building graph-tool-call, which models tool relationships as a directed graph. Tools have edges like PRECEDES, REQUIRES, and COMPLEMENTARY. When you search, it doesn't just find one match; it traverses the graph and returns the whole workflow. The retrieval combines four signals via weighted Reciprocal Rank
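To make the vector-search trap concrete, here is a minimal sketch of single-tool retrieval. It substitutes bag-of-words cosine similarity for real embeddings, and the tool descriptions are invented for illustration; the point is only that top-1 similarity search surfaces one tool, not the chain.

```python
import math
from collections import Counter

# Hypothetical tool descriptions, stand-ins for real embedded docstrings.
TOOLS = {
    "listOrders":    "list all orders for a customer",
    "getOrder":      "fetch details of a single order",
    "cancelOrder":   "cancel an order and stop fulfillment",
    "processRefund": "issue a refund for a payment",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tool(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k tools whose description best matches the query."""
    q = Counter(query.lower().split())
    ranked = sorted(
        TOOLS,
        key=lambda t: cosine(q, Counter(TOOLS[t].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

# The query needs a four-tool workflow, but top-1 retrieval returns one tool.
retrieve_tool("cancel my order and get a refund")
```

Swapping in real embeddings changes the scores, not the failure mode: similarity alone has no notion of what must run before or after the matched tool.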
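The graph traversal idea can be sketched as follows. This is a hypothetical miniature, not the actual graph-tool-call API: it models only PRECEDES edges and walks them in both directions from the tool that vector search found, assuming the workflow is a simple acyclic chain.

```python
from collections import defaultdict

class ToolGraph:
    """Toy directed graph of tools; only PRECEDES edges are modeled here."""

    def __init__(self):
        self.fwd = defaultdict(list)  # tool -> tools it directly precedes
        self.rev = defaultdict(list)  # tool -> tools that directly precede it

    def add_precedes(self, earlier: str, later: str) -> None:
        self.fwd[earlier].append(later)
        self.rev[later].append(earlier)

    def workflow(self, seed: str) -> list[str]:
        """Recover the full chain around a seed tool (assumes no cycles)."""
        chain = [seed]
        node = seed
        while self.rev[node]:          # walk back to the workflow's start
            node = self.rev[node][0]
            chain.insert(0, node)
        node = seed
        while self.fwd[node]:          # walk forward to its end
            node = self.fwd[node][0]
            chain.append(node)
        return chain

g = ToolGraph()
g.add_precedes("listOrders", "getOrder")
g.add_precedes("getOrder", "cancelOrder")
g.add_precedes("cancelOrder", "processRefund")

# Vector search alone surfaces cancelOrder; the graph recovers the chain.
g.workflow("cancelOrder")
# → ['listOrders', 'getOrder', 'cancelOrder', 'processRefund']
```

A production version would also follow REQUIRES and COMPLEMENTARY edges with per-relation depth limits, but the core move is the same: retrieval returns a subgraph, not a single node.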
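The excerpt cuts off at the weighted Reciprocal Rank fusion step, but the general technique (weighted Reciprocal Rank Fusion) is standard and can be sketched. The signal names, weights, and the constant k=60 below are illustrative assumptions, not the article's actual four-signal configuration.

```python
def weighted_rrf(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[str]:
    """Weighted Reciprocal Rank Fusion: merge several ranked lists into one.

    Each signal contributes weight / (k + rank) for every tool it ranks;
    tools are returned sorted by their total fused score.
    """
    scores: dict[str, float] = {}
    for ranked, w in zip(rankings, weights):
        for rank, tool in enumerate(ranked, start=1):
            scores[tool] = scores.get(tool, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-similarity ranking with a graph-proximity ranking.
fused = weighted_rrf(
    [["cancelOrder", "getOrder"],
     ["listOrders", "cancelOrder", "getOrder"]],
    weights=[0.6, 0.4],
)
```

RRF needs only rank positions, not comparable scores, which is why it is a common way to combine heterogeneous signals like embedding similarity and graph distance.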
