
When agent trace metrics lie: the span tree double-counting problem
I was building OpenInference support for an agent trace comparison tool when the token counts came back double what they should have been. The code was simple — sum tokens across all spans in a trace. The bug was that "all spans" included orchestration wrappers that carried their children's totals. Nothing crashed. The numbers just looked plausible enough to ship. This is the span tree double-counting problem. It's not hard to fix once you see it, but it's easy to miss because the wrong numbers look reasonable. The tree Agent traces are trees. This isn't a new data structure — OpenTelemetry has used tree-structured traces for distributed systems since long before LLMs were mainstream. OpenInference , the AI-specific semantic convention layer built on top of OpenTelemetry, inherits this model and adds span kinds tailored to AI workloads: LLM, TOOL, CHAIN, AGENT, RETRIEVER, and others. Every OpenInference trace is a valid OTLP trace — the conventions give attribute names their AI-specifi
Continue reading on Dev.to Python
Opens in a new tab




