
Why Agent Testing is Broken
Why Agent Testing Is Broken And what to do about it. Software testing has been solved for decades. You write a function, you assert its output, your CI turns green, you ship. The contract is clear: same input, same output, always. LLM agents broke this contract completely — and most teams haven’t noticed yet. The Problem Nobody’s Talking About Ask your agent “summarize this contract” today and get a good response. Ask it again tomorrow after a model update, a prompt tweak, or a context window change, and get something subtly different. Not wrong, exactly. Just… different. Different enough that the downstream system parsing it breaks silently at 2am. This is not a hypothetical. It’s happening in production right now at companies that thought they were shipping stable systems. The failure mode is insidious because: It doesn’t throw exceptions. The agent responds. It always responds. The response is even plausible. The failure is semantic, not syntactic. It’s not reproducible on demand. Y
Continue reading on Dev.to
Opens in a new tab



