
The Prompt Change That Broke Production at 2am
Why This Keeps Happening

When you test traditional software, you test a deterministic function. Same input, same output. If the output changes, something broke: the test fails, you investigate.

LLM agents are not deterministic functions. They're probabilistic systems with behavioral contracts. The contract isn't "return exactly this string." The contract is: given this class of inputs, the output must satisfy these structural and semantic properties. The word "liability" should appear. The summary should be in bullet points. The termination clause should be mentioned. These are the invariants your downstream systems depend on, and they're completely untested in most production LLM pipelines.

The industry's response to this has been more evals. MMLU benchmarks, human preference ratings, red-team suites. Valuable for model builders. Useless for application developers who need to know whether their specific prompts still produce outputs their specific systems can rely on. You're not tryi
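To make this concrete, here is a minimal sketch of what asserting such a behavioral contract could look like. The specific invariants (the "liability" keyword, bullet formatting, a termination-clause mention) are taken from the examples above; the function name and inputs are hypothetical, not from any particular framework.

```python
def check_summary_contract(output: str) -> list[str]:
    """Return the list of violated invariants (empty list = contract holds)."""
    violations = []
    text = output.lower()

    # Semantic property: a required term must appear somewhere in the output.
    if "liability" not in text:
        violations.append("missing required term: 'liability'")

    # Structural property: every non-empty line must be a bullet point.
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines or not all(ln.lstrip().startswith(("-", "*", "\u2022")) for ln in lines):
        violations.append("summary is not formatted as bullet points")

    # Semantic property: the termination clause must be mentioned.
    if "termination" not in text:
        violations.append("termination clause not mentioned")

    return violations


# A conforming output passes; a vague one trips every check.
good = "- Liability is capped at fees paid.\n- The termination clause allows 30 days' notice."
bad = "The contract looks fine overall."
```

A test suite built on checks like these runs the same prompt many times and asserts the invariants on every sample, rather than diffing outputs against a golden string.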


