Same Instruction File, Same Score, Completely Different Failures
Two AI coding agents were given the same task with the same 10-rule instruction file. Both scored 70% adherence. Here's the breakdown:

| Rule | Agent A | Agent B |
| --- | --- | --- |
| camelCase variables | PASS | FAIL |
| No `any` type | FAIL | PASS |
| No `console.log` | FAIL | PASS |
| Named exports only | PASS | FAIL |
| Max 300 lines | PASS | FAIL |
| Test files exist | FAIL | PASS |

Agent A had a type safety gap: it used `any` for request parameters even though it defined the correct types in its own `types.ts` file. Agent B had a structural discipline gap: it used snake_case for a variable, added a default export (following Express conventions over the project rules), and generated a 338-line file by adding features beyond the task scope.

Same score. Completely different engineering weaknesses.

That table came from RuleProbe.

## About this case study

The comparison uses simulated agent outputs with deliberate violations, not live agent runs. Raw JSON reports are in the repo under `docs/case-study-data/`. This is documented in the case study.

## What RuleProbe is

Rule
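To make Agent A's type safety gap concrete, here is a minimal sketch of the pattern described above: a correct interface is declared (as it would be in `types.ts`), but the handler bypasses it with `any`. The names (`CreateUserRequest`, `handleCreateLoose`, `handleCreateStrict`) are illustrative, not taken from the case study data.

```typescript
// Hypothetical types.ts content: the agent defined this type correctly.
interface CreateUserRequest {
  userName: string;
  email: string;
}

// Agent A's gap: the parameter is `any`, so the declared type is never used.
// A field typo like `user_name` compiles silently and yields undefined at runtime.
function handleCreateLoose(body: any): string {
  return body.user_name;
}

// What the "no `any` type" rule asks for: use the declared interface,
// so the compiler rejects `body.user_name` as an unknown property.
function handleCreateStrict(body: CreateUserRequest): string {
  return body.userName;
}
```

Both versions run, which is the point: the `any` version only reveals the mistake at runtime, while the typed version would have failed at compile time.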
