Same Instruction File, Same Score, Completely Different Failures
Two AI coding agents were given the same task with the same 10-rule instruction file. Both scored 70% adherence. Here's the breakdown:

| Rule | Agent A | Agent B |
| --- | --- | --- |
| camelCase variables | PASS | FAIL |
| No `any` type | FAIL | PASS |
| No `console.log` | FAIL | PASS |
| Named exports only | PASS | FAIL |
| Max 300 lines | PASS | FAIL |
| Test files exist | FAIL | PASS |

Agent A had a type safety gap: it used `any` for request parameters even though it defined the correct types in its own `types.ts` file. Agent B had a structural discipline gap: it used snake_case for a variable, added a default export (following Express conventions over the project rules), and generated a 338-line file by adding features beyond the task scope.

Same score. Completely different engineering weaknesses.

That table came from RuleProbe.

## About this case study

The comparison uses simulated agent outputs with deliberate violations, not live agent runs. Raw JSON reports are in the repo under `docs/case-study-data/`. This is documented in the case study.

## What RuleProbe is

Rule
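To make Agent A's type safety gap concrete, here is a minimal sketch of the pattern described above: a correct interface is declared (as it would be in `types.ts`), but the handler bypasses it with `any`. The names (`CreateUserRequest`, `handleCreateLoose`, `handleCreateStrict`) are illustrative, not taken from the case study data.

```typescript
// Hypothetical types.ts content: the agent defined this type correctly.
interface CreateUserRequest {
  userName: string;
  email: string;
}

// Agent A's gap: the parameter is `any`, so the declared type is never used.
// A field typo like `user_name` compiles silently and yields undefined at runtime.
function handleCreateLoose(body: any): string {
  return body.user_name;
}

// What the "no `any` type" rule asks for: use the declared interface,
// so the compiler rejects `body.user_name` as an unknown property.
function handleCreateStrict(body: CreateUserRequest): string {
  return body.userName;
}
```

Both versions run, which is the point: the `any` version only reveals the mistake at runtime, while the typed version would have failed at compile time.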
