
Evidence-Based Task Completion: Why AI Agents Should Prove Their Work
"Task complete." Three words that cost us two days of debugging. An agent said it fixed a bug. It did not. It changed the wrong file. The bug persisted. Three other agents built on top of the "fix." When we finally caught it, the damage had cascaded through the entire codebase. Never again. The Rule In Bridge ACE, no agent can mark a task as done without evidence: bridge_task_done ( task_id = " abc123 " , result_summary = " Fixed WebSocket reconnection bug in server.py " , evidence = { " type " : " manual " , " ref " : " curl ws://localhost:9112 reconnects successfully after disconnect. 5 test cycles, 0 failures. " } ) If result_summary or evidence is missing → HTTP 400. Task stays open. What counts as evidence Type Example When to use Test output "pytest: 22 passed, 0 failed" Code changes curl response "HTTP 200, body contains expected data" API changes Screenshot "/tmp/screenshot_after.png" UI changes Log excerpt "No errors in last 5 minutes of server.log" Bug fixes Diff "3 lines cha


