
Claude CLI vs API for Code Review: Same Model, Wildly Different Results
I stopped writing code by hand a while ago. Claude writes it, I review it, it ships. It works, so why go back? But here's the thing -- if AI writes all the code, who reviews it? Another AI, obviously. So I built brunt, an adversarial code review tool that throws LLMs at your diffs to find bugs and security issues.

The problem is: which AI do you point it at? I have a Claude subscription (CLI access), and I have an API key. Same company, same models. That should give the same results, right? I also gave Ollama a try; it didn't make the cut.

I tested this against a real refactor on my Rust/Axum backend -- replacing four old subsystems with a new AI scenarios feature. 20 commits, 77 files, +1,566 / -5,900 lines. I ran brunt three ways:

Claude CLI -- uses your Claude subscription via claude -p
Anthropic API (Sonnet) -- claude-sonnet-4-6 via HTTP
Anthropic API (Opus) -- claude-opus-4-6 via HTTP

Same diff. Same tool. Same prompts. Wildly different results.

The results

Seven findings vs eighty-four
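For reference, the two API runs boil down to the same Messages API request with only the model string swapped (the CLI run instead pipes the diff into claude -p). A minimal sketch of that request, not brunt's actual internals -- the review prompt and placeholder diff here are hypothetical, the model names are as quoted above:

```python
# Sketch: same prompt, same endpoint, only the model string changes
# between the Sonnet and Opus runs. Not brunt's real code.
import json

API_URL = "https://api.anthropic.com/v1/messages"  # Anthropic Messages API

def build_review_request(model: str, diff: str) -> dict:
    """Build one Messages API payload for a single review pass."""
    return {
        "model": model,
        "max_tokens": 4096,
        "messages": [
            {
                "role": "user",
                # Hypothetical review prompt; brunt's real prompts differ.
                "content": "Review this diff for bugs and security issues:\n\n" + diff,
            }
        ],
    }

diff = "--- a/src/main.rs\n+++ b/src/main.rs\n..."  # placeholder diff
sonnet_req = build_review_request("claude-sonnet-4-6", diff)
opus_req = build_review_request("claude-opus-4-6", diff)

# Everything except the model field is identical between the two runs.
assert sonnet_req["messages"] == opus_req["messages"]
print(json.dumps(sonnet_req["model"]), json.dumps(opus_req["model"]))
```

POSTing either payload to API_URL (with your x-api-key and anthropic-version headers) is all the "API" runs amount to; the divergence in findings comes from what happens around that call, not the call itself.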



