![MiniMax 2.5 vs. GLM-5 across 3 Coding Tasks [Benchmark & Results]](/_next/image?url=https%3A%2F%2Fmedia2.dev.to%2Fdynamic%2Fimage%2Fwidth%3D800%252Cheight%3D%252Cfit%3Dscale-down%252Cgravity%3Dauto%252Cformat%3Dauto%2Fhttps%253A%252F%252Fdev-to-uploads.s3.amazonaws.com%252Fuploads%252Farticles%252Feq7o1m5anl1g2dx08jm4.jpeg&w=1200&q=75)
MiniMax 2.5 vs. GLM-5 across 3 Coding Tasks [Benchmark & Results]
GLM-5 and MiniMax M2.5 are two new open-weight models now available in Kilo Code. On SWE-bench Verified, MiniMax M2.5 scores 80.2% and GLM-5 scores 77.8%, putting both close to GPT-5.2 and Claude Opus 4.6 at a fraction of the cost.

*GLM-5 benchmark charts. MiniMax M2.5 benchmark charts.*

We ran both through three coding tasks in Kilo CLI, where they worked autonomously for up to 23 minutes at a time without human intervention.

**TL;DR:** GLM-5 scored 90.5/100, with better architecture and testing. MiniMax M2.5 scored 88.5/100, with better instruction adherence, and completed the tests in half the time (21 minutes vs. 44 minutes).

## Test Design

We created three TypeScript codebases testing different coding skills:

- **Test 1: Bug Hunt (30 points).** Find and fix 8 bugs in a working Node.js/Hono task API. Bugs included race conditions, SQL injection, JWT vulnerabilities, pagination errors, and memory leaks.
- **Test 2: Legacy Refactoring (35 points).** Modernize callback-based Express code to async/await.
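To make the bug-hunt category concrete, here is a minimal sketch of the SQL-injection class of bug the task API plants. The handler and function names are hypothetical (the article does not show the benchmark code); the point is the difference between concatenating user input into SQL and binding it as a parameter.

```typescript
// Hypothetical illustration of a planted SQL-injection bug and its fix.
// Function names are invented for this sketch, not taken from the benchmark.

// Vulnerable: user input is concatenated directly into the SQL string,
// so a crafted title can rewrite the query.
function findTaskUnsafe(title: string): string {
  return `SELECT * FROM tasks WHERE title = '${title}'`;
}

// Fixed: the query shape is constant; the input travels separately as a
// bound parameter (the pattern used by better-sqlite3, node-postgres, etc.).
function findTaskSafe(title: string): { sql: string; params: string[] } {
  return { sql: "SELECT * FROM tasks WHERE title = ?", params: [title] };
}

const malicious = "x' OR '1'='1";
console.log(findTaskUnsafe(malicious)); // injected clause becomes part of the query
console.log(findTaskSafe(malicious).sql); // query text is fixed; input stays data
```

The fix is mechanical once spotted, which is why this bug class rewards careful reading of query construction rather than deep reasoning.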
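The refactoring category can be sketched the same way. Below is a hedged, self-contained example of the callback-to-async/await modernization Test 2 asks for; `fetchTaskLegacy` is an invented stand-in for the legacy Express-era code, not the benchmark's actual source.

```typescript
// Before: a Node-style callback API (hypothetical stand-in for the legacy code).
function fetchTaskLegacy(
  id: number,
  cb: (err: Error | null, task?: { id: number; title: string }) => void
): void {
  setTimeout(() => {
    if (id <= 0) return cb(new Error("invalid id"));
    cb(null, { id, title: `Task ${id}` });
  }, 0);
}

// After: the same behavior wrapped as a Promise, consumable with async/await.
function fetchTask(id: number): Promise<{ id: number; title: string }> {
  return new Promise((resolve, reject) => {
    fetchTaskLegacy(id, (err, task) => (err ? reject(err) : resolve(task!)));
  });
}

async function main(): Promise<void> {
  const task = await fetchTask(42);
  console.log(task.title); // errors now surface via try/catch instead of err-first callbacks
}
main();
```

Wrapping at the boundary like this lets a model (or a human) migrate call sites incrementally, which is typically what "modernize to async/await" grading looks for.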
Continue reading on Dev.to