8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count

via Dev.to (plasmon)

If you run local LLMs, you probably assume "Q4 loses quality" and "Q8 is safe." More bits = better quality. Obvious.

A 2025 arXiv paper (Dong et al., arXiv:2508.16712) destroyed this assumption with measured data: 8-bit quantization killed 92% of the HumanEval pass rate on a 13B model, while the worst 4-bit degradation was 22%. 8-bit lost to 4-bit.

Read in isolation, this makes no sense. But dig into the cause and you'll find the essential distinction that the word "quantization" conceals. The bit count wasn't the problem. What got quantized was.

"Quantization" Is Not One Operation

When the local LLM community says "quantization," two fundamentally different operations get conflated.

Weight-only quantization:
- Only the model's weight parameters are converted from FP16 to low-bit
- Inference activations (intermediate computations) remain FP16
- Examples: GGUF Q4_K_M, GPTQ, AWQ

Weight-activation quantization:
- Both weights AND inference activations are converted to low-bit
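The distinction above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's methodology or any library's actual kernel; the helpers `quantize_int8` and `dequantize` are made up for this example, and real inference engines run the low-bit matmul in integer arithmetic rather than dequantizing first.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization: map [-max|x|, max|x|] to [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # weight matrix
x = rng.normal(size=(64,)).astype(np.float32)     # activation vector

y_ref = W @ x  # full-precision reference

# Weight-only: weights are stored low-bit, but activations stay FP
# and the error comes only from the weight rounding.
qW, sW = quantize_int8(W)
y_weight_only = dequantize(qW, sW) @ x

# Weight-activation: the activations are quantized too, so any outlier
# in x inflates the activation scale and adds a second error source.
qx, sx = quantize_int8(x)
y_weight_act = dequantize(qW, sW) @ dequantize(qx, sx)

print("weight-only max error:", np.abs(y_weight_only - y_ref).max())
print("weight-activation max error:", np.abs(y_weight_act - y_ref).max())
```

With well-behaved random tensors both errors are tiny; the article's point is that real transformer activations have extreme outliers, which is exactly what makes the weight-activation path so much more fragile than the bit count alone would suggest.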

Continue reading on Dev.to
