8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count

via Dev.to (plasmon)

If you run local LLMs, you probably assume "Q4 loses quality" and "Q8 is safe." More bits = better quality. Obvious.

A 2025 arXiv paper (Dong et al., arXiv:2508.16712) destroyed this assumption with measured data: 8-bit quantization killed 92% of the HumanEval pass rate on a 13B model, while the worst 4-bit degradation was 22%. 8-bit lost to 4-bit.

Read in isolation, this makes no sense. But dig into the cause and you'll find the essential distinction that the word "quantization" conceals. The bit count wasn't the problem. What got quantized was.

"Quantization" Is Not One Operation

When the local LLM community says "quantization," two fundamentally different operations get conflated.

Weight-only quantization:
- Only the model's weight parameters are converted from FP16 to low-bit
- Inference activations (intermediate computations) remain FP16
- Examples: GGUF Q4_K_M, GPTQ, AWQ

Weight-activation quantization:
- Both weights AND inference activations are converted to low-bit
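The distinction above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's methodology or any library's actual kernel; the helpers `quantize_int8` and `dequantize` are made up for this example, and real inference engines run the low-bit matmul in integer arithmetic rather than dequantizing first.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization: map [-max|x|, max|x|] to [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # weight matrix
x = rng.normal(size=(64,)).astype(np.float32)     # activation vector

y_ref = W @ x  # full-precision reference

# Weight-only: weights are stored low-bit, but activations stay FP
# and the error comes only from the weight rounding.
qW, sW = quantize_int8(W)
y_weight_only = dequantize(qW, sW) @ x

# Weight-activation: the activations are quantized too, so any outlier
# in x inflates the activation scale and adds a second error source.
qx, sx = quantize_int8(x)
y_weight_act = dequantize(qW, sW) @ dequantize(qx, sx)

print("weight-only max error:", np.abs(y_weight_only - y_ref).max())
print("weight-activation max error:", np.abs(y_weight_act - y_ref).max())
```

With well-behaved random tensors both errors are tiny; the article's point is that real transformer activations have extreme outliers, which is exactly what makes the weight-activation path so much more fragile than the bit count alone would suggest.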

Continue reading on Dev.to
