
# Why QLoRA Produces a Gradient Norm Spike at Step 44 on Mistral-7B (and How to Fix It)
If you've been fine-tuning LLMs with QLoRA, you may have seen your gradient norm suddenly spike mid-training and wondered if something was broken. It probably wasn't, but it was a real problem, and it's fixable. While running a systematic comparison of LoRA, QLoRA, AdaLoRA, and VeRA on Mistral-7B, we noticed something odd: every QLoRA run produced a gradient norm spike at almost exactly step 44. It was reproducible across seeds, so we dug into it.

## The Observation

Here's what we measured (same dataset, same seed, 5 runs each):

- **QLoRA gradient norm at step 44:** ~15.28
- **Normal baseline (other steps in the same run):** ~1.0
- **Plain LoRA at step 44:** ~1.3 (no comparable spike)

The spike isn't random noise; it's structural. It appears because the 4-bit quantization in QLoRA introduces quantization error into the backward pass that accumulates differently from full-precision LoRA. By step 44, during the early warmup phase, this accumulated error crosses a threshold and produces an anomalous gradient.
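To see this in your own runs, you can log the global gradient norm at every step and flag outliers against a rolling baseline. The sketch below is a minimal, framework-agnostic version in plain PyTorch; the `window` and `factor` thresholds are illustrative assumptions, not values from our experiments, and `global_grad_norm` mirrors the L2-over-all-parameters norm that most trainers report.

```python
import torch

def global_grad_norm(params):
    """L2 norm over all parameter gradients, matching the usual
    'grad_norm' that training loops log each step."""
    norms = [p.grad.detach().norm(2) for p in params if p.grad is not None]
    return torch.stack(norms).norm(2).item() if norms else 0.0

def spike_steps(grad_norms, window=20, factor=5.0):
    """Flag steps whose grad norm exceeds `factor` times the median of
    the previous `window` steps. Thresholds here are illustrative."""
    flagged = []
    for i, g in enumerate(grad_norms):
        if i < window:
            continue  # not enough history for a baseline yet
        baseline = sorted(grad_norms[i - window:i])[window // 2]
        if g > factor * baseline:
            flagged.append(i)
    return flagged
```

In a real training loop you would call `global_grad_norm(model.parameters())` after `loss.backward()` and feed the history into `spike_steps`; on a trace like ours (norms near 1.0 with a single ~15x outlier), only the spike step is flagged.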

