
What I Learned Testing 12 Compression Approaches That Failed
The most useful research I've done this year isn't in the NexusQuant paper. It's the experiments that failed, the ideas that sounded smart in theory and didn't survive contact with real KV cache data. Negative results build trust. They also save time: if you're working on KV cache compression, this list might save you weeks of effort. Each entry covers what we tried, what we expected, what happened, and what we learned.

1. PCA Rotation (3x Worse Distortion)

The idea: Apply PCA to KV vectors to align the quantization axes with the data's principal components. This is optimal for Gaussian data — principal components diagonalize the covariance matrix, and uniform quantization along them minimizes MSE.

What happened: 3x worse distortion than Hadamard rotation. PPL degradation jumped ~0.9 percentage points at the same compression ratio.

Why: PCA is computed per-layer from calibration data. KV distributions shift by layer depth, head index,
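The rotate-then-quantize setup described above can be sketched as follows. This is a minimal illustration with synthetic data, not the NexusQuant implementation: the function names, the per-tensor symmetric quantizer, and the Sylvester Hadamard construction are my own choices for the example.

```python
import numpy as np

def pca_rotation(calib):
    # Rotation from calibration data: eigenvectors of the covariance matrix.
    X = calib - calib.mean(axis=0)
    cov = X.T @ X / len(X)
    _, Q = np.linalg.eigh(cov)
    return Q  # columns are principal directions (orthonormal)

def hadamard(n):
    # Normalized Hadamard matrix via Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize(x, bits=4):
    # Per-tensor symmetric uniform quantization (a simple stand-in quantizer).
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def rotated_quant_mse(x, R, bits=4):
    # Rotate, quantize, rotate back; report reconstruction MSE.
    xq = quantize(x @ R, bits)
    return float(np.mean((xq @ R.T - x) ** 2))

# Synthetic stand-in for KV vectors; real caches behave differently,
# which is exactly why the PCA variant lost in practice.
rng = np.random.default_rng(0)
kv = rng.normal(size=(1024, 64)) @ np.diag(np.linspace(0.1, 2.0, 64))
mse_pca = rotated_quant_mse(kv, pca_rotation(kv))
mse_had = rotated_quant_mse(kv, hadamard(64))
```

On a Gaussian toy distribution like this one, the PCA rotation can look fine; the failure mode in the article only shows up when the calibration distribution stops matching the live KV distribution.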



