
# Context-Anchored Generation (CAG): Fixing Hallucinations at the Decoding Layer
*Hallucination isn’t an output problem. It’s a generation problem.*

## The Problem Isn’t Knowledge, It’s Control

Large language models don’t hallucinate because they “don’t know.” They hallucinate because generation **drifts**. At each step, the model predicts:

`P(tokenₜ | context)`

That context is constantly shifting. Over time, something subtle happens:

- The original prompt weakens
- Recent tokens dominate
- High-frequency patterns take over

This creates what can be described as **semantic drift**. The model doesn’t suddenly “break.” It gradually leaves the frame.

## The Core Idea

CAG introduces a simple constraint: every token must stay semantically aligned with a persistent frame. Instead of letting generation run open-loop, we:

1. Create a semantic anchor from the prompt
2. Track how far each new token drifts
3. Intervene during decoding, not after

## Two-State Decoding

CAG operates as a control system with two modes:

**Constraint Mode**

- Enforces alignment with the anchor
- Penalizes tokens that drift too far
- Keeps
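The anchor-and-drift bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article’s implementation: it assumes token embeddings are already available (plain NumPy vectors here), uses the mean prompt embedding as the anchor, and measures drift as one minus the cosine similarity between the anchor and a window of recent tokens.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def make_anchor(prompt_embeddings: np.ndarray) -> np.ndarray:
    # One simple choice of "persistent frame": the mean of the
    # prompt's token embeddings (shape: [n_tokens, dim]).
    return prompt_embeddings.mean(axis=0)

def drift(anchor: np.ndarray, window_embeddings: np.ndarray) -> float:
    # Drift = 1 - similarity between the anchor and the mean
    # embedding of the most recently generated tokens.
    recent = window_embeddings.mean(axis=0)
    return 1.0 - cosine(anchor, recent)
```

A recent-token window identical to the prompt yields drift near 0; a window pointing in an unrelated direction yields drift near 1, signalling that generation has left the frame.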
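The two-state control loop can be sketched as a single decoding-step function. Everything here is an assumption for illustration: the mode-switch threshold, the hard drift limit, the penalty size, and the idea that each candidate token comes with a projected drift estimate are hypothetical knobs, not values from the article.

```python
import numpy as np

def decode_step(logits: np.ndarray, current_drift: float,
                candidate_drifts: np.ndarray,
                switch_at: float = 0.2, hard_limit: float = 0.4,
                penalty: float = 5.0) -> np.ndarray:
    # Free mode: drift is still low, so logits pass through untouched.
    if current_drift < switch_at:
        return logits
    # Constraint mode: subtract a penalty from the logit of every
    # candidate whose projected drift exceeds the hard limit, so
    # aligned tokens win at sampling time.
    adjusted = logits.copy()
    adjusted[candidate_drifts > hard_limit] -= penalty
    return adjusted
```

The design choice worth noting is that the intervention happens on the logits, before sampling, which is what makes this a decoding-layer control rather than a post-hoc filter.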




