
# My AI pipeline had a 1M token context window. The output still got worse.
## Fixing a context window problem in an AIOps investigation pipeline

The pipeline stitches context from three repos, calls Gemini with a chain-of-thought prompt, and posts a root cause analysis to Slack and Jira. At some point, output quality dropped.

## Diagnosis

A character-count diagnostic (sketched at the end of this post) showed the actual repo sizes:

- frontend: ~527k tokens
- backend: ~311k tokens
- legacy: ~7.9M tokens

The fixed 50/35/15 budget split was loading the same proportion of irrelevant code regardless of ticket type. A scheduling bug got the same legacy allocation as an auth bug.

Models don't attend uniformly across long contexts. Irrelevant content degrades output quality; it doesn't just take up space. The ceiling wasn't the constraint. Context selection was.

## Constraints to consider

- **Model rate limits and context window.** Already hitting the API directly, so context caching is available, but the 1M token ceiling is hard. The fix had to work within it, not around it.
- **Context quality vs. quantity.** A smaller, focused window beats a larger, noisier one.
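Here's a minimal sketch of that character-count diagnostic, assuming local checkouts of the three repos and the rough heuristic of ~4 characters per token. The paths, the directory walk, and the ratio are my assumptions for illustration, not the pipeline's actual code:

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and content

def estimate_repo_tokens(root: str) -> int:
    """Walk a repo and estimate its token count from raw character counts."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
            except OSError:
                continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

# Hypothetical repo paths standing in for the three repos in the post.
for repo in ("frontend", "backend", "legacy"):
    print(f"{repo}: ~{estimate_repo_tokens(repo):,} tokens")
```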
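And a sketch of the failure mode itself: a fixed 50/35/15 split that ignores the ticket entirely. Only the proportions come from the post; the repo-to-share mapping, the budget constant, and the `allocate` helper are hypothetical:

```python
CONTEXT_BUDGET = 1_000_000  # the hard 1M token ceiling
SPLIT = {"frontend": 0.50, "backend": 0.35, "legacy": 0.15}  # assumed mapping

def allocate(ticket_type: str) -> dict[str, int]:
    # ticket_type is ignored -- that's the bug. A scheduling bug and an
    # auth bug both get the same fixed slice of legacy code.
    return {repo: int(CONTEXT_BUDGET * share) for repo, share in SPLIT.items()}

print(allocate("scheduling"))  # {'frontend': 500000, 'backend': 350000, 'legacy': 150000}
print(allocate("auth"))        # identical allocation, regardless of ticket type
```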


