Show HN: Context Gateway – Compress agent context before it hits the LLM

We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.

Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s

Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive; it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M: https://openai.com/index/introducing-gpt-5-4/ ).

Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call, so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest. If the model later
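
To make the intent-conditioned compression step concrete, here is a rough sketch of the idea, not the project's actual implementation. It swaps the SLM classifier for plain keyword overlap between the tool-call intent and each output line, which is far cruder than what the post describes. The names `compress_tool_output`, `intent`, and `keep_ratio` are hypothetical and chosen only for illustration.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress_tool_output(intent: str, output: str, keep_ratio: float = 0.5) -> str:
    """Keep the output lines that best match the intent of the tool call.

    Assumption: keyword overlap is a stand-in for a learned relevance
    classifier. At most keep_ratio of the lines are kept, and lines with
    no overlap at all are always dropped.
    """
    lines = output.splitlines()
    if not lines:
        return output

    intent_terms = _terms(intent)
    # Score each line by how many intent terms it shares.
    scored = [(len(intent_terms & _terms(line)), i) for i, line in enumerate(lines)]

    # Take the highest-scoring lines within the budget, ties broken by position.
    budget = max(1, int(len(lines) * keep_ratio))
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:budget]
    keep = sorted(i for score, i in top if score > 0)

    kept = [lines[i] for i in keep]
    dropped = len(lines) - len(kept)
    if dropped:
        # Tell the downstream model that content was removed.
        kept.append(f"[{dropped} lines omitted]")
    return "\n".join(kept)


if __name__ == "__main__":
    grep_output = "\n".join([
        "src/app.py:10: import os",
        "src/app.py:42: except ValueError as error:",
        "src/app.py:57: print('hello')",
        "src/db.py:88: def handle_error(exc):",
    ])
    print(compress_tool_output("find error handling patterns", grep_output))
```

A proxy built along these lines would apply such a function to each tool result before forwarding the request to the LLM. The trailing omission marker is one possible way to let the model know the output was trimmed.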
