
AI Gateways Are Not I/O-Bound Proxies: I Benchmarked 5 of Them to Prove It
The wrong mental model

Most engineers think of AI gateways as thin reverse proxies. The mental model is something like nginx with proxy_pass: accept a request, forward it to OpenAI, stream the response back. I/O-bound. The runtime barely matters.

This model is wrong. Here is what actually happens on every request through an AI gateway:

1. Parse the JSON body
2. Validate the API key
3. Check rate limits
4. Resolve the routing rule
5. Select an upstream provider
6. Mutate headers
7. Forward the request
8. Parse the streaming response
9. Log the event
10. Update usage meters

Some gateways add policy evaluation, retry logic, or response transformation on top. None of that is I/O work. It is CPU work, and it serializes under concurrent load.

I built Ferro Labs AI Gateway, so I have a stake in this argument. That is also why I ran the benchmark: to understand exactly where different architectures break under pressure, including my own. I profiled five open-source AI gateways with flamegraphs and traced each failure mode…
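To make the point concrete, here is a minimal sketch of the pre-upstream pipeline described above. Everything in it (the key table, the `RateLimiter`, the routing rules, and the `handle` function) is hypothetical and not taken from any specific gateway; the point is that steps 1 through 6 are pure CPU work that runs before a single byte of upstream I/O happens.

```python
import json

# Hypothetical fixtures: a key store and prefix-based routing table.
API_KEYS = {"sk-test-123": {"org": "acme", "limit": 100}}
ROUTES = [
    {"match": "gpt-", "provider": "openai"},
    {"match": "claude-", "provider": "anthropic"},
]

class RateLimiter:
    """Naive in-memory counter; real gateways use sliding windows or token buckets."""
    def __init__(self):
        self.counts = {}

    def allow(self, org: str, limit: int) -> bool:
        n = self.counts.get(org, 0)
        if n >= limit:
            return False
        self.counts[org] = n + 1
        return True

limiter = RateLimiter()

def handle(raw_body: bytes, api_key: str) -> dict:
    body = json.loads(raw_body)                 # 1. parse JSON body (CPU)
    key = API_KEYS.get(api_key)                 # 2. validate API key (CPU)
    if key is None:
        raise PermissionError("invalid API key")
    if not limiter.allow(key["org"], key["limit"]):
        raise RuntimeError("rate limited")      # 3. check rate limits (CPU)
    model = body["model"]
    provider = next(                            # 4-5. resolve route, pick provider (CPU)
        r["provider"] for r in ROUTES if model.startswith(r["match"])
    )
    headers = {"x-forwarded-provider": provider}  # 6. mutate headers (CPU)
    # 7-8 (forward the request, parse the streaming response) are the only
    # I/O-bound steps, and they are not reached until all of the above runs.
    return {"provider": provider, "headers": headers}

print(handle(b'{"model": "gpt-4o"}', "sk-test-123"))
```

Under concurrency, every request pays this CPU cost on the hot path, which is why the runtime and its scheduler matter far more than the "thin proxy" model suggests.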



