Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference
Hey HN — I’m Veer and my cofounder is Suryaa. We're building Cumulus Labs (YC W26), and we're releasing our latest product, IonRouter (https://ionrouter.io/), an inference API for open-source and fine-tuned models. You swap in our base URL, keep your existing OpenAI client code, and get access to any model (open source or fine-tuned by you) running on our own inference engine.

The problem we kept running into: every inference provider is either fast but expensive (Together, Fireworks — you pay for always-on GPUs) or cheap but DIY (Modal, RunPod — you configure vLLM yourself and deal with slow cold starts). Neither felt right for teams that just want to ship.

Suryaa spent years building GPU orchestration infrastructure at TensorDock and production systems at Palantir. I led ML infrastructure and Linux kernel development for Space Force and NASA contracts where the stack had to actually work under pressure. When we started building AI products ourselves, we kept hitting the same wall: GP
Continue reading on Hacker News