
How We Took a High-Traffic IoT Service from 200 RPS to 20,000+ RPS (and Saved a $42k+ AWS Bill)
In the world of high-concurrency systems, throwing more hardware at a problem is often the most expensive way to fail. Recently, I revisited the investigation logs and Go pprof profiles from a project I handled four years ago as a contractor for an automobile IoT company. At the time, the company was managing telemetry for tens of thousands of connected vehicles, and the service was struggling with massive CPU utilization and scaling problems. Despite a significant cloud budget, the infrastructure was buckling under a load that, on paper, should have been manageable. This is the story of how we moved from "throwing money at the fire" to a lean, high-performance architecture.

The Infrastructure Bottleneck: 27 Nodes for 200 RPS

My first task at the company was to optimize the gateway server. Since I was a contractor and didn't have direct access yet, the Engineering Lead and I sat down to review the dashboard together. When he showed me the infrastructure, I was floored.
Continue reading on Dev.to




