Scaling FastAPI from 180 to 1,300 Requests/sec: What Actually Worked


via Dev.to Python · Winson GR

Most FastAPI performance issues aren't caused by the framework; they're caused by architecture, blocking I/O, and database query patterns.

I refactored a FastAPI backend that was stuck at ~180 requests/sec with p95 latency over 4 seconds. After a series of changes, it handled ~1,300 requests/sec at under 200 ms p95, on the same hardware. No vertical scaling. No extra cloud spend. Just removing bottlenecks.

The Starting Point

The system had grown fast. Speed was prioritized over structure, until it wasn't. By the time performance became a problem, the backend had 14+ microservices. In practice:

- Auth logic was duplicated across 6 services
- Each service maintained its own DB connection pool
- A single request triggered 4–5 internal API hops
- Middleware was inconsistently applied

The latency wasn't coming from slow code. It was coming from the architecture.

Fix 1: Kill the Service Fragmentation

14+ repos were consolidated into 4 domain-focused services:

Before → After
auth, token, session → identity-service
report, export,
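The intro names blocking I/O as one of the culprits. A minimal sketch of why it matters, using plain asyncio with a hypothetical `blocking_query` standing in for a synchronous DB driver call: awaiting nothing while running blocking code inside an `async` handler serializes every concurrent request, whereas offloading the call to a thread (which is roughly what FastAPI does automatically for plain `def` endpoints) lets requests overlap. This is an illustration of the general pitfall, not code from the article.

```python
import asyncio
import time

# Hypothetical stand-in for a synchronous DB/driver call (~200 ms).
def blocking_query() -> str:
    time.sleep(0.2)
    return "row"

async def bad_handler() -> str:
    # Blocking call inside an async handler: freezes the event loop,
    # so concurrent requests are processed one after another.
    return blocking_query()

async def good_handler() -> str:
    # Offload the blocking call to a worker thread; the event loop
    # stays free to serve other requests in the meantime.
    return await asyncio.to_thread(blocking_query)

async def timed(handler, n: int = 5) -> float:
    # Fire n "requests" concurrently and measure total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

async def main() -> None:
    serial = await timed(bad_handler)     # ~n * 0.2 s: requests queue up
    parallel = await timed(good_handler)  # ~0.2 s: threads overlap
    print(f"blocking: {serial:.2f}s, offloaded: {parallel:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())
```

With 5 concurrent calls, the blocking variant takes roughly 5 × 200 ms while the offloaded one finishes in about one call's latency; the same shape of fix (or switching to a genuinely async driver) is what removes this class of bottleneck in a FastAPI service.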
