Building real-time Bluesky analytics: ingesting 2.2M posts/day from the firehose

Arielle Houlier, via Dev.to Python

Bluesky publishes every post, like, follow, and block through a public firehose: a WebSocket stream of every event on the network in real time. I built a system that ingests all of it, classifies every post with AI, and turns it into analytics anyone can use. Here's how it works and what I learned processing ~2.2 million posts per day on a single server.

The Architecture

The stack is straightforward: Python 3.11, FastAPI, PostgreSQL 16, and Redis 7, all running on a single Hetzner CPX52 (~$50/month). Docker Compose orchestrates 13+ services. The firehose consumer connects to Bluesky's relay via WebSocket and receives every event on the network. At peak hours, that's 130K+ posts per hour. The consumer writes raw post data (text, author DID, timestamps) to PostgreSQL, where an enricher service resolves author handles and profile metadata in batches.

AI Classification at Scale

The interesting part is the content intelligence pipeline. Every ingested post gets sampled and sent to Claude H
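The ingest loop described above (consume frames, keep only post events, batch-write to PostgreSQL) can be sketched roughly as follows. This is a simplified illustration, not the article's actual code: it assumes a Jetstream-style JSON feed rather than the raw CBOR relay protocol, and the URL, field names, and table schema are all placeholders.

```python
import json

def extract_post_row(event: dict):
    """Pull the fields the article says get persisted (author DID, text,
    timestamp) from a Jetstream-style JSON frame. Field names here are
    assumptions about the feed's shape, not the article's exact schema.
    Returns None for non-post events so the consumer can skip them."""
    commit = event.get("commit") or {}
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    record = commit.get("record") or {}
    return (
        event.get("did"),         # author DID
        record.get("text", ""),   # raw post text
        record.get("createdAt"),  # author-supplied timestamp
    )

async def consume_firehose(url: str, dsn: str, batch_size: int = 500):
    """Sketch of the consumer loop: read frames, collect rows, flush to
    PostgreSQL in batches. Assumes the third-party `websockets` and
    `asyncpg` packages; the INSERT target is illustrative."""
    import asyncpg
    import websockets

    pool = await asyncpg.create_pool(dsn=dsn)
    batch = []
    async with websockets.connect(url) as ws:
        async for frame in ws:
            row = extract_post_row(json.loads(frame))
            if row is None:
                continue  # likes, follows, blocks, etc. are not persisted here
            batch.append(row)
            if len(batch) >= batch_size:
                # Batched inserts keep per-row overhead low at ~130K posts/hour.
                await pool.executemany(
                    "INSERT INTO posts (did, text, created_at) VALUES ($1, $2, $3)",
                    batch,
                )
                batch.clear()
```

Batching the writes (rather than one INSERT per event) is what makes this feasible on a single server: the database sees a few round trips per second instead of dozens per second at peak.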

Continue reading on Dev.to Python


