I built a local screen reader that reads your screen aloud — no cloud, no API keys

I got tired of switching between reading and listening, so I built sttts: a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.

What it does

🖱️ You draw a rectangle on any part of your screen
📸 It snapshots that region every N seconds
🔍 Pixel diff check: skips frames where nothing changed
🧠 LightOnOCR-2-1B reads the text (runs on AMD GPU via ROCm)
🗣️ Kokoro-82M speaks it through your speakers (runs on CPU)

🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker

The killer feature: auto page-turn

You can draw a second rectangle over any button on screen. After TTS finishes speaking and the screen stays idle, sttts automatically clicks it. I use this with Kindle for PC: it reads the entire book hands-free, turning pages automatically.

    # Draw OCR region, then draw the next-page button
    uv run python capture.py --next-btn -i 2

Models used

OCR: LightOnOCR-2-1B (fast, accurate, runs on AMD GPU)
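
To make the pixel diff check and the idle-gated page turn more concrete, here is a minimal sketch in plain Python. It is not taken from the sttts source. The names `changed_fraction`, `frame_changed`, and `PageTurner`, along with the tolerance, threshold, and idle-frame values, are all assumptions chosen for illustration. Frames are modeled as raw `bytes` so the example runs without any screen-capture or image library.

```python
from dataclasses import dataclass
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes, tolerance: int = 8) -> float:
    """Return the fraction of byte positions that differ by more than `tolerance`."""
    if len(prev) != len(curr):
        return 1.0  # region was resized, so treat it as fully changed
    if not curr:
        return 0.0
    differing = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tolerance)
    return differing / len(curr)


def frame_changed(prev: Optional[bytes], curr: bytes, threshold: float = 0.01) -> bool:
    """Decide whether a new snapshot is worth sending to OCR (assumed 1% threshold)."""
    if prev is None:
        return True  # the first snapshot always goes through
    return changed_fraction(prev, curr) > threshold


@dataclass
class PageTurner:
    """Hypothetical helper: click only after TTS is done and the screen is idle."""
    idle_frames_needed: int = 2  # assumed number of unchanged snapshots required
    idle_frames: int = 0

    def update(self, screen_changed: bool, tts_busy: bool) -> bool:
        """Feed one snapshot result; return True when the button should be clicked."""
        if screen_changed or tts_busy:
            self.idle_frames = 0
            return False
        self.idle_frames += 1
        if self.idle_frames >= self.idle_frames_needed:
            self.idle_frames = 0  # reset so the next page starts a fresh count
            return True
        return False


if __name__ == "__main__":
    page = bytes(100)                    # stand-in for a captured region
    noisy = bytes([3]) * 100             # small per-pixel noise, below tolerance
    new_page = bytes([200]) * 5 + bytes(95)
    print(frame_changed(None, page), frame_changed(page, noisy), frame_changed(page, new_page))
```

The per-byte tolerance is there so that small rendering noise, such as a blinking cursor's antialiasing, does not trigger a fresh OCR pass. In a real capture loop the click decision would hand off to whatever mouse-automation library the tool uses, and that part is left out here.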
