
I built a local screen reader that reads your screen aloud — no cloud, no API keys
I got tired of switching between reading and listening, so I built sttts, a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.

What it does

🖱️ You draw a rectangle on any part of your screen
📸 It snapshots that region every N seconds
🔍 Pixel diff check: skips frames where nothing changed
🧠 LightOnOCR-2-1B reads the text (runs on AMD GPU via ROCm)
🗣️ Kokoro-82M speaks it through your speakers (runs on CPU)

🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker

The killer feature: auto page-turn

You can draw a second rectangle over any button on screen. Once TTS has finished speaking and the screen has stayed idle, sttts clicks that button automatically. I use this with Kindle for PC: it reads the entire book hands-free, turning pages on its own.

    # Draw OCR region, then draw the next-page button
    uv run python capture.py --next-btn -i 2

Models used

OCR: LightOnOCR-2-1B, fast, accurate, runs on AMD GPU
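To make the pixel diff check and the auto page-turn condition more concrete, here is a minimal sketch of how the two decisions could work. It is not taken from the sttts source: the function names `frame_changed` and `should_click_next`, the thresholds, and the idle timeout are all hypothetical, and frames are assumed to arrive as NumPy `uint8` arrays.

```python
import numpy as np


def frame_changed(prev, curr, pixel_tol=10, min_changed_fraction=0.005):
    """Return True if enough values differ between two captured frames.

    Hypothetical helper: pixel_tol and min_changed_fraction are
    illustrative defaults, not values used by sttts.
    """
    if prev is None or prev.shape != curr.shape:
        # First capture, or the region was redrawn: always process it.
        return True
    # Cast to a signed type so the uint8 subtraction cannot wrap around.
    delta = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    # Fraction of array values (per channel) that moved more than pixel_tol.
    changed_fraction = (delta > pixel_tol).mean()
    return bool(changed_fraction > min_changed_fraction)


def should_click_next(tts_done_at, last_change_at, now, idle_seconds=2.0):
    """Return True once speech has finished and the screen has stayed idle.

    Hypothetical helper: all arguments are timestamps in seconds, and
    tts_done_at is None while speech is still playing.
    """
    if tts_done_at is None:
        return False
    # The quiet period starts at whichever event happened last.
    quiet_since = max(tts_done_at, last_change_at)
    return now - quiet_since >= idle_seconds
```

In a capture loop, a frame would only go to OCR when `frame_changed` returns True, and the next-page click would only fire when `should_click_next` returns True. The per-value tolerance is there so that small rendering noise, such as a blinking cursor or antialiasing jitter, does not trigger a fresh OCR pass.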
Continue reading on Dev.to


