
You Already Have a Speech Server. Your iPhone Keyboard Should Use It.
Someone posted on our GitHub Discussions this week. They'd been running a speech-to-text container on their homelab for months. Found Diction, an open-source iOS voice keyboard. Pointed the app at their server. Got a server error. The settings screen even said "endpoint reachable."

Here's what was going wrong, and how two lines of config fix it.

Why direct connection fails

Diction doesn't talk directly to speech servers. It connects through a lightweight gateway first. The reason is WebSockets.

When you tap the mic, the app opens a WebSocket and streams raw PCM audio to the gateway in real time as you speak. When you're done, the gateway POSTs the full audio to your speech server, gets the transcript, and sends it back. The whole exchange happens in the time it takes to stop speaking.

Without this, the alternative is: record the whole thing, send a file, wait. You'd feel every pause. The WebSocket is what makes it feel instant.

The "endpoint reachable" check passes because the iOS a
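To make the gateway's role concrete, here is a minimal sketch of the buffering step described above: PCM chunks arrive one WebSocket message at a time, and when the speaker stops they are combined into a single payload for the POST. This is not Diction's gateway code. The class name `UtteranceBuffer`, the audio format (16 kHz, mono, 16-bit), and the choice of a WAV container are all assumptions made for illustration, and the network calls are left out.

```python
import io
import wave


class UtteranceBuffer:
    """Hypothetical helper: collects raw PCM chunks for one utterance."""

    def __init__(self, sample_rate=16000, channels=1, sample_width=2):
        # Assumed audio format; the real gateway may use different values.
        self.sample_rate = sample_rate
        self.channels = channels
        self.sample_width = sample_width  # bytes per sample (2 = 16-bit)
        self._chunks = []

    def add_chunk(self, pcm_bytes):
        # Would be called once per binary WebSocket message while the user speaks.
        self._chunks.append(pcm_bytes)

    def to_wav(self):
        # Would be called when the user stops speaking: wrap the buffered PCM
        # in a WAV container so it can go to the speech server in one POST.
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(self.channels)
            wav.setsampwidth(self.sample_width)
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"".join(self._chunks))
        return out.getvalue()
```

Because the audio is already sitting in the buffer when the speaker stops, the only work left at that point is one request to the speech server, which is why streaming feels faster than recording a file and uploading it afterwards.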




