
making Go speak real-time: our Gemini Live API WebSocket proxy
The first time I got the audio proxy working, the cat meowed in Gemini's voice: a full 3 seconds of distorted PCM noise that sounded like a dial-up modem possessed by a cheerful robot. I'd set the sample rate wrong. 24kHz audio interpreted as 16kHz sounds like a cursed lullaby.

I created this post for the purposes of entering the Gemini Live Agent Challenge. I'm building VibeCat.

The core challenge was simple to state, hard to build: the macOS client can't talk to Gemini directly. Challenge rules require a backend, and you never put API keys on someone's Mac. So I needed a WebSocket proxy in Go that sits between the Swift client and the Gemini Live API, receiving raw audio from one side, forwarding it to the other, and doing it fast enough that conversation feels natural.

the architecture (deceptively simple)

Swift Client ←→ [wss://gateway/ws/live] ←→ Go Gateway ←→ Gemini Live API
   PCM 16kHz mono →                                    → PCM 16kHz
                           ← PCM 24kHz ←
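
The diagram carries two different sample rates over one connection, which is exactly where the cursed-lullaby bug comes from. Here is a minimal sketch of why mislabeling the rate distorts playback. It assumes 16-bit mono PCM (the post only says "PCM"), and every name in it (`Direction`, `RateFor`, `PlaybackDuration`) is hypothetical rather than taken from the VibeCat gateway.

```go
package main

import (
	"fmt"
	"time"
)

// Sample rates from the diagram: the client sends 16 kHz PCM,
// Gemini replies with 24 kHz PCM.
const (
	InputRateHz  = 16000
	OutputRateHz = 24000
)

// bytesPerSample assumes 16-bit mono PCM. This is an assumption,
// not something stated in the post.
const bytesPerSample = 2

// Direction says which way a chunk is travelling through the gateway.
type Direction int

const (
	ToGemini Direction = iota // client -> gateway -> Gemini
	ToClient                  // Gemini -> gateway -> client
)

// RateFor returns the sample rate a chunk should be labelled with
// for the given direction.
func RateFor(d Direction) int {
	if d == ToGemini {
		return InputRateHz
	}
	return OutputRateHz
}

// PlaybackDuration reports how long pcmBytes of audio lasts
// when a player interprets it at rateHz.
func PlaybackDuration(pcmBytes, rateHz int) time.Duration {
	samples := pcmBytes / bytesPerSample
	return time.Duration(samples) * time.Second / time.Duration(rateHz)
}

func main() {
	// One second of Gemini output: 24000 samples * 2 bytes each.
	chunk := OutputRateHz * bytesPerSample

	fmt.Println("played at 24 kHz:", PlaybackDuration(chunk, RateFor(ToClient)))
	fmt.Println("played at 16 kHz:", PlaybackDuration(chunk, InputRateHz))
}
```

The same bytes last 1s at the correct rate and 1.5s when read as 16kHz, so the audio plays slower and lower in pitch. Choosing the rate from the direction of travel, rather than from one shared constant, is one way to keep the two from getting swapped.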



