
Stop paying OpenAI to transcribe your voice notes (My offline Telegram bot stack) 🎙️
Every tutorial on building an AI Telegram bot right now uses the exact same lazy architecture:

1. User sends a voice message.
2. Bot downloads the .ogg file.
3. Bot sends the file to OpenAI's Whisper API.
4. You get billed per minute of audio.

This is fine if you are building a quick prototype. But if you actually use your bot every single day, you are burning money on a task your own CPU can do for free. Not to mention the privacy nightmare of shipping all your personal audio logs to a third-party cloud.

The local alternative ⚙️

I wanted to build a Telegram interface for the Nomi API. I heavily rely on voice messages, so I needed speech-to-text. Instead of defaulting to a paid API, I built the entire transcription pipeline locally using Vosk and FFmpeg.

The workflow is dead simple:

1. Telegram sends the .ogg voice note.
2. FFmpeg runs a local process to convert it to a .wav file with the correct sample rate.
3. The offline Vosk model reads the file and returns the text.
4. Then the text is sent to the LLM.
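The conversion and transcription steps of that workflow can be sketched roughly like this. It is a minimal illustration, not the bot's actual code: the function names, the 16 kHz mono target (a common choice for Vosk models, assumed here) and the model directory argument are all hypothetical. It assumes the `ffmpeg` binary is on your PATH and the `vosk` package is installed.

```python
import json
import subprocess
import wave


def ffmpeg_command(ogg_path, wav_path, sample_rate=16000):
    """Build the FFmpeg call that turns a voice note into mono 16-bit PCM WAV.

    The 16000 Hz default is an assumption; match it to your Vosk model.
    """
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", ogg_path,           # input: the .ogg downloaded from Telegram
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        wav_path,
    ]


def convert_to_wav(ogg_path, wav_path, sample_rate=16000):
    """Run FFmpeg as a local subprocess; raises CalledProcessError on failure."""
    subprocess.run(
        ffmpeg_command(ogg_path, wav_path, sample_rate),
        check=True,
        capture_output=True,
    )


def transcribe(wav_path, model_dir):
    """Feed the WAV file to an offline Vosk model and return the recognized text."""
    # Imported lazily so the helpers above work without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)  # directory of a downloaded, unpacked Vosk model
    parts = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            chunk = wf.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                parts.append(json.loads(recognizer.Result()).get("text", ""))
        parts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)
```

In a bot handler you would call `convert_to_wav("note.ogg", "note.wav")` and then `transcribe("note.wav", "path/to/vosk-model")`, with both paths being placeholders, before passing the resulting string on to the LLM. Loading the model once at startup instead of per message would avoid paying the load time on every voice note.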



