
# I Built a Voice Interface for My AI Agent in 2 Hours (Flask + Web Speech API + TTS)
I had a free Saturday afternoon and a clear goal: talk to my AI agent out loud and hear it talk back. Two hours later, Atlas had a voice. Here's exactly how I built it — Flask backend, Web Speech API for input, Mistral's Voxtral TTS for output, and a canvas animation that makes the avatar's eyes glow in sync with the audio.

## The Stack

- **Flask** — tiny backend, two endpoints
- **Web Speech API** — browser-native speech-to-text (Chrome only, push-to-talk)
- **Mistral Voxtral TTS** — `voxtral-mini-tts-2603`, returns base64 MP3
- **macOS `say` command** — fallback when Voxtral is unavailable
- **Web Audio API `AnalyserNode`** — drives the canvas glow animation

## Architecture in 30 Seconds

The flow is simple:

1. User holds Space → Chrome's `SpeechRecognition` runs locally
2. On final result, the transcript POSTs to `/api/chat`
3. Flask calls the Mistral chat API (`mistral-large-latest`) → gets a text response
4. Flask calls Voxtral TTS → returns base64 MP3
5. The browser decodes the MP3 and plays it through an `AnalyserNode`
6. The canvas reads frequency data every frame
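The Flask → browser handoff in steps 4–5 is just base64 over JSON. Here's a minimal sketch of both sides, written in Python for illustration (in the real app the decode side is the browser, using `atob` and `decodeAudioData`); the payload key `audio_b64` is my own naming, not necessarily the article's:

```python
import base64
import json


def audio_payload(mp3_bytes: bytes) -> str:
    # Flask side: wrap the raw MP3 bytes from TTS as base64 inside JSON,
    # so the response stays plain text and easy to fetch().
    return json.dumps({"audio_b64": base64.b64encode(mp3_bytes).decode("ascii")})


def decode_payload(payload: str) -> bytes:
    # Browser side (shown here in Python): unwrap JSON, decode base64 back
    # to the original MP3 bytes before handing them to the audio pipeline.
    return base64.b64decode(json.loads(payload)["audio_b64"])


# Round-trip with stand-in bytes (not a real MP3) to show the handoff is lossless.
fake_mp3 = b"\xff\xfb\x90\x00" + b"\x00" * 16
assert decode_payload(audio_payload(fake_mp3)) == fake_mp3
```

Base64 inflates the audio by about a third, but for short spoken replies that overhead is negligible and it avoids dealing with binary responses in the browser.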
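For step 6, the canvas loop reads the analyser's byte frequency data (one value per bin, 0–255) and turns it into a glow level for the eyes. The averaging formula below is my guess at a reasonable mapping rather than the article's exact code, and it's sketched in Python instead of the browser's JavaScript:

```python
def glow_intensity(freq_bytes: list[int]) -> float:
    """Map 0-255 frequency bins (as from getByteFrequencyData) to a 0.0-1.0 glow."""
    if not freq_bytes:
        return 0.0  # no audio data yet -> eyes dark
    # Average energy across all bins, normalized to [0.0, 1.0].
    return sum(freq_bytes) / (len(freq_bytes) * 255)


assert glow_intensity([0, 0, 0]) == 0.0      # silence
assert glow_intensity([255] * 8) == 1.0      # full-scale signal
```

In the real page this runs inside a `requestAnimationFrame` loop, so the glow tracks the audio at display refresh rate without any extra timers.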
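The macOS fallback from the stack list can be a one-line shell-out. A sketch with a helper name of my own invention: `say -o` writes the synthesized speech to an audio file, and the command only actually runs when `say` exists on the PATH (i.e. on a Mac):

```python
import shutil
import subprocess


def speak_fallback(text: str, out_path: str = "reply.aiff") -> list[str]:
    """Synthesize `text` with macOS `say` when available; return the command used."""
    cmd = ["say", "-o", out_path, text]
    if shutil.which("say") is not None:  # `say` ships only with macOS
        subprocess.run(cmd, check=True)
    return cmd


cmd = speak_fallback("Voxtral is down, but I can still talk.")
assert cmd[:2] == ["say", "-o"]
```

Returning the command list keeps the helper testable on non-Mac machines, which is also why the subprocess call is guarded rather than unconditional.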
Continue reading on Dev.to
