
I Built a Voice-First AI Photo & Document Editor with the Gemini Live API — Here's How
This article was created for the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

There's a version of photo editing where you don't touch a single slider. You click the part of the image you want to change, say what you want out loud, and watch it happen. That's Say Edit.

Over the past few weeks I built Say Edit, a voice-first AI workspace that lets you edit images and navigate documents entirely by speaking. It's powered by the Gemini Live API and Gemini image generation, and deployed on Google Cloud Run. This article is the behind-the-scenes story of how I built it, what broke, and what surprised me.

The Core Idea

Most AI tools make you type. You open a chat window, describe what you want, wait for a response, copy it somewhere, repeat. I wanted to eliminate every one of those steps for two use cases I found genuinely painful:

Editing a photo — you know exactly what you want to change, but you have to hunt through menus, masks, and sliders to get
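To make the click-and-speak idea concrete, here is a minimal hypothetical sketch of how a click location and a live voice transcript might be paired into a structured edit request before being handed to an image model. This is not Say Edit's actual code; the names, types, and normalization choices are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class EditRequest:
    """A structured edit: where the user clicked, and what they asked for."""
    x: float          # click position, normalized to the 0..1 range
    y: float
    instruction: str  # the spoken command, as transcribed


def build_edit_request(click_xy: tuple[float, float], transcript: str) -> EditRequest:
    """Pair the most recent click with the spoken instruction.

    Illustrative only: a real pipeline would also track timing, so that a
    click and an utterance are paired only when they occur close together.
    """
    x, y = click_xy
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("click coordinates must be normalized to [0, 1]")
    return EditRequest(x=x, y=y, instruction=transcript.strip())


# Example: the user clicks the sky and says what they want changed.
req = build_edit_request((0.42, 0.15), " make the sky more dramatic ")
print(req.instruction)  # → "make the sky more dramatic"
```

The point of the intermediate structure is that voice and pointer input arrive on different channels; joining them into one explicit request object is what lets a single spoken sentence act on a specific region of the image.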
Continue reading on Dev.to
