
How I Built Sally, A Voice-First Accessibility Agent Powered by Gemini
I built a desktop app that lets people control any website using only their voice. You talk, it takes a screenshot, sends it to Gemini 2.5 Flash, gets back a structured action, runs it in the browser, and repeats. The whole time it's narrating what it's doing out loud. Here's how it came together for the Gemini Live Agent Challenge.

The Problem

Picture this: you can't use a mouse. Maybe you can't use a keyboard either. You might have a repetitive strain injury, a motor impairment, or honestly you might just have a broken wrist. The web doesn't really care. It expects you to click tiny buttons, scroll precisely, type into fields, drag things around.

There are screen readers and voice control tools out there, but they all seem to expect you to learn their language. Memorize commands. Know what things are called in the DOM. Fight with dictation software that mishears every other word. I wanted something where you could just say what you want: "Go to YouTube and search for lo-fi beats." No
Continue reading on Dev.to
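The screenshot → Gemini → structured action → execute loop the excerpt describes can be sketched roughly as below. This is purely illustrative: the action schema and the function names (`take_screenshot`, `ask_gemini`, `run_action`, `narrate`) are hypothetical stand-ins, and the real browser driver, the Gemini 2.5 Flash call, and the text-to-speech narration are stubbed out with scripted replies.

```python
import json
from collections import deque

# Scripted replies simulating Gemini returning one structured JSON
# action per turn (hypothetical action schema, not the real one).
SCRIPTED = deque([
    '{"action": "navigate", "target": "https://youtube.com", "say": "Opening YouTube"}',
    '{"action": "type", "target": "search box", "text": "lo-fi beats", "say": "Searching"}',
    '{"action": "done", "say": "Here are the results"}',
])

def take_screenshot() -> bytes:
    return b"png-bytes"                    # real version: capture the browser viewport

def ask_gemini(screenshot: bytes, goal: str) -> dict:
    return json.loads(SCRIPTED.popleft())  # real version: call Gemini 2.5 Flash

def run_action(action: dict) -> None:
    pass                                   # real version: drive the browser

def narrate(text: str) -> None:
    print(text)                            # real version: text-to-speech out loud

def agent_loop(goal: str) -> list[str]:
    """Screenshot -> model -> action -> repeat, narrating every step."""
    spoken = []
    while True:
        action = ask_gemini(take_screenshot(), goal)
        narrate(action["say"])
        spoken.append(action["say"])
        if action["action"] == "done":
            return spoken
        run_action(action)

# Usage: agent_loop("Go to YouTube and search for lo-fi beats")
```

The key design choice the article hints at is that narration happens on every iteration, so the user hears what the agent is doing before each browser action lands.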




