
I Built an AI That Sees Your Screen and Speaks Your Answers, Here's How
This post was created to enter the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

## The Problem With Typing

Every day we spend hours switching between tabs, typing search queries, copying text, and manually reading through pages to find answers. What if you could just look at your screen, ask a question out loud, and get the answer spoken back to you instantly?

That's exactly what I built. Voice UI Navigator is an AI agent that:

- 👁️ **Sees** your browser screen using Gemini multimodal vision
- 🎙️ **Listens** to your voice via the Gemini Live API
- 🔍 **Searches** Google in real time to research answers
- 🔊 **Speaks** results back to you naturally

No typing. No DOM access. No browser extensions. Just pure visual AI understanding, the same way a human would look at a screen.

Live demo: https://voice-navigator-913580598688.us-central1.run.app

GitHub: https://github.com/Kamaumbugua-dev/GEMINI_CO
