
Building Wand: A Voice + Hand Pointer Live Agent with Google ADK and Gemini Live
What if you could control your browser the way you'd direct a person — just point at something and say what you want? That question led us to build Wand, a live AI agent that lets you browse the web entirely through voice and hand gestures. No keyboard. No mouse. Point your finger at a YouTube thumbnail and say "play this" — it clicks. Point at a map and say "zoom in here" — it scrolls. Say "what is this?" — it takes a screenshot, annotates it with your cursor position, and tells you what you're pointing at. Here's how we built it.

The Architecture: Cloud Agent, Local Browser

The first design decision was where things live. The agent — the part that listens, reasons, and decides what to do — runs on Google Cloud Run, powered by Google ADK and Gemini 2.5 Flash Native Audio via the Gemini Live API. This gives us a stable, always-on backend that any client can connect to without needing API keys or local GPU resources. The browser, microphone, speaker, and webcam stay on the local machine.
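The article cuts off before showing any code, but the point-and-click flow it describes — tracking a fingertip on the webcam and turning it into a browser click position — can be sketched as a small pure function. This is an illustrative sketch, not Wand's actual implementation: the function name, parameters, and the mirroring assumption are ours, and it assumes the hand tracker reports a normalized fingertip coordinate in the range 0..1 (as landmark models like MediaPipe typically do).

```python
def pointer_to_viewport(norm_x: float, norm_y: float,
                        viewport_w: int, viewport_h: int,
                        mirror: bool = True) -> tuple[int, int]:
    """Map a normalized fingertip coordinate (0..1, webcam space)
    to a pixel position in the browser viewport.

    Webcam images are usually mirrored relative to the screen the
    user is pointing at, so the x axis is flipped by default.
    """
    if mirror:
        norm_x = 1.0 - norm_x
    # Clamp so a fingertip slightly outside the frame still maps
    # to a valid on-screen position instead of raising or overshooting.
    norm_x = min(max(norm_x, 0.0), 1.0)
    norm_y = min(max(norm_y, 0.0), 1.0)
    return round(norm_x * (viewport_w - 1)), round(norm_y * (viewport_h - 1))
```

A voice command like "play this" would then resolve against whatever element sits under the returned pixel position — the same coordinate the agent can draw onto a screenshot when answering "what is this?".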
Continue reading on Dev.to
