
Building ExpertLens: Real-time AI Coaching for Software You Control Directly
Disclaimer: This post was created for the purposes of entering the Gemini Live Agent Challenge.

ExpertLens is a real-time voice and vision coaching agent for any software where the human must be the operator. Share your screen or point your camera, speak naturally, and get expert guidance for Blender, Affinity Photo, Unreal Engine, a mobile game, or any app an AI cannot run on your behalf.

This post covers the core insight behind the project, how it's built on the Gemini Live API, and four specific technical challenges that required non-obvious solutions.

1. The Human-Control Gap

When deciding what to build for this hackathon, I mapped the landscape of AI assistance by tool type:

Browser-based apps (Figma, Canva, Google Docs): Playwright and Selenium can automate these. LLM agents can control them directly by reading the DOM and clicking elements. AI coaching adds limited value when AI can just do the task.

CLI and API-friendly tools (git, ffmpeg, AWS CLI): LLMs can call these directly vi
Continue reading on Dev.to




