
I built a real-time AI screen co-pilot in 10 days using Gemini and Google Cloud
For the #GeminiLiveAgentChallenge, I wanted to break out of the standard text-chat paradigm. Over the last 10 days, I built OmniGuide: a multimodal screen co-pilot that actually "sees" what you are working on and helps you debug it live.

But as I've written about before, you can't just throw a giant prompt at a single LLM and expect it to survive production. To make OmniGuide fast and reliable, I implemented a strict Dual-Agent Architecture, mapping specific roles to the workflow to prevent context collapse.

The Architecture: Scouts and Clerics

Instead of a monolithic API call, the FastAPI backend acts as an orchestrator for two distinct agent roles:

The Observer (The Scout): This agent is strictly responsible for ingestion. It takes base64 screen frames from the frontend, parses the visual data using Gemini's vision capabilities, and extracts a structured understanding of the UI state.

The Guide (The Suppo
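As a rough illustration, the orchestration described above might be wired like this. This is a minimal sketch, not the OmniGuide implementation: the class names (`Observer`, `Guide`, `Orchestrator`) mirror the roles in the post, but the Gemini vision call is stubbed out and the FastAPI layer is omitted.

```python
import base64
from dataclasses import dataclass, field

@dataclass
class UIState:
    """Structured understanding of the screen, produced by the Observer."""
    elements: list = field(default_factory=list)
    errors: list = field(default_factory=list)

class Observer:
    """The Scout: strictly responsible for ingesting base64 screen frames."""
    def parse_frame(self, frame_b64: str) -> UIState:
        raw_bytes = base64.b64decode(frame_b64)  # decode the frame payload
        # In the real system this would be a Gemini vision call that parses
        # the screenshot; stubbed here with a fixed, hypothetical result.
        return UIState(elements=["editor", "terminal"], errors=["NameError"])

class Guide:
    """The second agent: turns the Observer's structured state into advice,
    never touching raw pixels itself (preventing context collapse)."""
    def advise(self, state: UIState) -> str:
        if state.errors:
            return f"I can see a {state.errors[0]} on screen; let's fix that first."
        return "Nothing looks broken right now."

class Orchestrator:
    """Plays the role the FastAPI backend plays in OmniGuide: it routes each
    frame through the Observer, then hands the structured state to the Guide."""
    def __init__(self) -> None:
        self.observer = Observer()
        self.guide = Guide()

    def handle_frame(self, frame_b64: str) -> str:
        state = self.observer.parse_frame(frame_b64)  # Scout: ingestion only
        return self.guide.advise(state)               # Guide: reasoning only

frame = base64.b64encode(b"fake-screenshot-bytes").decode()
print(Orchestrator().handle_frame(frame))
```

The key design point is the hard boundary: the Observer's only output is a small structured object, so the Guide's prompt stays short and focused instead of absorbing an entire screenshot's worth of context.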
Continue reading on Dev.to



