
Building OmniSight: A Real-Time AI Visual Companion Powered by Gemini Live and Google Cloud
I created this blog post to detail my project for the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

The Problem: The World Doesn't Fit in a Text Box

Most AI assistants are built around a simple loop: you type something, the AI responds. That works great for writing emails or answering trivia. But the real world doesn't fit in a text box.

What if you're holding up a medical bill and want to know if you're being overcharged? What if you're in a foreign country and can't read the menu? What if you're visually impaired and need someone to describe what's in front of you? What if you're signing a lease and want to know whether clause 14 is a red flag?

These are real problems, and they all share one thing: they require an AI that can see.

Google's Project Astra gave us a glimpse of what this could look like: a real-time visual agent that sees, understands, and remembers. But Astra was a prototype. We wanted to build the practical version: an agent equipped with specialized too
Continue reading on Dev.to