
How I Built OmniSence: A Multimodal AI That Streams Text, Images & Audio Together
Disclosure: I created this piece of content for the purposes of entering the Gemini Live Agent Challenge hackathon on Devpost.

GeminiLiveAgentChallenge

The Problem That Kept Me Up at Night

Every AI tool I've used thinks in documents, not experiences. You get text here. An image there. Maybe audio if you switch tabs and use a different tool entirely.

But a real creative director doesn't hand you a Word document; they paint a scene with words, sketches, and emotion simultaneously. That gap is what I built OmniSence to close.

What OmniSence Does

OmniSence is a Creative Director AI that takes a single idea, spoken or typed, and streams text, images, and audio together in real time as one cohesive, interleaved experience.

You speak: "A girl who discovers she can paint the future."

OmniSence responds with:

📝 Narrative prose streaming word by word
🖼️ Watercolor illustrations appearing inline mid-sentence
🔊 Studio-quality narration reading the story back to you

All at once. All live. No
Continue reading on Dev.to



