Building a Voice-Controlled Web Agent for the Gemini Hackathon (And How I Beat the API Rate Limits)
I created this piece of content for the purposes of entering the Gemini Live Agent Challenge hackathon.

It’s currently 1:00 AM in Dhaka. My terminal is a wall of green and red logs, my coffee is cold, and I am about to submit my project for the Google Gemini Live Agent Challenge. Over the last few days, I’ve been building IAN (Intelligent Accessibility Navigator), a multimodal AI agent designed to browse the internet for you using just your voice.

If you are breaking into tech, or if you are one of the hackathon judges reading this, I want to take you behind the scenes: how I built IAN, the late-night architecture pivots, and how I stopped my headless browsers from crashing my server.

The Broken Web: Why We Need a New Approach to Web Accessibility ♿

If you have ever tried using a traditional screen reader on a modern e-commerce site, you know it’s a nightmare. Traditional screen readers rely entirely on parsing the Document Object Model (DOM). But today’s web is incredi