
How We Built a Voice AI That Takes Real DOM Actions
Voice is the new click. But most voice AI today? It's just chatbots with a microphone.

The Chatbot Trap

Traditional voice assistants follow a familiar pattern: listen → transcribe → think → respond. The problem? They're passive. They can tell you how to do something, but they can't actually do it. When a user says "book me a flight to New York," they don't want a list of booking sites. They want a confirmation email.

Voice-First Architecture

We built Anve differently. Instead of just generating text responses, our AI decides on actual DOM actions it can take on the user's behalf. The flow looks like:

1. Intent Recognition - What does the user want?
2. Action Planning - What DOM operations achieve this?
3. Execution - Actually click, type, and submit
4. Confirmation - Verify the action succeeded

The DOM Action Engine

This is where it gets interesting. Our AI doesn't just see your website; it can interact with it.

```javascript
// The AI generates action sequences like:
[ { type: 'click', selector:
```
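To make the flow concrete, here is a minimal sketch of what an action executor for sequences like the one above might look like. This is illustrative only, not Anve's actual engine: the function name `executeActions`, the `type`/`selector`/`text` action shape, and the result format are all assumptions extrapolated from the snippet.

```javascript
// Hypothetical executor: walks an AI-generated action sequence and
// applies each step to a document-like object (e.g. the browser's
// `document`). Each result records whether the step succeeded, which
// feeds the "Confirmation" stage of the flow.
function executeActions(doc, actions) {
  const results = [];
  for (const action of actions) {
    const el = doc.querySelector(action.selector);
    if (!el) {
      results.push({ action, ok: false, error: 'selector not found' });
      continue;
    }
    switch (action.type) {
      case 'click':
        el.click();
        results.push({ action, ok: true });
        break;
      case 'type':
        // Assumes a `text` field on typing actions.
        el.value = action.text;
        results.push({ action, ok: true });
        break;
      case 'submit':
        el.submit();
        results.push({ action, ok: true });
        break;
      default:
        results.push({ action, ok: false, error: `unknown type: ${action.type}` });
    }
  }
  return results;
}
```

In a browser you would pass `document` directly; returning per-action results rather than throwing lets the planner retry or re-plan when a selector no longer matches the live page.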

