🧩 Runtime Snapshots #15 — Your AI Agent Is Blind. We're Fixing That.

via Dev.to Webdev, by Alechko

Your AI agent can write code, analyze data, summarize documents, and debate philosophy. It cannot look at a web page. Not really. Not the way you do when you open a browser tab and see what's there — the layout, the buttons, the form that's half-loaded, the modal blocking the CTA. Claude, ChatGPT, Cursor, Gemini — they're powerful. And in the browser, they're blind.

Three ways we've tried to give AI sight. All broken.

Screenshots. The most common workaround. Take a screenshot, paste it into the chat. The AI "sees" pixels. But pixels have no element IDs, no computed styles, no z-index, no ARIA roles. The AI can't tell you which button is covered — just that something looks off. And vision tokens aren't cheap.

Raw HTML. Dump the page source. 2MB of scripts, nav menus, analytics tags, third-party widgets. The context window fills up before the AI reads anything useful. The signal is buried under 600K tokens of noise.

Accessibility trees. Better in theory. Structured, semantic. But AXTrees
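The raw-HTML numbers above can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming the common rough heuristic of ~4 characters per token (real tokenizers vary, and the 200K-token context window here is an illustrative figure, not one from the article):

```python
# Why dumping raw page source blows the context budget: a rough estimate.
# Assumes ~4 characters per token; actual tokenizer output varies by model.

CHARS_PER_TOKEN = 4

def approx_tokens(num_chars: int) -> int:
    """Estimate token count from character count via the ~4 chars/token rule."""
    return num_chars // CHARS_PER_TOKEN

page_source_bytes = 2 * 1024 * 1024   # the 2MB page dump the article describes
context_window = 200_000              # illustrative large-model context window

tokens = approx_tokens(page_source_bytes)
print(f"~{tokens:,} tokens from a 2MB dump")
print(f"that is {tokens / context_window:.1f}x an entire {context_window:,}-token window")
```

By this estimate a 2MB dump is on the order of half a million tokens, the same ballpark as the article's "600K tokens of noise", and several times larger than a typical context window before the model has read a single useful element.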

Continue reading on Dev.to Webdev
