
How I Stopped Writing Fragile E2E Tests and Let AI Handle It
Last month I spent four hours debugging a Playwright test that broke because someone renamed a CSS class. Sound familiar? I decided to try a different approach: what if the test framework could see the app like a human does, instead of relying on brittle selectors?

What I Built

An MCP (Model Context Protocol) server that gives AI agents — Claude, GPT, Cursor, Copilot — direct access to running applications. The AI can:

- Launch and connect to apps via CDP
- Tap elements, fill forms, scroll, navigate
- Take screenshots and analyze UI snapshots
- Run assertions in natural language

The Key Trick: Semantic Snapshots

Instead of sending full screenshots (expensive in tokens), I built a snapshot system that extracts the UI's semantic structure — interactive elements, their positions, labels, and states. The AI gets a complete picture of the UI in ~2ms and a few hundred tokens. Compare that to a screenshot: ~100KB of base64, thousands of tokens, and the AI still has to "guess" where buttons are.

Real Numbers
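The semantic-snapshot idea above can be sketched in a few lines: walk a UI tree, keep only the interactive elements, and emit one compact line per element. This is a minimal illustration under my own assumptions — the node shape (`tag`/`label`/`bounds`/`children`) and the output format are invented here, not the project's real schema.

```python
# Sketch of a "semantic snapshot": walk a DOM-like tree and keep only
# interactive elements, emitting a compact text form for the model.
# Node shape and field names are assumptions for illustration only.

INTERACTIVE = {"button", "input", "select", "textarea", "a"}

def snapshot(node, out=None):
    """Collect interactive elements, one compact line each."""
    if out is None:
        out = []
    if node.get("tag") in INTERACTIVE:
        x, y = node.get("bounds", (0, 0))
        line = f'{node["tag"]} "{node.get("label", "")}" @({x},{y})'
        if node.get("disabled"):
            line += " [disabled]"
        out.append(line)
    for child in node.get("children", []):
        snapshot(child, out)  # depth-first over the whole tree
    return out

ui = {
    "tag": "div",
    "children": [
        {"tag": "input", "label": "Email", "bounds": (40, 120)},
        {"tag": "button", "label": "Sign in", "bounds": (40, 180)},
        {"tag": "p", "label": "Forgot password?"},  # non-interactive: dropped
    ],
}

print("\n".join(snapshot(ui)))
# input "Email" @(40,120)
# button "Sign in" @(40,180)
```

The payoff is the size: the whole page collapses to a few dozen bytes of structured text the model can act on directly, instead of ~100KB of screenshot base64 it has to interpret visually.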
Continue reading on Dev.to Webdev




