
Why I Built a Filesystem for the Browser
Why I Built a Filesystem for the Browser Browser automation for AI agents has an impedance mismatch problem. We feed agents high-entropy noise — raw HTML, pixel screenshots, brittle CSS selectors — and expect them to produce low-entropy, structured actions. The result is fragile, expensive, and full of silent failure modes. DOMShell fixes this by exposing the browser's Accessibility Tree as a virtual filesystem. Agents navigate pages with ls , cd , grep , click — the same commands they already know from every Unix system in their training data. No new API to learn. No screenshots to parse. No selectors to break. The Impedance Mismatch Three approaches dominate browser automation today. All three are engineering mismatches. Screenshots + vision models. The agent takes a screenshot, sends it to a vision model, gets back pixel coordinates, clicks, takes another screenshot. This burns vision tokens on every action, adds a full round-trip per interaction, and fails silently when coordinates
Continue reading on Dev.to
Opens in a new tab

![[MM’s] Boot Notes — The Day Zero Blueprint — Test Smarter on Day One](/_next/image?url=https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1368%2F1*AvVpFzkFJBm-xns4niPLAA.png&w=1200&q=75)

