
# We Built an Open Protocol So AI Agents Can Actually See Your Screen
You know what's wild? Every major AI lab is building "computer use" agents right now: models that can look at your screen, understand what they see, and click buttons on your behalf. Anthropic has Claude Computer Use. OpenAI shipped CUA. Microsoft built UFO2. And every single one of them is independently solving the same problem: how do you describe a UI to an AI?

We thought that was broken, so we built Computer Use Protocol (CUP), an open specification that gives AI agents a universal way to perceive and interact with any desktop UI. One format. Every platform. MIT licensed.

GitHub: computeruseprotocol/computeruseprotocol
Website: computeruseprotocol.com

## The Problem: Six Platforms, Six Different Languages

Here's the fragmentation that every computer-use agent has to deal with today:

| Platform | Accessibility API | Role Count | IPC Mechanism |
| --- | --- | --- | --- |
| Windows | UIA (COM) | ~40 ControlTypes | COM |
| macOS | AXUIElement | AXRole + AXSubrole | XPC / Mach |
| Linux | AT-SPI2 | ~100+ AtspiRole values | D-Bus |
| Web | ARIA | ~80 ARIA roles | |
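To make the fragmentation concrete: the same on-screen button is reported as `UIA_ButtonControlTypeId` by Windows UIA, `AXButton` by macOS, `push button` by AT-SPI2, and `button` by ARIA. A universal protocol has to normalize all of these into one canonical vocabulary. Here is a minimal sketch of that idea in TypeScript; the role names for each platform are real, but the canonical role set and the `normalizeRole` function are illustrative assumptions, not CUP's actual schema (which lives in the spec repo).

```typescript
// Illustrative sketch only -- CanonicalRole and normalizeRole are
// assumptions for this example, not CUP's published schema.
type CanonicalRole = "button" | "textbox" | "unknown";

// Each platform accessibility API names the same widget differently.
const ROLE_MAP: Record<string, CanonicalRole> = {
  // Windows UIA ControlTypes
  UIA_ButtonControlTypeId: "button",
  UIA_EditControlTypeId: "textbox",
  // macOS AXUIElement roles
  AXButton: "button",
  AXTextField: "textbox",
  // Linux AT-SPI2 role names
  "push button": "button",
  text: "textbox",
  // Web ARIA roles
  button: "button",
  textbox: "textbox",
};

function normalizeRole(platformRole: string): CanonicalRole {
  return ROLE_MAP[platformRole] ?? "unknown";
}

// The same physical button, seen through four different APIs,
// normalizes to a single canonical role:
console.log(normalizeRole("UIA_ButtonControlTypeId")); // "button"
console.log(normalizeRole("AXButton"));                // "button"
console.log(normalizeRole("push button"));             // "button"
console.log(normalizeRole("button"));                  // "button"
```

An agent built against the canonical vocabulary never has to know which of the four platform APIs produced the tree it is reading; that is the core of what a "one format, every platform" spec has to deliver.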
Continue reading on Dev.to


