
HTML vs Markdown vs SOM: Which Format Should Your AI Agent Use?
Every AI agent that browses the web faces the same question: how do you represent a web page to a language model? The default answer, raw HTML, is expensive and slow. A typical page dumps 30,000+ tokens into your context window, most of it CSS classes and layout divs.

But what are the actual alternatives? And do they work? We ran WebTaskBench, 100 tasks across GPT-4o and Claude Sonnet 4, to find out. The results surprised us.

The Three Representations

When an agent needs to understand a web page, there are three common approaches:

1. Raw HTML

The DOM as-is. Every <div>, every class="sc-1234 flex items-center gap-2", every inline script. This is what most agents send today.

<div class="sc-1234 flex items-center gap-2 px-4 py-2">
  <a href="/about" class="text-blue-500 hover:underline font-medium tracking-tight text-sm">About</a>
  <span class="text-gray-400">|</span>
  <a href="/pricing" class="text-blue-500 hover:underline font-medium tracking-tight text-sm">Pricing</a>
</d
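
To give a rough sense of how much of a raw HTML fragment is markup rather than content, here is a minimal sketch using only Python's standard library. It takes the "About" link from the snippet above and compares the length of its visible text with the length of the full tag. The helper names (TextExtractor, markup_overhead) are hypothetical, and character counts are only a crude stand-in for tokens, since real tokenizers split text differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the visible text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep non-empty text; tags and attributes never reach this method.
        text = data.strip()
        if text:
            self.chunks.append(text)


def markup_overhead(html: str) -> tuple[str, float]:
    """Return the visible text and the fraction of characters that are markup.

    Assumes a non-empty input string; characters are a proxy for tokens.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks)
    overhead = 1 - len(visible) / len(html)
    return visible, overhead


# The "About" link from the article's raw HTML example.
SAMPLE = (
    '<a href="/about" class="text-blue-500 hover:underline '
    'font-medium tracking-tight text-sm">About</a>'
)

visible, overhead = markup_overhead(SAMPLE)
print(f"visible text: {visible!r}, markup share: {overhead:.0%}")
```

For this one link, the only text a model needs is the word "About"; the rest of the characters are the tag, the href, and the utility classes, which is the kind of overhead the raw HTML approach sends on every element.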



