
# Your robots.txt Won't Save You: What Actually Works Against AI Scrapers

AI bots now account for nearly 40% of all web traffic. If you think robots.txt is protecting your content, think again.

## The Problem: robots.txt Is Just a Suggestion

Here's the uncomfortable truth: robots.txt is a voluntary protocol. Legitimate crawlers like Googlebot respect it. AI scrapers? Most don't.

```
# Your robots.txt
User-agent: GPTBot
Disallow: /

# Reality: GPTBot might respect this.
# The other 200+ AI scrapers? Nope.
```

I ran a honeypot experiment on my own sites. Within 48 hours:

- 73% of AI bot requests completely ignored robots.txt
- Bots spoofed legitimate User-Agent strings
- Some rotated IPs every few requests

## What Actually Works

After weeks of testing, here's what moved the needle:

### 1. Rate Limiting by Behavior, Not User-Agent

User-Agent strings are trivially spoofed. Instead, detect bot behavior:

```nginx
# Nginx: rate limit aggressive crawlers
limit_req_zone $binary_remote_addr zone=antibotzone:10m rate=10r/m;

location / {
    limit_req zone=antibotzone burst=5 nodelay;
}
```

Real users don't trip a 10-requests-per-minute limit; aggressive scrapers do.
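To see what the nginx config above is doing, here's a minimal sketch of per-IP rate limiting in Python. Note the simplification: this is a sliding-window counter, whereas nginx's `limit_req` actually implements a leaky-bucket algorithm; the constants mirror the config (10 requests/minute, burst of 5) but the helper names are mine, not part of any library.

```python
import time
from collections import defaultdict, deque

# Constants mirroring the nginx config above (assumed, not authoritative):
WINDOW = 60.0  # seconds
LIMIT = 10     # requests allowed per window (rate=10r/m)
BURST = 5      # extra requests tolerated before rejecting (burst=5)

_hits = defaultdict(deque)  # per-IP timestamps of recent requests

def allow(ip, now=None):
    """Return True if a request from `ip` is within the rate limit."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= LIMIT + BURST:
        return False  # over budget: reject (nginx would return 503/429)
    q.append(now)
    return True
```

The key behavioral signal is the same one nginx keys on: request cadence per client, not the self-reported User-Agent string.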
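The honeypot experiment described above can be sketched like this: list a trap path in robots.txt as `Disallow`, link to it nowhere visible, and flag any client that fetches it anyway. The trap path, log format (standard nginx/Apache "combined"), and function names here are my assumptions for illustration, not the author's actual setup.

```python
import re
from collections import Counter

# Hypothetical trap path: disallowed in robots.txt and never linked visibly,
# so only crawlers that ignore robots.txt should ever request it.
TRAP_PATH = "/trap/do-not-crawl/"

# Minimal parser for "combined"-format access log lines (assumed format):
# captures the client IP and the request path.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def violators(log_lines):
    """Count requests to the trap path, keyed by client IP."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2).startswith(TRAP_PATH):
            hits[m.group(1)] += 1
    return hits

sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /trap/do-not-crawl/ HTTP/1.1" 200 512',
    '198.51.100.2 - - [01/Jan/2025:00:00:02 +0000] "GET /blog/post HTTP/1.1" 200 2048',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /trap/do-not-crawl/page2 HTTP/1.1" 200 512',
]
print(violators(sample))  # Counter({'203.0.113.7': 2})
```

The IPs this surfaces can then feed the rate limiter or a block list; anything hitting the trap has, by construction, ignored robots.txt.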
Continue reading on Dev.to Webdev


