
# Your robots.txt Is Probably Wrong: A Guide to Crawl Directives
robots.txt is a plain text file at the root of your domain that tells search engine crawlers which URLs they may and may not request. It's not a security mechanism (a directive is a suggestion, not a block), but it's a critical tool for managing how search engines interact with your site.

## The syntax

```
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/public/

Sitemap: https://example.com/sitemap.xml
```

- **User-agent**: which crawler the rules apply to. `*` means all crawlers. Specific agents include `Googlebot`, `Bingbot`, `GPTBot`, and `Bytespider`.
- **Disallow**: paths the crawler should not request. `/admin/` blocks everything under `/admin/`. `/` blocks everything. An empty value blocks nothing.
- **Allow**: overrides a `Disallow` for specific paths. Useful for allowing a subdirectory within an otherwise blocked directory.
- **Sitemap**: points crawlers to your XML sitemap. Not all crawlers use this, but Google does.

## Common mistakes

Blocking CSS and JavaScript. `Disallow: /assets/` or `Disallow: /*.css$` prevents Googlebot from rendering your pages the way users see them, which can hurt how your site is evaluated and ranked.
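You can sanity-check rules like these without deploying anything, using Python's standard-library `urllib.robotparser`. A minimal sketch (the example paths are from the snippet above; note one caveat: Python's parser applies the *first* matching rule in file order, whereas Googlebot picks the *most specific* (longest) matching path, so the `Allow` line is listed first here to get the intended behavior):

```python
import urllib.robotparser

# Build a parser directly from rule text instead of fetching /robots.txt.
rp = urllib.robotparser.RobotFileParser()
rp.parse("""User-agent: *
Allow: /api/public/
Disallow: /admin/
Disallow: /api/
""".splitlines())

# /admin/ is disallowed for every crawler
print(rp.can_fetch("*", "https://example.com/admin/users"))      # False
# Allow carves /api/public/ out of the broader /api/ Disallow
print(rp.can_fetch("*", "https://example.com/api/public/docs"))  # True
print(rp.can_fetch("*", "https://example.com/api/private"))      # False
```

For production checks against Google's longest-match semantics, test with Google Search Console rather than relying on `urllib.robotparser` alone.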




