Robots.txt
A text file at /robots.txt that tells crawlers which paths they may or may not visit.
Definition
Robots.txt implements the Robots Exclusion Protocol (standardized as RFC 9309) and works on the honor system. Compliant crawlers (Googlebot, Bingbot, etc.) fetch it before crawling and respect its rules; uncooperative or malicious crawlers ignore it. It's the right place to keep crawlers out of admin areas, search-result pages, and infinite-URL traps such as faceted navigation or calendar pages.
Robots.txt is not a security control. Anything you can't afford for a stranger to see needs authentication, not a robots Disallow rule — the file itself is publicly readable and effectively advertises every path it lists. The file can also include a Sitemap: line pointing to your sitemap URL so engines can find your URL list.
Example
A line like `Disallow: /admin/` tells well-behaved crawlers to skip your admin section. A `Sitemap: https://example.com/sitemap.xml` line at the bottom helps engines find your URL list.
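Put together, a minimal robots.txt combining those pieces might look like this (the paths and sitemap URL are illustrative):

```text
# Rules for all well-behaved crawlers
User-agent: *
Disallow: /admin/
Disallow: /search

# Help engines find your URL list
Sitemap: https://example.com/sitemap.xml
```

Rules are grouped under a `User-agent` line; `*` matches any crawler that doesn't have a more specific group of its own.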
Frequently asked questions
Will robots.txt hide my page from Google?
It blocks crawling, not indexing — Google can still list a blocked URL (typically without a snippet) if other pages link to it. To keep a page out of the index, use a noindex meta tag or the X-Robots-Tag HTTP header, and make sure robots.txt doesn't block that page; otherwise the crawler never sees the directive.
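On the page itself, the directive is a meta tag in the `<head>`:

```text
<!-- Keep this page out of search indexes; the page must stay crawlable -->
<meta name="robots" content="noindex">
```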
Should I block AI training crawlers?
Up to you. Some publishers add User-agent rules for known AI crawlers (GPTBot, ClaudeBot, Google-Extended). It's a policy choice, not a technical one.
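As a sketch, a robots.txt that opts out the AI crawlers named above while leaving ordinary search crawlers unaffected might look like this (crawler token names are published by each vendor and can change):

```text
# Opt known AI training crawlers out of the whole site
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Everyone else crawls normally
User-agent: *
Disallow:
```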