Glossary

Robots.txt

A text file at /robots.txt that tells crawlers which paths they may or may not visit.

Definition

Robots.txt implements the Robots Exclusion Protocol, a voluntary convention. Compliant crawlers (Googlebot, Bingbot, etc.) fetch it before crawling and respect its rules; uncooperative or malicious crawlers simply ignore it. It's the right place to keep crawlers out of admin areas, search-result pages, and infinite-URL traps such as faceted navigation or calendar pages.

Robots.txt is not a security control. Anything you can't afford for a stranger to see needs authentication, not a Disallow rule. The file should also include a `Sitemap:` line pointing to your sitemap URL so search engines can discover it.
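A compliant crawler's decision can be sketched with Python's standard-library `urllib.robotparser`. The rules, bot name, and URLs below are illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com (illustrative only).
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # in practice you'd call set_url(...) and read()

# A well-behaved crawler asks before fetching each path.
print(parser.can_fetch("MyBot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))    # True
```

Note that `can_fetch` only reports what the rules say; nothing stops a crawler that never asks.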

Example

A line like `Disallow: /admin/` tells well-behaved crawlers to skip your admin section. A `Sitemap: https://example.com/sitemap.xml` line at the bottom helps engines find your URL list.
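Put together, a minimal robots.txt combining these directives might look like this (paths and sitemap URL are placeholders):

```
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
```

The `User-agent: *` group applies to any crawler that doesn't match a more specific group.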

Frequently asked questions

Will robots.txt hide my page from Google?

It blocks crawling, not indexing — Google can still list a URL it can't crawl, for example when other pages link to it. Use a noindex directive instead, and note that the page must remain crawlable: if robots.txt blocks the page, Google never sees the noindex tag.
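A noindex directive can be sent either in the page's HTML or as an HTTP response header:

```
<meta name="robots" content="noindex">
```

The equivalent header form is `X-Robots-Tag: noindex`, which also works for non-HTML resources like PDFs.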

Should I block AI training crawlers?

Up to you. Some publishers add User-agent rules for known AI crawlers (GPTBot, ClaudeBot, Google-Extended). It's a policy choice, not a technical one.
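A publisher opting out of AI training crawlers would add per-bot groups like these, using the crawlers' published user-agent tokens:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Like every robots.txt rule, this relies on the crawler choosing to comply.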