# Robots.txt
## When to Use
Configure `robots.txt` on every Drupal site before launch. It controls which paths search engines crawl. The default Drupal core `robots.txt` covers common admin paths; extend it for site-specific private areas, staging environments, and AI crawler policies.
## Decision
| Situation | Choice | Why |
|---|---|---|
| Standard Drupal site | Edit core robots.txt directly | Simplest — it ships with sensible defaults |
| Drupal CMS with seo_tools recipe applied | Use robots.append.txt | Recipe manages the base; append.txt for custom rules |
| Staging / dev environment | Disallow all crawlers | Prevent indexing of non-production content |
| Need AI crawler-specific rules | robots.txt + per-agent Disallow | Training bots vs search bots need separate rules |
| Composer-managed project overwriting robots.txt | robots.txt via scaffolding config | Prevent composer from resetting your edits |
## Pattern
### Core robots.txt Location
Drupal ships robots.txt in the webroot. Its default rules:
```
User-agent: *
Crawl-delay: 10
# Drupal admin and system paths
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/logout
# Files directories
Disallow: /sites/*/files/private/
# Sitemap reference (add this)
Sitemap: https://example.com/sitemap.xml
```
### drupal_cms_seo_tools — robots.append.txt
When using the drupal_cms_seo_tools recipe, custom rules go in robots.append.txt (project webroot). The recipe merges this file with core robots.txt during deployment:
```
# robots.append.txt — project-specific additions
# Block access to private staging areas
Disallow: /stage/
Disallow: /preview/
# AI crawler policies — see ai-crawler-policy.md for full list
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
```
### Essential Rules to Always Include
```
# Admin UI — always block
Disallow: /admin/
Disallow: /node/*/edit
Disallow: /node/*/delete
# System paths
Disallow: /batch
Disallow: /cron/
Disallow: /update.php
Disallow: /install.php
# Private files
Disallow: /sites/*/files/private/
# Views exposed filters (prevent crawl traps)
Disallow: *?*
# Sitemap — always include
Sitemap: https://example.com/sitemap.xml
```
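The prefix-based rules above can be sanity-checked locally with Python's standard `urllib.robotparser` before deploying. A minimal sketch follows; note that `robotparser` matches `Disallow` values as plain path prefixes and does not implement Google-style `*` wildcards, so wildcard rules like `/node/*/edit` are honored by real crawlers but cannot be verified this way:

```python
from urllib.robotparser import RobotFileParser

# A prefix-only subset of the essential rules above
# (urllib.robotparser ignores Google-style * wildcards).
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /update.php
Disallow: /install.php
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("*", "/admin/config"))  # False: admin UI blocked
print(rp.can_fetch("*", "/update.php"))    # False: system path blocked
print(rp.can_fetch("*", "/node/42"))       # True: regular content allowed
```

Run this against your full ruleset whenever you edit robots.txt; it catches typos like a missing leading slash before they reach production.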
### Staging Environment — Block Everything
```
# robots.txt for staging.example.com
User-agent: *
Disallow: /
```
Set this via an environment-specific file or server configuration; do not rely solely on robots.txt, as it is advisory and not enforced.
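For the server-configuration route, one sketch (nginx assumed; adjust the vhost to your setup) both serves a disallow-all robots.txt and sends an `X-Robots-Tag: noindex` header, which, unlike robots.txt, asks engines to drop pages they have already indexed:

```nginx
# Staging vhost sketch (nginx assumed; adapt listen/server_name to your setup)
server {
    listen 80;
    server_name staging.example.com;

    # Ask engines to de-index pages they already crawled.
    # Note: an add_header inside a location block replaces, rather than
    # extends, headers added at server level in nginx.
    add_header X-Robots-Tag "noindex, nofollow" always;

    location = /robots.txt {
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /\n";
    }
}
```

HTTP basic auth on the staging vhost is an even stronger option, since it blocks crawlers and humans alike regardless of robots directives.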
### Protecting robots.txt from Composer Scaffold Overwriting
Add to composer.json to prevent Drupal scaffold from resetting your robots.txt:
```json
"extra": {
    "drupal-scaffold": {
        "file-mapping": {
            "[webroot-dir]/robots.txt": false
        }
    }
}
```
## Common Mistakes
- Wrong: Disallowing `/sites/default/files/` entirely → Right: Only disallow `/sites/*/files/private/`; public media should be crawlable
- Wrong: Using robots.txt as a security mechanism → Right: Robots.txt is advisory only; use access control for real security
- Wrong: Forgetting the `Sitemap:` directive → Right: Add it so crawlers auto-discover the sitemap without Search Console submission
- Wrong: Wildcard `Disallow: *?*` on sites using query parameters for canonical content → Right: Only block faceted navigation query parameters, not all query strings
- Wrong: Same robots.txt on production and staging → Right: Staging must disallow all crawlers to prevent duplicate content indexing
## See Also
- AI Crawler Policy — full table of AI crawler user agents and recommended rules
- XML Sitemap — generating the sitemap to reference here
- Canonical URLs — duplicate content prevention at the URL level
- Reference: Google robots.txt specification
- Reference: Drupal core robots.txt