The robots.txt Guide

Control what search engines crawl. A single file that speaks to every bot on the web.

What It Is

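robots.txt lives at the root of your domain, for example https://yoursite.com/robots.txt, and each subdomain needs its own file. Bots check it before crawling. It is a polite request, not a security boundary: malicious scrapers ignore it. Well-behaved crawlers (Google, Bing) follow the standard directives, though support for extensions such as Crawl-delay varies by crawler.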

Minimal Example

User-agent: *
Disallow: /admin
Disallow: /api
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Place this at public/robots.txt in your Next.js project. Next.js serves everything in public/ from the site root, so the file is available at /robots.txt on Vercel or any other host.
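To sanity-check the minimal example before deploying, one option is to parse it with Python's standard-library urllib.robotparser. This is only a sketch: "ExampleBot" is a made-up user agent, and the parser checks rules in file order, which can differ from the longest-match behavior of major crawlers when rules overlap (the example here gives the same answer either way). site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The minimal example, as it would appear in public/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /api
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is hypothetical; it falls under the "*" group.
print(rules.can_fetch("ExampleBot", "https://yoursite.com/admin/users"))  # False
print(rules.can_fetch("ExampleBot", "https://yoursite.com/blog/post"))    # True
print(rules.site_maps())  # ['https://yoursite.com/sitemap.xml']
```

Note that Disallow: /admin is a prefix match, so it also covers paths such as /administrator.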

Key Directives

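User-agent: which bot the rules apply to. * means all bots. A crawler follows the group that names it most specifically and falls back to * otherwise.
Disallow: path prefixes the bot should skip. When several rules match a URL, the most specific one (the longest matching path) wins.
Allow: carve out exceptions inside a disallowed directory.
Sitemap: the absolute URL of your XML sitemap. Helps discovery.
Crawl-delay: seconds to wait between requests. A nonstandard extension, supported by Bing and ignored by Google.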
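The "most specific rule wins" behavior can be sketched as a longest-prefix match within one user-agent group. This is a simplified illustration, not a full parser: is_allowed and the rule tuples are hypothetical names, wildcards (* and $) are not handled, and a tie between Allow and Disallow is resolved in favor of Allow, as the Robots Exclusion Protocol recommends.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/admin")


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Apply longest-match precedence to a single user-agent group.

    Simplified: plain prefix matching only, no "*" or "$" wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty prefix matches nothing
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical group: block /admin but carve out /admin/help.
group = [("disallow", "/admin"), ("allow", "/admin/help")]

print(is_allowed("/admin/users", group))     # False
print(is_allowed("/admin/help/faq", group))  # True
print(is_allowed("/blog", group))            # True
```

Because precedence depends on path length rather than position, the order of Allow and Disallow lines within a group does not change the result in this model.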

Common Patterns

Block Everything

User-agent: *
Disallow: /

Allow All

User-agent: *
Disallow:

Target Googlebot

User-agent: Googlebot
Disallow: /private

Block AI Crawlers

User-agent: GPTBot
Disallow: /
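Per-bot groups only affect the bots they name. As a rough check, the sketch below combines the Googlebot and GPTBot patterns into one file and asks Python's urllib.robotparser what each agent may fetch. "OtherBot" is a made-up agent used to show what happens when no group matches.

```python
from urllib.robotparser import RobotFileParser

# Two of the patterns above combined into one file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "OtherBot" is hypothetical and has no matching group, so nothing is blocked for it.
for agent in ("Googlebot", "GPTBot", "OtherBot"):
    for path in ("/private/report", "/blog"):
        print(agent, path, parser.can_fetch(agent, path))
```

With no User-agent: * group in the file, any bot that is not named is allowed everywhere, so add a * group if you want a default policy.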

Testing

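Use the robots.txt report in Google Search Console to confirm that Google fetched and parsed your file (it replaced the older standalone robots.txt tester), and the URL Inspection tool to see whether a specific URL is blocked. Also check /robots.txt directly in your browser after deploy.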
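The browser check can also be scripted. The following is a minimal sketch, assuming the site is reachable over HTTPS and the file is UTF-8; fetch_robots and find_mismatches are hypothetical helper names, and the expected allow/block table is something you would write for your own site.

```python
import urllib.request
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def fetch_robots(base_url: str, timeout: float = 10.0) -> str:
    """Download /robots.txt from a deployed site and return its text."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def find_mismatches(robots_txt: str, agent: str, expected: Dict[str, bool]) -> List[str]:
    """Return the paths whose allowed/blocked status differs from `expected`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path, allowed in expected.items()
            if parser.can_fetch(agent, path) != allowed]


# Example run against a live deployment (placeholder domain):
#   text = fetch_robots("https://yoursite.com")
#   print(find_mismatches(text, "Googlebot", {"/admin": False, "/blog": True}))
```

An empty list means every path behaved as expected. Keeping the fetch separate from the check makes the check easy to run against a local copy of the file as well.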

Remember: robots.txt is public. Anyone can view it, so listing a path also advertises that it exists. Never use it to hide sensitive endpoints; use authentication instead.