Recipe
The robots.txt Guide
Control what search engines crawl. A single file that speaks to every bot on the web.
What It Is
robots.txt lives at your domain root. Bots check it before crawling. It is a polite request, not a security boundary. Malicious scrapers ignore it. Well-behaved crawlers (Google, Bing) honor the directives they support, though support for some directives, such as Crawl-delay, varies by crawler.
Minimal Example
User-agent: *
Disallow: /admin
Disallow: /api
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
Place this at public/robots.txt in your Next.js project. Vercel serves it from the root automatically.
Key Directives
- User-agent — which bot the rules apply to. * means all bots.
- Disallow — paths the bot must skip. Most specific rule wins.
- Allow — carve out exceptions inside a disallowed directory.
- Sitemap — points to your XML sitemap. Helps discovery.
- Crawl-delay — seconds between requests. Supported by Bing, ignored by Google.
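To make the "most specific rule wins" behavior concrete, here is a minimal Python sketch of longest-prefix matching within a single User-agent group. It is an illustration under stated assumptions, not any crawler's actual implementation: the is_allowed helper and the /admin/help path are hypothetical, wildcards (* and $) are not handled, and ties are resolved in favor of Allow.

```python
from typing import List, Tuple

# (directive, path prefix), e.g. ("disallow", "/admin")
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Return True if `path` may be crawled under one User-agent group.

    Hypothetical helper: the longest matching prefix wins, and a tie
    between Allow and Disallow goes to Allow.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value (e.g. "Disallow:") matches nothing
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


# Assumed rule set: /admin is blocked, with /admin/help carved back out.
rules = [
    ("disallow", "/admin"),
    ("allow", "/admin/help"),
    ("allow", "/"),
]

print(is_allowed(rules, "/admin/users"))     # False
print(is_allowed(rules, "/admin/help/faq"))  # True
print(is_allowed(rules, "/blog"))            # True
```

Because matching is by prefix, Disallow: /admin also covers /administrator; add a trailing slash to limit a rule to one directory.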
Common Patterns
Block Everything
User-agent: *
Disallow: /
Allow All
User-agent: *
Disallow:
Target Googlebot
User-agent: Googlebot
Disallow: /private
Block AI Crawlers
User-agent: GPTBot
Disallow: /
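Patterns like these can be combined into one file, with one group per bot. The sketch below uses Python's standard urllib.robotparser to show how a bot that matches a named group follows that group only, while every other bot falls back to the * group. Combining the GPTBot and /private rules in one file is an assumption made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: block GPTBot entirely, keep /private off-limits for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "/blog"))              # False
print(parser.can_fetch("Googlebot", "/blog"))           # True
print(parser.can_fetch("Googlebot", "/private/notes"))  # False
```

Separate groups with a blank line so the file stays easy to read.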
Testing
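Use Google Search Console to verify the file. The robots.txt report, which replaced the older standalone robots.txt tester, shows the file Google last fetched and any parsing errors, and the URL Inspection tool tells you whether a specific page is blocked by robots.txt. Also open /robots.txt directly in your browser after each deploy.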
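A scripted check can complement the manual steps above. The sketch below is one possible approach, not an official tool: failed_cases is a hypothetical helper, and the agents and paths in cases are assumptions based on the minimal example. Python's parser applies rules in file order rather than by longest match, so its answers can differ from Google's in edge cases.

```python
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

# (user agent, path, expected to be allowed)
Case = Tuple[str, str, bool]


def failed_cases(robots_txt: str, cases: List[Case]) -> List[Case]:
    """Return the cases whose actual result differs from the expectation."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path, expected)
        for agent, path, expected in cases
        if parser.can_fetch(agent, path) != expected
    ]


# Contents of the minimal example from this guide.
robots_txt = """\
User-agent: *
Disallow: /admin
Disallow: /api
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
"""

# Assumed expectations; adjust to the paths that matter on your site.
cases = [
    ("Googlebot", "/", True),
    ("Googlebot", "/admin/settings", False),
    ("Bingbot", "/api/users", False),
]

print(failed_cases(robots_txt, cases))  # [] when every expectation holds
```

To run the same check against a deployed file, RobotFileParser also offers set_url() followed by read(), which fetches the file over HTTP instead of parsing a local string.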
Remember: robots.txt is public. Anyone can view it. Never use it to hide sensitive endpoints — use authentication instead.