robots.txt Validator
Paste your robots.txt to validate syntax, check which crawlers are allowed or blocked, and catch common mistakes that could accidentally block Google from indexing your site.
What is robots.txt?
robots.txt is a plain text file at the root of your website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which pages they should and shouldn't visit. It's the first file bots check when they arrive at your domain. Getting it wrong can accidentally de-index your entire site — or fail to protect private areas from crawlers.
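A minimal, permissive robots.txt looks like this (yoursite.com is a placeholder for your own domain):

```text
User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
```

Each `User-agent` line starts a group of rules for the named crawler; `*` matches any bot without a more specific group. `Allow: /` is the default behavior, but stating it explicitly makes intent clear.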
Common robots.txt Mistakes
- `Disallow: /` — blocks ALL bots from your entire site. A common deployment accident.
- No `User-agent: *` wildcard — bots without a matching rule group may crawl your site unrestricted.
- Missing `Sitemap` directive — always declare your sitemap URL here so crawlers discover your pages faster.
- Blocking `/api/` routes that your frontend depends on — can break Google's rendering of your pages.
- Using robots.txt to hide sensitive data — the file is public! Use authentication instead.
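As a sketch of what checks like these look like in code (illustrative only, not this tool's actual implementation), here is a minimal linter that parses a robots.txt string and flags a site-wide `Disallow`, a missing wildcard group, and a missing `Sitemap` directive:

```typescript
// Minimal robots.txt lint sketch. A real validator also handles wildcards in
// paths, consecutive User-agent lines sharing one group, and casing edge cases.
type Warning = string;

function lintRobots(txt: string): Warning[] {
  const warnings: Warning[] = [];
  const lines = txt
    .split(/\r?\n/)
    .map((l) => l.split("#")[0].trim()) // strip comments and whitespace
    .filter((l) => l.length > 0);

  let hasWildcardAgent = false;
  let hasSitemap = false;
  let currentAgent = "(none)";

  for (const line of lines) {
    const sep = line.indexOf(":");
    if (sep === -1) {
      warnings.push(`Malformed line (no ':'): "${line}"`);
      continue;
    }
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      currentAgent = value;
      if (value === "*") hasWildcardAgent = true;
    } else if (field === "disallow" && value === "/") {
      warnings.push(`Disallow: / blocks "${currentAgent}" from the entire site`);
    } else if (field === "sitemap") {
      hasSitemap = true;
    }
  }

  if (!hasWildcardAgent) warnings.push("No 'User-agent: *' group: unlisted bots are unconstrained");
  if (!hasSitemap) warnings.push("No Sitemap directive declared");
  return warnings;
}
```

For example, `lintRobots("User-agent: *\nDisallow: /")` reports both the site-wide block and the missing sitemap, while a file with a wildcard group, an `Allow: /`, and a `Sitemap` line comes back clean.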
robots.txt for Next.js / Vercel
In the Next.js App Router, create an `app/robots.ts` file to generate your robots.txt dynamically:

```typescript
import { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/'],
    },
    sitemap: 'https://yoursite.com/sitemap.xml',
  }
}
```

Popular Bot User-Agents
- `Googlebot` — Google's primary web crawler for Search
- `Googlebot-Image` — Google's image search crawler
- `Bingbot` — Microsoft Bing's search crawler
- `GPTBot` — OpenAI's training data crawler
- `Claude-Web` — Anthropic's web crawler
- `CCBot` — Common Crawl's crawler, used for AI training datasets
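For example, to stay open to search engines while opting out of the AI training crawlers above, a robots.txt can add per-agent groups (a sketch; check each vendor's documentation for its current user-agent string):

```text
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Bots match the most specific group naming them, so `GPTBot` and `CCBot` follow their own `Disallow: /` rules while every other crawler falls through to the wildcard group.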