GrowthGPT
AI community platform for modern work

Robots.txt Generator

Configure crawler access for search engines and AI bots, then download your robots.txt file.

Quick Presets

Crawler Access Control

User-agent: * (All crawlers)

Default rule for all unspecified bots

Search Engines

Googlebot

Google's primary web crawler

Bingbot

Microsoft Bing's web crawler

YandexBot

Yandex search engine crawler

DuckDuckBot

DuckDuckGo's web crawler

Baiduspider

Baidu search engine crawler

Slurp

Yahoo's web crawler

AI Crawlers

GPTBot

OpenAI's web crawler for training data

ClaudeBot

Anthropic's web crawler

Google-Extended

Google's AI training crawler

CCBot

Common Crawl bot used for AI training

PerplexityBot

Perplexity AI's web crawler

Bytespider

ByteDance/TikTok's web crawler

Amazonbot

Amazon's web crawler for Alexa

Social & Other

Applebot

Apple's web crawler for Siri and Spotlight

Twitterbot

Twitter/X link preview crawler

Facebot

Facebook's link preview crawler

LinkedInBot

LinkedIn's link preview crawler

Additional Settings

Sitemap URL: helps search engines discover all your pages

Crawl-delay: minimum seconds between requests. Google ignores this; Bing and Yandex respect it.

/admin/
/private/

Generated robots.txt

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Understanding robots.txt

The robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells web crawlers which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol, a convention in use since 1994 and standardized as RFC 9309 in 2022. Every major search engine checks for this file before crawling your site.

The file uses a simple syntax: User-agent lines specify which crawler the rules apply to, while Allow and Disallow directives control access to specific paths. A wildcard User-agent (*) applies rules to all bots that do not have their own specific block. You can also include Sitemap directives to point crawlers to your XML sitemaps, and Crawl-delay to throttle request frequency.
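You can check how these directives apply to a given URL programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser; the rules, bot name, and URLs are illustrative:

```python
from urllib import robotparser

# Illustrative rules, matching the generated example on this page
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths with no matching Disallow rule are allowed by default
allowed = rp.can_fetch("MyBot", "https://example.com/blog/post")
blocked = rp.can_fetch("MyBot", "https://example.com/admin/users")
print(allowed, blocked)
```

Note that urllib.robotparser implements plain path-prefix matching; it does not support Google's mid-path wildcard (`*`) or end-anchor (`$`) extensions, so keep test rules to simple prefixes when using it.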

Managing AI Crawler Access

With the rise of large language models, many website owners want to control whether AI companies can use their content for training purposes. Bots like GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and Google-Extended (Gemini training) can be individually blocked or allowed in your robots.txt file.

Blocking AI crawlers does not affect your search engine visibility. You can freely disallow GPTBot and CCBot while keeping Googlebot and Bingbot fully allowed. This generator makes it easy to toggle access for each AI crawler individually, or use the Block AI Bots preset to disable all AI crawlers at once while maintaining full search engine access.
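As a sketch, a Block AI Bots preset produces output along these lines; the exact bot list in the generator may differ, and the blanket `Allow: /` keeps search engines unaffected:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
```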

Common robots.txt Mistakes

One frequent mistake is using robots.txt to try to hide pages from search results. Blocking a URL in robots.txt prevents crawling but does not prevent indexing. If other pages link to a blocked URL, Google may still show it in results with limited information. Use noindex meta tags or X-Robots-Tag headers to truly remove pages from search results.
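To actually keep a page out of search results, use one of the following instead of a robots.txt block. Note that the page must remain crawlable: if robots.txt disallows it, crawlers never see the noindex directive.

```
<!-- Option 1: meta tag in the page's <head> -->
<meta name="robots" content="noindex">

# Option 2: HTTP response header (also works for PDFs and other non-HTML files)
X-Robots-Tag: noindex
```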

Another common error is accidentally blocking CSS, JavaScript, or image files that search engines need to render your pages. If Googlebot cannot load your stylesheets or scripts, it may not understand your page layout, which can hurt rankings. Always test your robots.txt with Google Search Console after making changes, and be careful with wildcard patterns that might match more paths than intended.

Frequently Asked Questions

What is robots.txt?

robots.txt is a plain text file placed at the root of your website that instructs web crawlers which pages or sections they are allowed or not allowed to access. It follows the Robots Exclusion Protocol and is checked by all major search engines before crawling your site.

Does robots.txt block indexing?

No. Blocking a URL in robots.txt prevents crawlers from visiting the page, but it does not prevent indexing. If other sites link to a blocked page, search engines may still show the URL in results with a limited snippet. To prevent indexing, use a noindex meta tag or X-Robots-Tag HTTP header.

Should I block AI crawlers?

That depends on your goals. If you want to prevent AI companies from using your content for model training, you can block bots like GPTBot, ClaudeBot, CCBot, and Google-Extended. These blocks will not affect your regular search engine rankings as long as Googlebot and Bingbot remain allowed.

Where do I place the robots.txt file?

The robots.txt file must be placed at the root of your domain, accessible at https://yourdomain.com/robots.txt. It must be served with a 200 status code and a text/plain content type. Each subdomain needs its own separate robots.txt file.
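A quick way to verify placement, status code, and content type is to request the file's headers directly. This transcript is illustrative; your server's exact header set will vary:

```
$ curl -I https://yourdomain.com/robots.txt
HTTP/2 200
content-type: text/plain; charset=utf-8
```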

What is Crawl-delay?

Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between requests. This can reduce server load from aggressive crawlers. Google does not honor Crawl-delay (use Google Search Console instead), but Bing, Yandex, and several other crawlers respect it.
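For example, to ask Bing's crawler to wait ten seconds between requests while leaving all other bots unthrottled (the value is illustrative):

```
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Allow: /
```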
