robots.txt Analyzer
Paste your robots.txt to validate rules, find SEO issues, and test URL paths.
The robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells web crawlers which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol, a standard that has been in use since 1994.
Every major search engine, including Google, Bing, and Yahoo, checks for a robots.txt file before crawling your site. While it is technically a suggestion (crawlers can choose to ignore it), all reputable search engine bots respect the directives. The file helps you control crawl budget, keep private sections hidden from search results, and guide crawlers to the most important content on your site.
The robots.txt file uses a simple directive-based syntax. Each block starts with one or more User-agent lines followed by Allow and Disallow rules. The User-agent field specifies which crawler the rules apply to, with an asterisk (*) matching all bots.
Disallow tells a crawler not to access a specific path, while Allow overrides a Disallow for a more specific path. For example, you can Disallow: /private/ but Allow: /private/public-page. The Sitemap directive points crawlers to your XML sitemap. Crawl-delay sets the minimum time in seconds between requests, though Google ignores this directive. Comments start with a hash (#) and are ignored by crawlers. Wildcards (*) in paths match any sequence of characters, and the dollar sign ($) marks the end of a URL pattern.
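The directives described above can be combined in a short annotated example (the paths and sitemap URL are placeholders):

```
# Comments start with a hash and are ignored by crawlers.
User-agent: *
Allow: /private/public-page    # more specific path overrides the Disallow below
Disallow: /private/            # block the rest of the /private/ section
Disallow: /*.pdf$              # wildcard plus end-of-URL anchor: block all PDFs
Crawl-delay: 10                # minimum seconds between requests (ignored by Google)

Sitemap: https://example.com/sitemap.xml
```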
One of the most common mistakes is accidentally blocking CSS, JavaScript, or image files. Search engines need access to these resources to render your pages properly. If Googlebot cannot load your stylesheets or scripts, it may not understand your page layout, which can hurt your rankings.
Another frequent error is using robots.txt to hide pages from search results. Blocking a URL in robots.txt prevents crawling but does not prevent indexing. If other pages link to a blocked URL, Google may still show it in search results with limited information. To truly remove a page from search results, use a noindex meta tag or X-Robots-Tag header instead. Also be careful with trailing slashes and wildcards, as incorrect patterns can accidentally block or allow more paths than intended.
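To actually keep a page out of search results, the noindex signal can be applied in either of two ways; note that the page must remain crawlable (not blocked in robots.txt) for crawlers to see the signal at all:

```
<!-- Option 1: a meta tag in the page's <head> -->
<meta name="robots" content="noindex">

<!-- Option 2: an HTTP response header (useful for PDFs and other non-HTML files) -->
X-Robots-Tag: noindex
```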
Your robots.txt file directly affects your crawl budget, which is the number of pages a search engine will crawl on your site within a given timeframe. For large sites with thousands of pages, efficient use of robots.txt can ensure search engines spend their crawl budget on your most valuable pages rather than wasting it on duplicate content, filter pages, or internal search results.
A well-optimized robots.txt file blocks paths that generate low-value or duplicate content (like search result pages, session-based URLs, and admin areas) while keeping all important content accessible. Including your sitemap URL in robots.txt helps search engines discover your full sitemap quickly. For sites that want to block AI training crawlers while remaining visible in search, you can specifically target bots like GPTBot and CCBot with Disallow rules while keeping Googlebot and Bingbot fully allowed.
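A crawl-budget-oriented configuration along those lines might look like this; the blocked paths are illustrative, so substitute the low-value URL patterns your own site actually generates:

```
User-agent: *
Disallow: /search          # internal search result pages
Disallow: /admin/          # admin area
Disallow: /*?sessionid=    # session-based duplicate URLs
Disallow: /*?sort=         # filter/sort variants of category pages

Sitemap: https://example.com/sitemap.xml
```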
Paste the contents of your robots.txt file into the text area, or load one of the provided templates. Click Analyze to see a full breakdown of user-agent blocks, allow/disallow rules, sitemap references, detected issues, and an SEO score. You can also use the URL Path Tester to check whether a specific path would be allowed or blocked for a given crawler.
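The kind of allowed/blocked check the URL Path Tester performs can be sketched with Python's standard-library urllib.robotparser. Two caveats about this sketch: that parser applies rules in file order (first match wins) rather than Google's longest-match precedence, which is why the more specific Allow line is listed first below, and it does not implement the * and $ wildcard extensions.

```python
from urllib import robotparser

# Sample file; Allow is listed before Disallow because urllib.robotparser
# uses first-match-wins rather than Google's most-specific-match rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "/private/secret"))       # blocked by Disallow: /private/
print(rp.can_fetch("*", "/private/public-page"))  # allowed by the explicit Allow
print(rp.can_fetch("*", "/blog/post"))            # allowed (no matching rule)
```

For production-grade matching that follows Google's semantics, a dedicated parser library would be a better fit than the standard library module.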
The SEO score (0 to 100) rates your robots.txt based on best practices. It starts at 100 and deducts points for issues like missing sitemap references, blocking important resource paths, syntax errors, duplicate rules, and overly restrictive configurations. A score above 80 indicates a well-configured file, while a lower score highlights areas that need improvement.
No. Blocking a URL in robots.txt prevents Google from crawling it, but not from indexing it. If other sites link to a blocked page, Google may still show the URL in search results with a limited snippet. To actually remove a page from search results, use a noindex meta tag, an X-Robots-Tag HTTP header, or the Removals tool in Google Search Console.
That depends on your goals. If you want to prevent AI companies from using your content for training data, you can add Disallow rules for bots like GPTBot (OpenAI), CCBot (Common Crawl), and Google-Extended (Gemini training). These blocks will not affect your regular search engine visibility as long as Googlebot and Bingbot remain allowed.
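A minimal version of that setup looks like the following; the bot names are the ones listed above, but new AI crawlers appear regularly, so check a current list before relying on this:

```
# Block AI training crawlers
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Search engine crawlers remain fully allowed
User-agent: Googlebot
User-agent: Bingbot
Disallow:
```

An empty Disallow line means "allow everything" for that group, and grouping several User-agent lines above one rule set applies the rules to all of them.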
Yes, this tool is completely free. All processing happens directly in your browser. Your robots.txt content is never sent to any server or stored anywhere. You can safely analyze sensitive configuration files without privacy concerns.