Tools · 4 min read · March 25, 2026
How to Validate robots.txt and Sitemap
The robots validator checks your robots.txt syntax, tests which crawlers can access which URLs, and validates your sitemap.xml structure.
The robots.txt and sitemap validator checks two critical SEO configuration files that control how search engines crawl and index your site. Mistakes in either file can silently block Google from indexing your pages.
Accessing the Tool
Go to Tools → Robots Validator.
Validating robots.txt
What It Checks
- Syntax — valid User-agent directives, correctly formatted Allow/Disallow rules
- Common crawler rules — separate rule evaluation for Googlebot, Bingbot, and generic crawlers
- Sitemap directive — is a Sitemap: line present pointing to your sitemap?
- Crawl-delay — is a polite crawl delay specified?
- Accidental full block — detects Disallow: / applied to all user agents (blocks all crawlers)
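For reference, a minimal robots.txt that passes these checks might look like the following (the paths and sitemap URL are illustrative, not prescriptive):

```
# Allow all crawlers, but keep them out of the admin area
User-agent: *
Disallow: /admin/
Crawl-delay: 10

# Point crawlers at the sitemap
Sitemap: https://example.com/sitemap.xml
```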
URL Access Testing
Enter a URL to test whether specific crawlers can access it under your current robots.txt rules. For example, test whether Googlebot can crawl /admin/ — it should not.
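You can reproduce this kind of access test locally with Python's standard-library `urllib.robotparser` — a minimal sketch, with hypothetical rules and URLs:

```python
from urllib import robotparser

# Hypothetical rules: Googlebot may read /admin/public/ but nothing else under /admin/
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Allow: /admin/public/
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/admin/"))         # False
print(rp.can_fetch("Googlebot", "https://example.com/admin/public/"))  # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/blog/"))       # True
```

Note that `urllib.robotparser` applies the first matching rule in a group, so the `Allow` line must precede the broader `Disallow` here; Google's own parser instead picks the most specific (longest) matching rule.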
Common Issues Detected
| Issue | Severity | Impact |
|---|---|---|
| Disallow: / for all crawlers | Critical | Entire site blocked from indexing |
| CSS/JS files blocked | High | Google cannot render your pages |
| No Sitemap directive | Medium | Sitemap must be submitted manually |
| Conflicting Allow/Disallow | Medium | Unpredictable crawler behavior |
| Missing robots.txt | Low | Crawlers assume full access |
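The "CSS/JS files blocked" issue from the table above can be checked the same way: simulate Googlebot against a few representative rendering resources. A sketch, assuming hypothetical asset paths under `/assets/`:

```python
from urllib import robotparser

# Hypothetical robots.txt that accidentally blocks the asset directory
ROBOTS_TXT = "User-agent: *\nDisallow: /assets/\n"

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Representative rendering resources (illustrative paths)
candidates = ("/assets/site.css", "/assets/app.js", "/index.html")
blocked = [p for p in candidates
           if not rp.can_fetch("Googlebot", f"https://example.com{p}")]

print(blocked)  # ['/assets/site.css', '/assets/app.js']
```

If `blocked` contains stylesheets or scripts, Google can fetch your HTML but cannot render the page as users see it.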
Validating sitemap.xml
What It Checks
- XML syntax — well-formed, no encoding errors
- URL count — maximum 50,000 URLs per sitemap file
- File size — maximum 50MB uncompressed
- Required elements — loc (URL) present for each entry
- URL format — HTTPS, no parameters, canonical URLs
- lastmod format — dates in W3C format (YYYY-MM-DD)
- Sitemap index — if you use a sitemap index, child sitemaps are validated
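Several of these checks can be sketched with the standard-library XML parser. The function below is illustrative, not the tool's implementation: it flags malformed XML, missing `loc` elements, non-HTTPS URLs, oversized sitemaps, and `lastmod` values that are not W3C dates:

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
W3C_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD, optionally followed by a time part

def check_sitemap(xml_text, max_urls=50_000):
    """Return a list of (severity, message) findings for a sitemap string."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [("critical", f"malformed XML: {exc}")]

    findings = []
    entries = root.findall(f"{NS}url")
    if len(entries) > max_urls:
        findings.append(("high", f"{len(entries)} URLs exceeds the {max_urls} limit"))

    for entry in entries:
        loc = (entry.findtext(f"{NS}loc") or "").strip()
        if not loc:
            findings.append(("critical", "<url> entry missing <loc>"))
        elif not loc.startswith("https://"):
            findings.append(("medium", f"non-HTTPS URL: {loc}"))
        lastmod = entry.findtext(f"{NS}lastmod")
        if lastmod and not W3C_DATE.match(lastmod.strip()):
            findings.append(("low", f"lastmod not in W3C format: {lastmod}"))
    return findings

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>March 2026</lastmod></url>
</urlset>"""

for severity, message in check_sitemap(sample):
    print(severity, message)
```

Running it on the sample flags the non-HTTPS URL and the free-text `lastmod` value.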
Common Issues Detected
| Issue | Severity |
|---|---|
| Malformed XML | Critical — sitemap not parseable |
| URLs returning 404 | High — wasted crawl budget |
| Non-HTTPS URLs in sitemap | Medium |
| Missing lastmod dates | Low |
| Including noindex pages | Medium — confusing signal to crawlers |
How to Use
1. Enter your domain name or a direct URL to your robots.txt file.
2. Click Validate.
3. Review findings by severity.
4. To test URL access, enter a specific URL in the test field and select which crawler to simulate.