What is robots.txt?
A robots.txt file tells search engines and AI crawlers which pages on your site they can access. It’s a simple text file placed in your website’s root directory that controls how bots interact with your content.
In 2025, robots.txt has become crucial for AI visibility. With the rise of AI search engines like Perplexity, ChatGPT with browsing, and Google AI Overviews, controlling which AI crawlers can access your content directly impacts whether your brand appears in AI-generated responses.
Why AI Crawler Settings Matter
AI crawlers fall into two categories:
Citation Crawlers (like PerplexityBot, OAI-SearchBot) fetch your content to cite in AI search results. Allowing these can drive traffic to your site when users click citations.
Training Crawlers (like GPTBot, ClaudeBot) collect data to train AI models. Blocking these protects your content while still maintaining search visibility.
Our generator helps you make informed decisions about which crawlers to allow based on your business goals.
AI Crawler Reference Guide
| Crawler | Company | Purpose | Recommendation |
|---|---|---|---|
| Googlebot | Search indexing | Always Allow | |
| Bingbot | Microsoft | Search indexing | Always Allow |
| PerplexityBot | Perplexity | AI search citations | Allow for visibility |
| OAI-SearchBot | OpenAI | ChatGPT search | Allow for visibility |
| GPTBot | OpenAI | Model training | Your choice |
| ClaudeBot | Anthropic | Model training | Your choice |
| Google-Extended | Gemini training | Your choice | |
| CCBot | Common Crawl | Open dataset | Your choice |
Platform-Specific Guidelines
WordPress
Block /wp-admin/, /wp-includes/, plugin directories, feeds, and search results to prevent crawling of admin areas and duplicate content.
Shopify
Block /admin/, /cart/, /checkout/, and filtered collection URLs to protect user sessions and avoid indexing duplicate product pages.
eCommerce (General)
Block cart, checkout, account pages, and filter/sort URL parameters to focus crawler attention on product and category pages.
Frequently Asked Questions
Where do I upload my robots.txt file?
Upload it to your website’s root directory so it’s accessible at https://yourdomain.com/robots.txt. Most hosting providers have a file manager, or you can use FTP.
Will blocking AI crawlers hurt my SEO?
No. Traditional search engines (Googlebot, Bingbot) are separate from AI training crawlers. You can block AI training while maintaining full search visibility.
What’s the difference between robots.txt and llms.txt?
robots.txt controls crawler access (blocking/allowing). llms.txt is a newer proposed standard that helps AI models understand your site structure—it’s informational rather than restrictive.
Which visibility strategy should I choose?
It depends on your goals:
- Maximum Visibility: Best for most businesses. Appear in both traditional search and AI responses.
- AI Visibility Only: Focus exclusively on AI search engines like ChatGPT, Perplexity, and Claude.
- Traditional SEO Only: Block all AI crawlers for maximum content protection.
How often should I update my robots.txt?
Review it whenever you add new sections to your site, change platforms, or want to adjust your AI visibility strategy. The AI crawler landscape evolves, so annual reviews are recommended.
Can I have different settings for search engines and AI crawlers?
Yes! Our generator offers three strategies: Maximum Visibility allows all crawlers, AI Visibility Only focuses on AI crawlers while blocking traditional search, and Traditional SEO Only blocks all AI crawlers.
What is llms.txt?
llms.txt is a proposed standard by Answer.AI for providing AI models with a structured overview of your site. It uses Markdown format and helps AI systems understand your content hierarchy. While not yet widely adopted, early implementation may benefit future AI visibility.
Do AI companies respect robots.txt?
Major AI companies (OpenAI, Anthropic, Google) have stated they respect robots.txt directives. However, compliance isn’t guaranteed for all crawlers, so robots.txt should be one part of your content protection strategy.