robots.txt setting
Protect your website or application from AI crawlers by implementing a robots.txt file on your domain to direct AI bot operators on what content they can and cannot scrape for AI model training.
AI bots are expected to follow the robots.txt directives.
robots.txt files express your preferences. They do not prevent crawler operators from crawling your content at a technical level. Some crawler operators may disregard your robots.txt preferences and crawl your content regardless of what your robots.txt file says.
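As a rough illustration of what following the directives looks like on the crawler side, the sketch below uses Python's standard `urllib.robotparser` module to check a URL before fetching it. The rules are a short excerpt modelled on the managed content shown later on this page. The `is_allowed` helper and the `ExampleBrowser` user agent are hypothetical names used only for this example. The check is voluntary: a crawler that skips it can still request the page.

```python
from urllib.robotparser import RobotFileParser

# Excerpt modelled on a managed robots.txt; not a complete file.
ROBOTS_TXT = """\
User-agent: *
Content-signal: search=yes, ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines. Directives this parser does not
    # recognise, such as Content-signal, are skipped.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.crawlstop.com/"
    # "ExampleBrowser" is a hypothetical agent that matches no named group.
    for agent in ("GPTBot", "ClaudeBot", "ExampleBrowser"):
        print(agent, is_allowed(agent, page))
```

A well-behaved crawler would run a check like this before each request and skip any URL for which it returns `False`.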
Cloudflare independently checks whether your website has an existing robots.txt file and adjusts the behavior of this feature accordingly.
If your website already has a robots.txt file (verified by an HTTP 200 response), Cloudflare will prepend our managed robots.txt before your existing robots.txt, combining both into a single response.
For example, without this feature enabled, the robots.txt content of crawlstop.com would be:
User-agent: *
Disallow: /lp
Disallow: /feedback
Disallow: /langtest

Sitemap: https://www.crawlstop.com/sitemap.xml

With the managed robots.txt enabled, Cloudflare will prepend our managed content before your original content, resulting in what you can view at https://www.crawlstop.com/robots.txt:
# As a condition of accessing this website, you agree to abide by the
# following content signals:

# (a) If a content-signal = yes, you may collect content for the
# corresponding use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search
# does not include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of
# content for generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes, ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content

User-agent: *
Disallow: /lp
Disallow: /feedback
Disallow: /langtest

Sitemap: https://www.crawlstop.com/sitemap.xml

If your website does not have a robots.txt file, Cloudflare creates a new file with our managed block directives and serves it for you.
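The two cases described above (an existing file gets the managed content prepended, a missing file is replaced by the managed content alone) can be sketched as a small function. This is only an illustration of the documented behavior, not Cloudflare's implementation. The function name `build_robots_response` and its parameters are hypothetical, and treating any non-200 status as "no existing file" is an assumption.

```python
from typing import Optional


def build_robots_response(
    managed: str, origin_status: int, origin_body: Optional[str]
) -> str:
    """Combine managed content with the origin's robots.txt, if one exists.

    Assumption: the origin file counts as existing only when the origin
    answered with HTTP 200; any other status means there is no file.
    """
    # Normalise the managed block so it always ends with exactly one newline.
    managed = managed.rstrip("\n") + "\n"
    if origin_status == 200 and origin_body is not None:
        # Managed content first, then a blank line, then the original rules.
        return managed + "\n" + origin_body.lstrip("\n")
    # No existing file: serve the managed content on its own.
    return managed


if __name__ == "__main__":
    managed_block = (
        "# BEGIN Cloudflare Managed content\n"
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "# END Cloudflare Managed Content\n"
    )
    original = "User-agent: *\nDisallow: /lp\n"
    print(build_robots_response(managed_block, 200, original))
```

Because the managed content comes first, the original rules and the Sitemap line remain in the response unchanged after the `# END Cloudflare Managed Content` marker.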
To implement a robots.txt file on your domain:
- Log in to the Cloudflare dashboard, and select your account and domain.
- Go to Security > Bots.
- Select Configure Bot Fight Mode.
- Turn Instruct bot traffic with robots.txt on.

If your dashboard has a Security Settings page, follow these steps instead:
- In the Cloudflare dashboard, go to the Security Settings page.
- Filter by Bot traffic.
- Go to robots.txt setting.
- Turn robots.txt setting on.
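After turning the setting on, one way to confirm that the managed content is being served is to fetch your robots.txt and look for the begin and end markers that appear in the example on this page. The sketch below assumes those marker comments are always present in the served file, which this page does not state explicitly. The function names are hypothetical, and the fetch helper is defined but not called, so the example runs without network access.

```python
import urllib.request

# Marker comments as they appear in the example on this page, lowercased
# because the example capitalises "content" inconsistently.
BEGIN_MARKER = "# begin cloudflare managed content"
END_MARKER = "# end cloudflare managed content"


def has_managed_block(robots_txt: str) -> bool:
    """Return True if both markers are present and in the expected order."""
    lines = [line.strip().lower() for line in robots_txt.splitlines()]
    if BEGIN_MARKER not in lines or END_MARKER not in lines:
        return False
    return lines.index(BEGIN_MARKER) < lines.index(END_MARKER)


def fetch_robots_txt(base_url: str, timeout: float = 10.0) -> str:
    """Download /robots.txt from base_url and return it as text."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = (
        "# BEGIN Cloudflare Managed content\n"
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "# END Cloudflare Managed Content\n"
    )
    print(has_managed_block(sample))
```

To check a live site, pass the result of `fetch_robots_txt("https://www.crawlstop.com")` to `has_managed_block`.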
On Free zones that do not have their own robots.txt file and do not use the managed robots.txt feature, Cloudflare serves the Content Signals Policy when a crawler requests the zone's robots.txt file.
This file only outlines the Content Signals framework. It does not express your preferences or rights associated with your content.
# As a condition of accessing this website, you agree to abide by the
# following content signals:

# (a) If a content-signal = yes, you may collect content for the
# corresponding use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search
# does not include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of
# content for generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

Cloudflare's Content Signals Policy is included by default in the robots.txt file when you turn on robots.txt setting.
If you would like to opt out of displaying the policy in your robots.txt file, you can uncheck Display Content Signals Policy under Control AI Crawlers in your zone's overview.
Alternatively, you can change the same option from the Security Settings page.
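To make the content signals concrete, the sketch below reads a `Content-signal` value such as the `search=yes, ai-train=no` line from the managed example and turns it into a mapping. It follows rule (c) of the policy text by leaving out any signal that is not listed, rather than treating it as `no`. The function names are hypothetical, and the parsing rules (comma-separated `name=yes|no` pairs, case-insensitive) are an assumption based only on the example on this page.

```python
from typing import Dict


def parse_content_signal(value: str) -> Dict[str, bool]:
    """Parse a value like 'search=yes, ai-train=no' into {name: allowed}."""
    signals: Dict[str, bool] = {}
    for part in value.split(","):
        name, sep, setting = part.strip().partition("=")
        name = name.strip().lower()
        setting = setting.strip().lower()
        # Keep only well-formed pairs; anything else is ignored.
        if sep and name and setting in ("yes", "no"):
            signals[name] = setting == "yes"
    return signals


def content_signals_from_robots(robots_txt: str) -> Dict[str, bool]:
    """Collect signals from every Content-signal line in a robots.txt text.

    Simplification: this ignores which User-agent group a line belongs to.
    """
    merged: Dict[str, bool] = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "content-signal":
            merged.update(parse_content_signal(value))
    return merged


if __name__ == "__main__":
    signals = parse_content_signal("search=yes, ai-train=no")
    for use in ("search", "ai-input", "ai-train"):
        # A missing key means the operator neither grants nor restricts that use.
        print(use, signals.get(use, "not specified"))
```

Returning a missing key for an unlisted signal keeps the three states in the policy (yes, no, not expressed) distinct.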
Managed robots.txt for AI crawlers is available on all plans.