Cloudflare has introduced its latest way to help website owners and publishers gain more control over their content. Cloudflare will make it easy for any website owner to update their robots.txt—the simple text file that tells web crawlers what parts of a site they can or cannot access—with a new Content Signals Policy. This new policy will enable website operators to express preferences over how their data is used by others, including the ability to opt out of AI overviews and inference.
The Internet is shifting from “search engines,” which provided a treasure map of links that a user could explore for information, to “answer engines” powered by AI, which give a direct answer without a user ever needing to click through to the original site’s content. This severely threatens the original business model of the Internet, where websites, publishers, and content creators could earn money or fame by driving traffic and views to their sites. Today, AI crawlers scrape vast troves of data from websites, but website operators have no way to express whether, how, and for what purpose they want to allow their content to be used. Robots.txt files allow website operators to specify which crawlers are allowed and what parts of a website they can access. They do not, however, tell a crawler what it may do with the content after accessing it. There needs to be a standard, machine-readable way to signal how data can be used even after it has been accessed.
“The Internet cannot wait for a solution, while in the meantime, creators’ original content is used for profit by other companies,” said Matthew Prince, co-founder and CEO of Cloudflare. “To ensure the web remains open and thriving, we’re giving website owners a better way to express how companies are allowed to use their content. Robots.txt is an underutilized resource that we can help strengthen, and make it clear to AI companies that they can no longer ignore a content creator’s preferences.”
Cloudflare believes that the operator of a website, API, MCP server, or any Internet-connected service, whether a local news organization, an AI startup, or an ecommerce shop, should get to decide how their data is used by others for commercial purposes. Today, more than 3.8 million domains use Cloudflare’s managed robots.txt service to express that they do not want their content used for training. Now, Cloudflare’s new Content Signals Policy will enable users to strengthen their robots.txt preferences with a clear set of instructions for anyone accessing the website via automated means, such as an AI crawler. The policy will now inform crawlers by:
- Explaining how to interpret the content signals in simple terms: “Yes” means allowed, “no” means not allowed, and no signal means no expressed preference.
- Defining the different ways that a crawler typically uses content in clear terms, including search, AI input, and AI training.
- Reminding companies that website operators’ preferences in robots.txt files can have legal significance.
While robots.txt files may not stop unwanted scraping, Cloudflare’s aim is that this improved policy language will better communicate a website owner’s preferences to bot operators, and drive companies to better respect content creator preferences.
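As a rough illustration of how a crawler operator might read these signals, the Python sketch below parses a robots.txt body and reports each use as allowed, not allowed, or unexpressed. The announcement does not spell out the file syntax, so the `Content-Signal` directive name and the `search`, `ai-input`, and `ai-train` keys are assumptions here, as are the helper names `parse_content_signals` and `is_allowed`. The sketch also ignores per-crawler `User-Agent` groups, which a real parser would need to honor.

```python
from typing import Dict, Optional

# Hypothetical robots.txt body; the directive name and keys are assumptions.
SAMPLE_ROBOTS_TXT = """\
User-Agent: *
Content-Signal: search=yes, ai-input=no
Allow: /
"""


def parse_content_signals(robots_txt: str) -> Dict[str, bool]:
    """Collect yes/no content signals from every Content-Signal line."""
    signals: Dict[str, bool] = {}
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            if "=" not in pair:
                continue
            key, setting = (part.strip().lower() for part in pair.split("=", 1))
            # Only "yes" and "no" carry meaning; anything else is skipped.
            if setting in ("yes", "no"):
                signals[key] = setting == "yes"
    return signals


def is_allowed(signals: Dict[str, bool], use: str) -> Optional[bool]:
    """Return True or False for an expressed signal, None when no preference is given."""
    return signals.get(use)


signals = parse_content_signals(SAMPLE_ROBOTS_TXT)
print(is_allowed(signals, "search"))    # explicit "yes"
print(is_allowed(signals, "ai-input"))  # explicit "no"
print(is_allowed(signals, "ai-train"))  # no signal, so no expressed preference
```

Returning `None` for a missing key keeps "no signal" distinct from an explicit "no," which mirrors the three-way interpretation the policy describes.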
Starting today, Cloudflare will automatically update the robots.txt files to include this new policy language for all customers that ask Cloudflare to manage their robots.txt file. For anyone who wants to declare how crawlers can use their content via customized robots.txt files, Cloudflare is publishing tools to help.
Organizations see the need for solutions like the Content Signals Policy as a way to offer more direction over how their content is used:
- News/Media Alliance: “We are thrilled that Cloudflare is offering a powerful new tool, now widely available to all users, for publishers to dictate how and where their content is used. This is an important step towards empowering publishers of all sizes to reclaim control over their own content, and ensure they can continue to fund the creation of quality content that users rely on. We hope that this encourages tech companies to respect content creators’ preferences. Cloudflare is showing that doing the right thing isn’t just possible, it’s good business.” – Danielle Coffey, President and CEO of the News/Media Alliance
- Quora: “We applaud Cloudflare’s leadership and support their efforts in building controls and protocols to help publishers manage how their content is accessed.” – Ricky Arai-Lopez, Head of Product at Quora
- Reddit: “For the web to remain a place for authentic human interaction, platforms that empower communities must be sustainable. We support initiatives that advocate for clear signals protecting against the abuse and misuse of content.” – Chris Slowe, CTO of Reddit
- RSL Collective: “We are excited to partner with Cloudflare on the launch of the Cloudflare Content Signals Policy, an essential step forward in allowing publishers to assert their rights and clearly define how companies may use their content. The open RSL standard, developed in cooperation with the Internet’s leading publishers, is designed to complement the Content Signals protocol by enabling content owners to not only protect their rights, but also define machine-readable licensing and compensation terms for those use cases. Together, the RSL Collective and Cloudflare are advancing a shared vision: a sustainable open web where publishers and creators thrive and are fairly compensated by AI companies.” – Eckart Walther, co-founder of the RSL Collective
- Stack Overflow: “The nature of the Internet and its implicit agreement with content publishers has changed quite dramatically over the past couple of years. With our large corpus of ~70 billion tokens of data, Stack Overflow is proud to partner with the leading AI labs and cloud providers on the data licensing front and we applaud Cloudflare for playing a central role to empower and protect content creators to build a scalable system for the internet in this new AI era.” – Prashanth Chandrasekar, CEO of Stack Overflow