Cloudflare now blocks the crawlers of all major LLM providers by default (ChatGPT, Claude, Perplexity, etc.). Since over 20% of the internet is behind Cloudflare, a lot of sites are already excluded from AI summaries unless they've explicitly allowed those bots.
- Default Blocking: Cloudflare recently turned on its AI crawler protections by default for new domains.
- Permission-Based Model: AI companies now need explicit permission from website owners before scraping content. When a new domain is added to Cloudflare, its owner is asked upfront whether to allow or deny AI crawlers.
- Opt-Out Option: Website owners who want to allow AI crawlers can opt out of the default block. Some discussions suggest this could involve a "pay per crawl" model where website owners charge AI companies for scraping their content.
- Sophisticated Protections: Cloudflare's anti-crawler protections go beyond simple blocking. They can even use generative AI to serve scientifically accurate but unrelated content that misleads crawlers.
- Impact on LLM Developers: LLM developers will likely need to adapt their data-collection strategies and may have to negotiate direct agreements with website owners.
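
For site owners wondering where they stand, one rough check is to see what a site's robots.txt declares for the common AI crawlers. The sketch below is only an approximation: Cloudflare's blocking is enforced at the network level, not through robots.txt alone, so a bot that robots.txt allows can still be blocked. The user-agent list (`GPTBot`, `ClaudeBot`, `PerplexityBot`) is illustrative rather than exhaustive, and the helper name `ai_crawler_access` and the sample policy are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of AI crawler user-agent tokens (assumption).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def ai_crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, per AI crawler, whether the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    # Parse the policy from a string so the check needs no network access.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical policy: block two named AI crawlers, allow everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in ai_crawler_access(SAMPLE_ROBOTS).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To check a live site, fetch its `/robots.txt` and pass the text to the same function. Keep in mind that robots.txt is advisory, so this shows the declared policy, not what the CDN actually enforces.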