AI Giants Crawl Websites 100x More Than They Give Back: Report

New data from Cloudflare reveals a troubling trend in the AI industry: major AI companies are aggressively crawling websites to extract valuable human data for training their models, while providing minimal referrals back to those same sites. This breakdown of the web’s “grand bargain” threatens the fundamental ecosystem that has powered the internet for decades.

Cloudflare, which manages approximately 20% of the world’s websites, began tracking what it calls the “crawl-to-refer ratio” in 2025. This metric compares the number of crawl requests an AI company’s bots make to websites with the number of referrals that company’s platforms send back. The results are striking: Anthropic stands out with an extremely high ratio, meaning it crawls sites far more than it sends users to the web. According to the data from the first week of January 2026, Anthropic’s ratio has actually worsened since September 2025, indicating the company is taking more value while giving less back.
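As a rough illustration of the arithmetic behind the metric, the Python sketch below divides crawl requests by referrals and checks whether the ratio rose between two periods. The function names, the handling of zero referrals, and every number are hypothetical, chosen only for illustration; they are not Cloudflare’s methodology or its published figures.

```python
import math


def crawl_to_refer_ratio(crawl_requests: int, referrals: int) -> float:
    """Return crawl requests per referral (higher means more crawling per visit sent back)."""
    if crawl_requests < 0 or referrals < 0:
        raise ValueError("counts must be non-negative")
    if referrals == 0:
        # Assumption: with no referrals at all, treat the ratio as unbounded,
        # or as 0.0 when there was no crawling either.
        return math.inf if crawl_requests > 0 else 0.0
    return crawl_requests / referrals


def has_worsened(earlier_ratio: float, later_ratio: float) -> bool:
    """A ratio 'worsens' when it rises: more crawling per referral sent back."""
    return later_ratio > earlier_ratio


# Hypothetical counts for one crawler in two periods (not real data).
earlier = crawl_to_refer_ratio(crawl_requests=50_000, referrals=10)
later = crawl_to_refer_ratio(crawl_requests=80_000, referrals=8)
print(earlier, later, has_worsened(earlier, later))
```

In this made-up case the crawler requests more pages while sending fewer visitors, so the ratio rises, which is the pattern the article describes as worsening.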

OpenAI shows a similar pattern, with its crawl-to-refer ratio also deteriorating over time. This aligns with previous Business Insider reporting from late 2024, which found that bots from Anthropic and OpenAI were crawling some websites so aggressively that one web developer saw a client’s cloud-computing costs double within a few months. The AI bot swarm isn’t just extracting free data; it’s leaving some website owners with significantly higher bills.

The traditional web operated on a simple exchange: websites allowed their content to be indexed for free, understanding they would receive referrals in return and could monetize through advertising and subscriptions. In the generative AI era, this deal is collapsing. AI answer engines and chatbots now provide direct answers to users, eliminating the need to visit original sources. This means websites that create and verify information are losing both traffic and revenue while AI companies profit from their content.

Google maintains a relatively low crawl-to-refer ratio, likely because its traditional search engine still displays clear website links. However, the company is increasingly integrating AI chatbot-style answers through AI Overviews and AI Mode, potentially threatening this balance.

When contacted for comment, Anthropic did not respond to questions about why it crawls so extensively while providing minimal referrals. In September, the company questioned Cloudflare’s methodology and noted that its Claude AI chatbot’s web search feature was generating more referral traffic. OpenAI also did not respond to requests for comment.

Key Quotes

While tech companies spend lavishly on data centers, GPUs, and talent, they avoid talking about the other key ingredient of AI success: data. That’s because they don’t want to pay for the high-quality human data that’s needed for AI model training, inference, and AI outputs.

This observation from the article highlights the uncomfortable truth about AI development: while companies publicly discuss their massive infrastructure investments, they remain silent about their reliance on freely extracted web data, which represents a critical and undervalued resource.

In the past, tech companies would send users to the original sources of this information. This formed the grand bargain of the web. Sites would let their data be taken for free on the understanding that they would get referrals in return, and could pay for their efforts through advertising, subscriptions, and other techniques.

This quote explains the fundamental social contract that has governed the internet for decades, establishing the baseline against which current AI company behavior can be measured and found wanting.

One web developer saw a client’s cloud-computing costs double within a few months due to this AI bot swarm, according to BI reporting. So, not only are AI companies taking from the web and giving less back — they are also leaving some site owners with bigger bills to pay.

This reveals the double burden on website owners: they lose potential traffic and revenue while simultaneously facing increased infrastructure costs to handle aggressive AI bot crawling, creating a lose-lose situation for content creators.

Our Take

This data exposes a fundamental ethical crisis in the AI industry that deserves far more attention. The crawl-to-refer ratios reveal that leading AI companies are essentially strip-mining the web’s collective knowledge while dismantling the economic model that made its creation possible. Anthropic’s and OpenAI’s worsening ratios suggest this isn’t an oversight but an accelerating trend. The irony is profound: these companies position themselves as building beneficial AI for humanity, yet their business models threaten to destroy the very ecosystem of human knowledge creation they depend upon. If content creators can’t monetize their work because AI chatbots intercept their traffic, what incentive remains to produce high-quality information? We may be witnessing the tragedy of the commons playing out in real time, where individual AI companies rationally maximize their data extraction while collectively degrading the web’s long-term sustainability. Google’s relatively low ratio offers little comfort, as even the search giant is moving toward AI-generated answers that reduce website visits.

Why This Matters

Source: https://www.businessinsider.com/anthropic-openai-google-perplexity-microsoft-mistral-crawling-web-referrals-cloudflare-2026-1