AI Bots from OpenAI and Anthropic Driving Up Cloud Costs for Websites

AI companies are unleashing aggressive web-scraping bots that are crippling small websites and driving up cloud computing costs, according to multiple website operators who spoke to Business Insider. Edd Coates, creator of the Game UI Database—a five-year labor of love cataloging over 56,000 video game interface screenshots—discovered his site was being overwhelmed by traffic from a single OpenAI IP address, causing pages to load three times slower and generating 502 errors.

The financial impact is staggering. Jay Peet, who manages Coates’ database servers, reported that within 10 minutes, the site was transferring 60-70 gigabytes of data—equivalent to $850 per day based on Amazon’s on-demand bandwidth pricing. This comes as AI companies race to collect training data before supplies run out; one study estimates the world’s usable AI training data could be depleted by 2032.
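Peet's figure can be sanity-checked with simple arithmetic. The sketch below assumes the midpoint of the reported transfer (65 GB per 10 minutes) and AWS's commonly cited ~$0.09/GB internet egress rate; the exact pricing tier is an assumption, since the article does not state it:

```python
# Back-of-envelope check on the reported $850/day bandwidth cost.
gb_per_10_min = 65          # midpoint of the reported 60-70 GB
intervals_per_day = 6 * 24  # ten-minute intervals in a day
usd_per_gb = 0.09           # assumed AWS egress rate (not stated in the article)

gb_per_day = gb_per_10_min * intervals_per_day  # 9,360 GB/day
daily_cost = gb_per_day * usd_per_gb            # ~$842/day
print(f"${daily_cost:.0f} per day")
```

At the low end of the reported range (60 GB) the same arithmetic gives roughly $778/day, and at the high end (70 GB) roughly $907/day, so the quoted $850 figure sits comfortably inside that bracket.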

Joshua Gross of the design studio Planetary encountered similar issues after redesigning a client’s website, with cloud computing costs doubling due to scraping bots, primarily from Anthropic. An audit revealed “an overwhelming amount of nonsense traffic” consisting of repeated requests resulting in 404 errors. Between April 2023 and April 2024, nearly 5% of all online data was placed under robots.txt restrictions specifically targeting AI bots, with 25.9% blocking OpenAI, 13.3% blocking Anthropic, and 9.8% blocking Google.
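As a concrete illustration of the robots.txt restrictions described above, a site operator can disallow AI crawlers by user-agent token. The tokens below (GPTBot for OpenAI, ClaudeBot for Anthropic, Google-Extended for Google's AI training) are the ones these companies have publicly documented, though, as the article notes, compliance is entirely voluntary:

```
# robots.txt -- block documented AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers may proceed normally
User-agent: *
Allow: /
```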

Roberto Di Cosmo, director of Software Heritage—a nonprofit preserving publicly available source code—experienced an unprecedented surge in AI bot traffic that made the database unresponsive. His engineers spent hours identifying and blacklisting thousands of IP addresses, diverting resources from critical tasks. “We are not Google. We have a limited amount of resources,” Di Cosmo emphasized.

Tania Cohen, CEO of 360Giving, a nonprofit grants database, reported being “taken offline a couple of times due to AI bots,” particularly frustrating since much of their information is easily downloadable without scraping. David Senecal of Akamai noted that AI bots are “polluting key metrics” like conversion rates, causing problems for sites tracking marketing effectiveness. He also identified “impersonator” bots masquerading as legitimate crawlers from OpenAI and Anthropic.
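Senecal's "impersonator" bots work by spoofing a legitimate crawler's User-Agent string, so a common defense is to verify the claimed identity against the IP ranges the AI companies publish for their crawlers. A minimal sketch of that check follows; the CIDR blocks here are placeholders from the RFC 5737 documentation ranges, not real vendor ranges, which would need to be fetched from each company's published list:

```python
import ipaddress

# Placeholder CIDR blocks for illustration only (RFC 5737 documentation
# ranges). In practice, fetch the current ranges each vendor publishes.
CLAIMED_CRAWLER_RANGES = {
    "GPTBot": ["203.0.113.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}

def is_genuine(user_agent_token: str, client_ip: str) -> bool:
    """Return True only if the client IP falls inside the published
    ranges for the crawler it claims to be."""
    addr = ipaddress.ip_address(client_ip)
    ranges = CLAIMED_CRAWLER_RANGES.get(user_agent_token, [])
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

A request identifying itself as GPTBot but arriving from outside the published ranges fails this check and can be rate-limited or blocked outright.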

While OpenAI claims it was crawling Coates’ site roughly twice per second to “understand the web’s structure,” and both companies say they respect robots.txt protocols, Business Insider has previously reported instances where both bypassed these restrictions. The situation raises fundamental questions about who bears the cost when AI companies harvest the internet’s collective knowledge.

Key Quotes

“Within a space of 10 minutes, we were transferring around 60 to 70 gigabytes of data. Based on Amazon’s on-demand bandwidth pricing, that would cost $850 per day.”

Jay Peet, a game designer managing servers for the Game UI Database, quantified the financial burden that AI bot scraping imposes on small website operators, demonstrating how quickly costs can spiral out of control.

“The fact that OpenAI’s behavior has crippled my website to the point where it stopped functioning is just the cherry on top.”

Edd Coates, creator of the Game UI Database, expressed frustration after discovering OpenAI’s bots had rendered his free, nonprofit resource unusable, highlighting the collateral damage of aggressive AI data collection.

“We are not Google. We have a limited amount of resources to run this operation.”

Roberto Di Cosmo, director of the nonprofit Software Heritage, emphasized the disparity between small organizations and tech giants when dealing with AI bot traffic that diverts critical engineering resources from core missions.

“To find that my work is not only being stolen by a large organization but used to hurt the very people I’m trying to help makes me feel utterly sick.”

Coates articulated the deeper ethical crisis: his database created to help game designers is being used to train AI that may ultimately replace those same creators, while simultaneously imposing costs that threaten his ability to maintain the resource.

Our Take

This story reveals an uncomfortable truth about the AI revolution: the costs are being externalized onto those least able to bear them. While OpenAI, Anthropic, and other AI giants raise billions in funding and command trillion-dollar valuations, they’re treating the internet as a free resource to be strip-mined, leaving small operators with crippling cloud bills.

What’s particularly concerning is the asymmetry of power. Robots.txt relies entirely on voluntary compliance—there’s no enforcement mechanism. When AI companies bypass these restrictions or deploy “impersonator” bots, website owners have little recourse. The fact that 25% of high-quality data sources added AI restrictions in just one year signals growing resistance, but also highlights how reactive and inadequate current protections are.

This isn’t just about bandwidth costs—it’s about who gets to participate in the AI-powered future. If maintaining an independent website becomes economically unviable, we risk losing the diverse, human-curated corners of the internet that make it valuable, ironically the very content that makes AI training data useful.

Why This Matters

This story exposes a critical infrastructure crisis emerging from the AI boom that threatens the open internet ecosystem. As AI companies race to collect training data before usable supplies are projected to be depleted by 2032, they’re imposing massive hidden costs on small website operators, nonprofits, and independent creators who lack the resources of tech giants.

The implications are far-reaching: if maintaining a website becomes prohibitively expensive due to AI bot traffic, it could accelerate the consolidation of the internet into fewer, larger platforms that can afford these costs. This threatens the diversity and democratization that made the web valuable in the first place. The fact that robots.txt—a voluntary protocol from the 1990s—remains the primary defense mechanism reveals how unprepared internet infrastructure is for the AI era.

Moreover, this highlights a fundamental ethical question: AI companies are building trillion-dollar businesses by harvesting data created by individuals and small organizations, while simultaneously imposing costs that could drive these creators out of business. The irony that Coates’ database—created to help game designers—is being used to train AI that may replace those same designers underscores the existential tension at the heart of generative AI’s relationship with human creativity.

Source: https://www.businessinsider.com/openai-anthropic-ai-bots-havoc-raise-cloud-costs-websites-2024-9