“AI crawlers are ruining the Internet”: bots are threatening even Wikipedia


The Internet is under siege: not by cybercriminals, but by a growing wave of AI bots consuming bandwidth like never before.

Their mission: to crawl and collect vast amounts of content — text, images, and video — to train language models and image generators. But the cost of this activity is being borne by key pillars of open knowledge, like Wikimedia, and thousands of open-source projects operating on limited resources.

Since early 2024, the Wikimedia Foundation has reported a 50% increase in bandwidth usage, particularly in its multimedia repository, Wikimedia Commons. During peak moments — such as following the death of former U.S. President Jimmy Carter — this surge in traffic caused slow page loads and overwhelmed connections for readers.

What’s most concerning is that this isn’t due to increased human interest. The majority of this traffic comes from automated bots — many of them unidentified — scraping content to feed AI systems.

In practice, this means that nearly 65% of the traffic hitting Wikimedia’s core servers comes from crawlers, many of which ignore basic conventions like the robots.txt file, the long-standing mechanism websites use to limit automated access.
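For context, robots.txt is just a plain-text file served from a site’s root that asks crawlers which paths to stay out of; compliance is entirely voluntary, which is precisely why misbehaving bots can ignore it. A minimal example (the directives are standard, though the specific bot name is only illustrative):

    # robots.txt, served at https://example.org/robots.txt
    # Ask one named crawler to stay out entirely
    User-agent: GPTBot
    Disallow: /

    # Ask every other crawler to avoid the media directory
    User-agent: *
    Disallow: /media/
    # Crawl-delay is non-standard, but some crawlers honor it
    Crawl-delay: 10

Nothing enforces these rules: a crawler that chooses to fetch /media/ anyway will get it, which is why sites are turning to more active defenses.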

Wikimedia operates on a “knowledge as a service” model: its content is free and openly reusable — a cornerstone in the development of search engines, voice assistants, and now AI models. But that very openness is starting to work against it.

The situation is even more critical for small open-source projects maintained by communities or individual developers. Many are watching their limited resources being drained by AI bot traffic, causing operating costs to skyrocket — or worse, forcing projects offline altogether.

Gergely Orosz, developer and author of The Software Engineer’s Guidebook, experienced this firsthand: bandwidth usage on one of his projects grew sevenfold in a matter of weeks, saddling him with overage charges for exceeding his hosting limits.

In response, some developers are going on the offensive. Community-built tools like Nepenthes and corporate solutions like Cloudflare’s AI Labyrinth are deploying “tarpits” — traps filled with fake or irrelevant content (often also AI-generated) designed to confuse and exhaust bots, wasting their resources without providing useful data.
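To make the idea concrete: a tarpit is essentially an endpoint that serves an endless, slowly generated maze of worthless pages, each linking to more of the same. The sketch below, written against Python’s standard library, is not the actual Nepenthes or AI Labyrinth code and every name in it is hypothetical; it only shows the core trick. Real tools layer on bot detection and list the trap in robots.txt so well-behaved crawlers never wander in.

    # Minimal tarpit sketch: every URL returns a slow, procedurally
    # generated page of junk text that links to more junk pages, so a
    # crawler that follows links gets stuck in an endless maze.
    # Illustrative only -- real tools add bot detection and exclude
    # the trap via robots.txt so compliant crawlers never enter it.
    import hashlib
    import random
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    def junk_page(path: str) -> str:
        # Seed the RNG from the path so every URL is stable but distinct.
        seed = int(hashlib.sha256(path.encode()).hexdigest(), 16)
        rng = random.Random(seed)
        words = ["archive", "record", "entry", "volume", "index", "note"]
        body = " ".join(rng.choice(words) for _ in range(200))
        links = "".join(
            f'<a href="/maze/{rng.randrange(10**9)}">more</a> '
            for _ in range(10)
        )
        return f"<html><body><p>{body}</p>{links}</body></html>"

    class TarpitHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(2)  # drip-feed the response to waste the bot's time
            page = junk_page(self.path).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(page)))
            self.end_headers()
            self.wfile.write(page)

        def log_message(self, *args):
            pass  # keep the console quiet

    if __name__ == "__main__":
        ThreadingHTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()

The seeding trick matters: because each page is derived deterministically from its URL, the maze looks like stable, real content to a crawler, while costing the server almost nothing to produce.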

At the heart of this crisis lies a fundamental contradiction: the same openness that enabled AI to flourish is now threatening the survival of the open platforms that made it possible. AI companies benefit from free and open content, but do not contribute to the infrastructure that sustains it. This outsourcing of costs puts the sustainability of the ecosystem at serious risk.

If no new consensus is reached, the greatest threat isn’t that AI will run out of data — it’s that the open spaces feeding it may shut down from exhaustion.

 
