AI web-crawling bots are the cockroaches of the internet, many software developers believe. Some devs have started fighting back in clever, often humorous ways.
While any website can be targeted by bad crawler behavior (sometimes taking the site down), open source developers are "disproportionately" affected, writes Niccolò Venerandi, a developer of the Linux desktop environment known as Plasma and owner of the blog LibreNews.
By their nature, sites hosting free and open source software (FOSS) projects share more of their infrastructure publicly, and they also tend to have fewer resources than commercial products.
The issue is that many AI bots don't honor the Robots Exclusion Protocol robots.txt file, the tool that tells bots what not to crawl, originally created for search engine bots.
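For reference, robots.txt is just a plain text file served at the site's root that lists which user agents may crawl which paths. A minimal sketch is below; GPTBot is one real example of an AI crawler's user-agent string, and the /git/ path is purely illustrative. Nothing in the protocol enforces any of this; a crawler that ignores the file sees no difference at all.

```
# robots.txt, served at https://example.com/robots.txt
# Ask one AI crawler (OpenAI's GPTBot, as an example) to stay out entirely
User-agent: GPTBot
Disallow: /

# All other bots may crawl everything except the (hypothetical) git endpoints
User-agent: *
Disallow: /git/
```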
In a "cry for help" blog post in January, FOSS developer Xe Iaso described how AmazonBot relentlessly hammered a Git server website to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants to can download the code or contribute to it.
But this bot ignored Iaso's robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.
"It's futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more," Iaso lamented.
"They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over again. Some of them will even click on the same link multiple times in the same second," the developer wrote in the post.
Enter the God of Graves
So Iaso fought back with cleverness, building a tool called Anubis.
Anubis is a reverse proxy proof-of-work check that must be passed before requests are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
The funny part: Anubis is the name of the god in Egyptian mythology who leads the dead to judgment.
"Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died," Iaso told TechCrunch. If a web request passes the challenge and is determined to be human, a cute anime picture announces success. The drawing is "my take on anthropomorphizing Anubis," Iaso says. If it's a bot, the request gets denied.
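To make the mechanism concrete: a proof-of-work gate asks the visiting client to burn some CPU on a puzzle that the server can verify almost for free, so a crawler hammering thousands of pages pays a real cost on every request. This is not Anubis's actual code, just a minimal Python sketch of the general idea; the difficulty value and hashing scheme are made up for illustration.

```python
import hashlib
import secrets

DIFFICULTY = 4  # leading hex zeros required; higher means more client CPU (illustrative)

def make_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce whose hash meets the difficulty.
    In a real gate this runs in the visitor's browser, not on the server."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash suffices to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

if __name__ == "__main__":
    c = make_challenge()
    n = solve(c)          # expensive for the requester, and for every bot request
    print(verify(c, n))   # cheap for the server; prints True
```

In practice the solving step runs in the visitor's browser, which is why ordinary humans barely notice it while bots re-requesting every page pay the cost on every hit.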
The wryly named project has spread like wildfire across the FOSS community. Iaso shared it on GitHub on March 19, and in just a few days it collected 2,000 stars, 20 contributors, and 39 forks.
Revenge as defense
Anubis's instant popularity shows that Iaso's pain is not unique. In fact, Venerandi shared story after story:
- SourceHut founder and CEO Drew DeVault described spending "anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale," and "experiencing dozens of brief outages per week."
- Jonathan Corbet, a famed FOSS developer who runs the Linux industry news site LWN, warned that his site was being slowed by DDoS-level traffic "from AI scraper bots."
- Kevin Fenzi, the sysadmin of the massive Linux Fedora project, said the AI scraper bots got so aggressive that he had to block the entire country of Brazil from access.
Venerandi tells TechCrunch that he knows of several other projects experiencing the same issues. One of them "had to temporarily ban all Chinese IP addresses at one point."
Let that sink in for a moment: developers "even have to resort to banning entire countries" just to fend off AI bots that ignore robots.txt files, says Venerandi.
Beyond weighing the soul of a web requester, other devs believe vengeance is the best defense.
A few days ago, Hacker News user xyzal suggested loading robots.txt-forbidden pages with "a bucket load of articles on the benefits of drinking bleach" or "articles about the positive effect of catching measles on performance in bed."
"Think we need to aim for the bots to get _negative_ utility value from visiting our traps, not just zero value," xyzal explained.
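As a toy illustration of that idea, here is a hypothetical tarpit in stdlib-only Python; the paths, word list, and junk generator are all invented for the example. Anything robots.txt disallows serves endless auto-generated filler with links deeper into the trap, so only crawlers that ignore the rules ever see it, and what they get is worthless.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Disallow the trap for well-behaved bots; only rule-breakers will ever enter it.
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"
WORDS = ["bleach", "measles", "synergy", "quantum", "artisanal", "blockchain"]

def junk_page() -> str:
    """Generate a page of nonsense plus a link deeper into the maze."""
    text = " ".join(random.choices(WORDS, k=200))
    link = f'<a href="/trap/{random.randint(0, 10**9)}">read more</a>'
    return f"<html><body><p>{text}</p>{link}</body></html>"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            body, ctype = ROBOTS_TXT, "text/plain"
        elif self.path.startswith("/trap/"):
            body, ctype = junk_page(), "text/html"  # negative-utility content
        else:
            body, ctype = "<html><body>Real site content lives here.</body></html>", "text/html"
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

Real tools are far more elaborate, but the economics are the same: the disallowed content costs the bot bandwidth and pollutes whatever dataset it feeds.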
As it happens, in January an anonymous creator known as "Aaron" released a tool called Nepenthes that aims to do exactly that. It traps crawlers in an endless maze of fake content, a goal the dev admitted to Ars Technica is aggressive if not outright malicious. The tool is named after a carnivorous plant.
And Cloudflare, perhaps the biggest commercial player offering tools to fend off AI crawlers, last week released a similar tool called AI Labyrinth.
It's intended to "slow down, confuse, and waste the resources of AI crawlers and other bots that don't respect 'no crawl' directives," Cloudflare explained in a blog post. Cloudflare said it feeds misbehaving AI crawlers "irrelevant content rather than extracting your legitimate website data."
SourceHut's DeVault told TechCrunch that "Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells, but ultimately Anubis is the solution that worked" for his site.
But DeVault also issued a public, heartfelt plea for a more direct fix: "Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop."
Since the likelihood of that happening is zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.