Subject: Re: bad bot behavior
From: anthk (at) *nospam* openbsd.home (anthk)
Newsgroups: comp.misc
Date: 12 May 2025, 07:24:45
Organization: A noiseless patient Spider
Message-ID: <slrn101ue1g.198p.anthk@openbsd.home.localhost>
References: 1 2 3
User-Agent: slrn/1.0.3 (OpenBSD)
On 2025-03-18, Toaster <toaster@dne3.net> wrote:
> On Tue, 18 Mar 2025 12:00:07 -0500
> D Finnigan <dog_cow@macgui.com> wrote:
>
>> On 3/18/25 10:17 AM, Ben Collver wrote:
>>
>>> Please stop externalizing your costs directly into my face
>>> ==========================================================
>>> March 17, 2025 on Drew DeVault's blog
>>>
>>> Over the past few months, instead of working on our priorities at
>>> SourceHut, I have spent anywhere from 20-100% of my time in any
>>> given week mitigating hyper-aggressive LLM crawlers at scale.
>>
>> This is happening at my little web site, and if you have a web site,
>> it's happening to you too. Don't be a victim.
>>
>> Actually, I've been wondering where they're storing all this data,
>> and how much duplicate data is stored by separate parties all
>> scraping the web simultaneously but independently.
>
> But what can be done to mitigate this issue? Crawlers and bots ruin
> the internet.

GZip bombs + fake links = profit. Remember that gzip'ed web pages are
a standard (HTTP Content-Encoding: gzip); even lynx can parse gz files
natively.
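
A minimal sketch of the bomb itself; file name and sizes here are my
own picks, not anything canonical. Zeros compress at roughly 1000:1
under deflate, so ~10 GB of them fits in a ~10 MB .gz file:

    # make_bomb.pl -- pack ~10 GB of zeros into a ~10 MB .gz file
    use strict;
    use warnings;
    use IO::Compress::Gzip qw($GzipError);

    my $z = IO::Compress::Gzip->new('bomb.gz', -Level => 9)
        or die "gzip failed: $GzipError";
    my $chunk = "\0" x (1024 * 1024);      # 1 MB of zeros per write
    $z->print($chunk) for 1 .. 10_240;     # ~10 GB uncompressed
    $z->close;

Have the web server send it as text/html with a pre-set
Content-Encoding: gzip header; a greedy crawler that inflates bodies
in memory chokes on it, while you only paid ~10 MB of bandwidth.
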
Also, MegaHAL/Hailo under Perl. Feed it nonsense, and create some
non-visible content under a robots.txt-disallowed directory full of
Markov-chain-generated nonsense and gzip bombs.
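
Something along these lines, assuming the Hailo module from CPAN;
the paths and file names (/trap/, corpus.txt, trap.sqlite) are
placeholders of my own:

    # robots.txt -- honest crawlers skip /trap/; LLM scrapers walk in
    User-agent: *
    Disallow: /trap/

    # gen_trap.pl -- fill the trap with Markov-chain babble via Hailo
    use strict;
    use warnings;
    use Hailo;

    my $hailo = Hailo->new(brain => 'trap.sqlite');
    $hailo->train('corpus.txt');           # any junk text will do

    open my $out, '>', 'trap/page1.html' or die $!;
    print {$out} "<html><body>\n";
    for (1 .. 200) {
        my $line = $hailo->reply();        # unprompted random babble
        print {$out} "<p>$line</p>\n" if defined $line;
    }
    # fake links keep the crawler walking deeper into the trap
    print {$out} qq{<a href="/trap/page2.html">more</a>\n};
    print {$out} "</body></html>\n";
    close $out;

Better still, generate the pages on the fly (CGI or similar) so every
link leads somewhere "new" and the crawler never runs out of trap.
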