
Blocking ByteDance for Hyper-Aggressive / Malicious ByteSpider bot #48

@kkarhan


Apparently ByteDance is creating giant amounts of traffic with their ByteSpider crawler.

Whilst crawlers on their own aren't a problem per se, the way ByteSpider completely disregards the robots.txt file, unlike any bona fide crawler, makes it basically malicious, and its traffic should be considered a DDoS attack.

The Internet Archive also doesn't honour the robots.txt file, but its behaviour could be considered 'negligible interference', as it mostly archives sites manually, based on user requests to do so. ByteSpider's behaviour could not: it basically siphons absurd amounts of data from the sites it crawls.

Sadly, ClownFlare (aka CloudFlare) and AWS are somewhat complicit, so blocking the bytespider User-Agent server-side is a must as well, similar to blocking GPTbot. Whilst recent reports indicate that ByteSpider now honours robots.txt, I'd not count on that being true...
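For illustration, server-side User-Agent blocking could look roughly like the sketch below. It is only a minimal example, not something this repository ships: the names `BLOCKED_AGENT_PATTERN`, `is_blocked_user_agent` and `BlockCrawlersMiddleware` are made up for this comment, and it assumes a Python WSGI application. The same substring match can be expressed in whatever web server or firewall is actually in use.

```python
import re

# Substrings to reject. "bytespider" and "gptbot" are the agents named in this
# issue; matching is case-insensitive because the casing in the header varies.
BLOCKED_AGENT_PATTERN = re.compile(r"bytespider|gptbot", re.IGNORECASE)


def is_blocked_user_agent(user_agent):
    """Return True when the User-Agent header names a blocked crawler."""
    return bool(user_agent) and BLOCKED_AGENT_PATTERN.search(user_agent) is not None


class BlockCrawlersMiddleware:
    """Hypothetical WSGI middleware that answers blocked crawlers with 403."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked_user_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

A User-Agent header is trivially spoofed, so this only stops crawlers that identify themselves honestly.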

Beyond that, adding the allocations of the ASNs it uses, AS138699 and AS396986, to the ASN, IPv4 & IPv6 blocklists should be considered...

Needless to say, hyper-aggressive bots are a problem that needs resolution.
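As a rough sketch of how a prefix blocklist for those ASNs would be consulted, the Python below checks a client address against a list of CIDR ranges using the standard `ipaddress` module. The prefixes shown are placeholders from the reserved documentation ranges, NOT the real allocations of AS138699 or AS396986; those would have to be pulled from current routing or registry data. `BLOCKED_PREFIXES` and `is_blocked_address` are illustrative names only.

```python
import ipaddress

# PLACEHOLDER prefixes taken from the documentation ranges (RFC 5737, RFC 3849).
# Replace them with the actual announced prefixes of the ASNs to be blocked.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked_address(address):
    """Return True when the IPv4 or IPv6 address falls inside a blocked prefix."""
    ip = ipaddress.ip_address(address)
    # Only compare against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in BLOCKED_PREFIXES)
```

A linear scan is fine for a handful of prefixes; a list covering whole ASNs is usually better enforced at the firewall or router level than in application code.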
