@spaceyuck
Contributor

Scraper type(s)

  • imageByFragment
  • imageByURL

Examples to test

Both examples are SFW, because NSFW wallpapers need an account and an API token.

https://wallhaven.cc/w/9dqojx
https://wallhaven.cc/w/yxd8jk
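If the scraper resolves these URLs through wallhaven's public API rather than the HTML page, the NSFW restriction shows up as an API-key requirement. A minimal sketch, assuming the v1 single-wallpaper route `https://wallhaven.cc/api/v1/w/<id>` and an `X-API-Key` header as described in wallhaven's API docs; the helper names are hypothetical and not taken from this PR:

```python
import json
import urllib.request
from typing import Optional

# Assumed wallhaven API v1 route for a single wallpaper.
API_URL = "https://wallhaven.cc/api/v1/w/{id}"


def build_wallpaper_request(wallpaper_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the API request; SFW wallpapers work without a key, NSFW ones need it."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the key is sent as a header rather than as an ?apikey= query parameter.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(API_URL.format(id=wallpaper_id), headers=headers)


def fetch_wallpaper(wallpaper_id: str, api_key: Optional[str] = None) -> dict:
    """Fetch and return the "data" object of the API response (performs a network call)."""
    with urllib.request.urlopen(build_wallpaper_request(wallpaper_id, api_key), timeout=30) as resp:
        return json.load(resp)["data"]
```

For the two SFW examples, `fetch_wallpaper("9dqojx")` should work anonymously; an NSFW ID would additionally need `api_key=` set.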

Short description

When you right-click and save a wallpaper from wallhaven, the default filename has a predictable structure ("wallhaven-<id>.<extension>"). This change adds image fragment scraping so that IDs forming a clear, separable suffix of the filename are picked up automatically.
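The fragment lookup boils down to pulling the ID out of the default filename and turning it into a wallpaper page URL of the same form as the example URLs. A minimal sketch, assuming the default name is exactly `wallhaven-<id>.<ext>` with a six-character lowercase ID; this is an illustration, not the regex from the PR:

```python
import re
from typing import Optional

# Assumed default download name: "wallhaven-" + 6-character ID + "." + extension.
DEFAULT_NAME_RE = re.compile(r"wallhaven-([a-z0-9]{6})\.[a-z0-9]+", re.ASCII)


def page_url_from_filename(filename: str) -> Optional[str]:
    """Return the wallhaven page URL for a default-named download, or None."""
    match = DEFAULT_NAME_RE.fullmatch(filename)
    if match is None:
        return None
    return f"https://wallhaven.cc/w/{match.group(1)}"
```

For example, `page_url_from_filename("wallhaven-9dqojx.jpg")` returns `https://wallhaven.cc/w/9dqojx`, which is the form the imageByURL side already handles.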

Collaborator

@feederbox826 left a comment

This might be the weirdest regex I've seen yet. Did you mean to do a-z0-9? And also escape the . for the extension?

Also, some of their older posts follow a separate format from before they migrated. A strict regex might be possible + better.

@feederbox826
Collaborator

I'll also put this in draft and potentially close it, since they have an IQDB search that would allow actual ByFragment matching. Will close if/when IQDB search is implemented.

https://wallhaven.cc/forums/thread/1169

@feederbox826 marked this pull request as draft on October 16, 2025 23:51
@spaceyuck
Contributor Author

This might be the weirdest regex I've seen yet. Did you mean to do a-z0-9? And also escape the . for the extension?

\d and 0-9 are equivalent, so it's the same as a-z0-9. Point taken about the unescaped dot, though: one of those "I was stupid but it worked anyway" things. Fixed now.
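Both points are easy to check. An unescaped `.` matches any character, which is why the pattern still worked on real filenames. The `\d` claim holds in Go's RE2 syntax, which I believe stash uses for YAML scraper regexes, but not in Python's `re`: there `\d` on a str pattern also matches non-ASCII digits unless `re.ASCII` is set, which would matter if an IQDB variant ends up as a Python scraper. A small demonstration (the `demo` helper is illustrative only):

```python
import re

# Dot left unescaped: "." matches any character, not just a literal dot.
unescaped = re.compile(r"[a-z\d]{6}.jpg")
escaped = re.compile(r"[a-z\d]{6}\.jpg")


def demo() -> dict:
    """Collect match results (True/False) for the cases discussed above."""
    return {
        "unescaped_real_name": bool(unescaped.fullmatch("9dqojx.jpg")),
        "unescaped_no_dot": bool(unescaped.fullmatch("9dqojxxjpg")),  # "." consumed the extra "x"
        "escaped_no_dot": bool(escaped.fullmatch("9dqojxxjpg")),
        # U+0663 ARABIC-INDIC DIGIT THREE is a Unicode decimal digit.
        "unicode_digit": bool(re.fullmatch(r"\d", "\u0663")),
        "unicode_digit_ascii": bool(re.fullmatch(r"\d", "\u0663", re.ASCII)),
    }


if __name__ == "__main__":
    for case, matched in demo().items():
        print(f"{case}: {matched}")
```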

Also, some of their older posts follow a separate format from before they migrated. A strict regex might be possible + better.

The oldest wallpaper they still have uses an ID in the scheme [a-z0-9]{6}, and I've checked some of my oldest downloads from around 2014: they use the same format too. I can't find any info on, or example of, another ID format still being in use; they might have migrated everything over to the current scheme.

I made it very restrictive on purpose to avoid false positives: it should only match filenames that end in exactly six letters or digits right before the extension, with some kind of separator in front of them.
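The rule described here (a separator, exactly six lowercase letters or digits, then the extension at the very end) can be written out and checked against a few names. This is a hypothetical reconstruction in Python, not the regex actually committed in the PR:

```python
import re
from typing import Optional

# Hypothetical reconstruction of the rule above:
#   one non-alphanumeric separator, exactly six [a-z0-9] characters,
#   then ".<3-4 char extension>" anchored at the end of the name.
ID_RE = re.compile(r"[^A-Za-z0-9]([a-z0-9]{6})\.[a-z0-9]{3,4}$", re.ASCII)


def extract_id(filename: str) -> Optional[str]:
    """Return the six-character wallpaper ID suffix, or None if the rule does not match."""
    match = ID_RE.search(filename)
    return match.group(1) if match else None
```

With this rule, `wallhaven-9dqojx.jpg` yields `9dqojx`, while `IMG_20231012.jpg` (eight digits) and a bare `9dqojx.jpg` (no separator) yield nothing. A six-letter word after a separator, such as `my-sunset.png`, still yields `sunset`; requiring the literal `wallhaven-` prefix would rule that out, at the cost of missing renamed files.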

IQDB search

Huh, I'd never actually noticed that feature. I've looked into it a bit, and it's not mentioned in the API docs. From playing around with it, it's just a POST to their search endpoint, so maybe it also works for the API search. Otherwise this will be a whole new thing: I already see XSRF tokens and Cloudflare cookies, and the login requirement for NSFW would probably mean putting credentials or a session cookie in the scraper.

I'll look into it more later, but this really might be a feature of its own.

@feederbox826
Collaborator

Would have to be in Python, but IMO it would be leagues above trying to match the filename.

@spaceyuck
Contributor Author

Just an update after looking into it a bit:

  • definitely not supported by the API (405 on POST to /search)
  • after playing with the site in Firefox a bit, Cloudflare might be present but optional: it still seems to work without the magic Cloudflare cookie and with Cloudflare's magic JavaScript blocked, but I may have missed something
  • right now I'm running into status 419 (session expired) errors, even with session cookies set and sent
  • in the worst case, this might need CDP. Is CDP even supported in Python? I might also give cloudscraper a try; the IAFD scraper seems to use it
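On the 419 errors: status 419 together with XSRF tokens is what Laravel returns on a CSRF token mismatch. Assuming wallhaven's search form follows Laravel's defaults, the POST needs the session cookie and the XSRF-TOKEN cookie from the same session, plus an X-XSRF-TOKEN header carrying the URL-decoded cookie value; sending the raw, still percent-encoded cookie value is a common cause of 419. The sketch below only builds the request. The endpoint and upload field name are hypothetical placeholders, since the real form wasn't identified in this thread:

```python
import urllib.parse
import urllib.request
import uuid

# Hypothetical placeholders: the real IQDB form target and field name were not confirmed.
SEARCH_URL = "https://wallhaven.cc/search"
IMAGE_FIELD = "search_image"


def xsrf_header_value(raw_cookie_value: str) -> str:
    """Laravel-style apps percent-encode the XSRF-TOKEN cookie; the header expects it decoded."""
    return urllib.parse.unquote(raw_cookie_value)


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file field as multipart/form-data; returns (body, content type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_search_request(cookies: dict[str, str], filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST; `cookies` must hold the session and XSRF-TOKEN cookies."""
    body, content_type = encode_multipart(IMAGE_FIELD, filename, data)
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            # Cookies are forwarded verbatim, exactly as the browser stored them.
            "Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items()),
            "X-XSRF-TOKEN": xsrf_header_value(cookies["XSRF-TOKEN"]),
        },
    )
```

If this still returns 419, the next suspects would be a session cookie and token that don't come from the same browser session, or a hidden `_token` form field, if the form uses Laravel's default.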

@spaceyuck
Contributor Author

I've sequestered the current broken state into its own sub-branch, wallhaven-imageByFragtment-iqdb; I can't get it to work right now.

@feederbox826
Collaborator

405 is Method Not Allowed; maybe PUT instead of POST? But hm.

CDP is supported in Python, but it's a lot weirder.
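On the 405: per RFC 9110, a server answering 405 Method Not Allowed must include an Allow header listing the methods the resource does support, so there's no need to guess between PUT and POST. A minimal stdlib sketch; the function names are illustrative:

```python
import urllib.error
import urllib.request
from typing import Optional


def parse_allow(header_value: Optional[str]) -> list[str]:
    """Split an Allow header such as "GET, HEAD" into upper-case method names."""
    if not header_value:
        return []
    return [method.strip().upper() for method in header_value.split(",") if method.strip()]


def probe_allowed_methods(url: str, method: str = "POST") -> list[str]:
    """Send `method` to `url` (network call); on 405, return the methods listed in Allow."""
    request = urllib.request.Request(url, data=b"", method=method)
    try:
        with urllib.request.urlopen(request, timeout=30):
            return [method]  # the probed method itself was accepted
    except urllib.error.HTTPError as err:
        if err.code != 405:
            raise
        return parse_allow(err.headers.get("Allow"))
```

Probing the API search route, `https://wallhaven.cc/api/v1/search`, this way would show which methods it accepts; if only GET and HEAD come back, switching to PUT won't help either.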

@spaceyuck force-pushed the wallhaven-imageByFragtment branch from cc05b01 to ad9c669 on October 26, 2025 07:48