@Lc4B commented Jun 14, 2025

Hentai Update

Updated:

  • hstream.yml
  • hanime.yml

Added:

  • Oppai.yml
  • HentaiSaturn.yml
  • HentaiSubIta.yml

Details:

  • Improved URL construction via sceneByFragment, with more accurate filename cleaning, especially for older hentai (*Oppai is partially excluded due to the use of symbols in its URLs).
  • Standardized output titles; each scraper had a different format, e.g.:
    Hentai 2 2 | Hentai 2 - 2 | Hentai 2 Ep 2 | Hentai 2 Episode 02 | ...
    Now the episode number in the title is displayed with a dash and a two-digit number (- 02), ensuring a consistent library (see the sketch after this list).
  • Removed capture of tags related to resolution like "4K" or "HD"; Stash already displays a banner with the correct resolution of the file you own, so those tags were just misleading.
  • In a hentai context, each scene is an episode of a series, so I decided to use the Groups (Movie) section to create series entries; I added group construction directly in the sceneScraper, so a manual scrape of an episode can create the corresponding group.
  • [hstream, hanime] While capturing the series during scene scraping, the cover of episode 1 will be used.
  • [hstream] Added groupScraper support from URL.
  • [hstream] During search, the cover URL was built from the thumbnail URL; now the search URL is modified directly, removing the need for this conversion.
  • [hstream] With the latest update, the preview image was changed to the cover in scenes; I reverted it back to the preview, which seems more appropriate since these are episode-specific preview images and better suited for how Stash uses them (we’ll use covers for series).
  • [hanime] Added scene URL construction from the cover URL.
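For illustration, the title normalization and the in-scene group construction look roughly like this in the YAML (selectors and regexes here are simplified placeholders, not the exact ones shipped in the PR):

    xPathScrapers:
      sceneScraper:
        scene:
          Title:
            selector: //h1/text()
            postProcess:
              - replace:
                  # unify "Name 2" / "Name - 2" / "Name Ep 2" / "Name Episode 02" -> "Name - <N>"
                  - regex: '\s*(?:-|Ep|Episode)?\s*(\d+)\s*$'
                    with: ' - $1'
                  # pad single-digit episode numbers: "Name - 2" -> "Name - 02"
                  - regex: ' - (\d)$'
                    with: ' - 0$1'
          # build the series (group/movie) entry directly during the scene scrape,
          # so a manual scrape of an episode creates the corresponding group
          Movies:
            Name:
              selector: //h1/text()
              postProcess:
                - replace:
                    # strip the episode suffix so every episode maps to the same series
                    - regex: '\s*(?:-|Ep|Episode)?\s*\d+\s*$'
                      with: ''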

(HentaiSaturn and HentaiSubIta are somewhat rough in various aspects, mainly used to retrieve Italian plot summaries, but can be useful for some older hentai.)

Lc4B added 5 commits June 15, 2025 00:49
 * Improved URL construction via sceneByFragment
 * Standardized output titles
 * Removed capture of tags related to resolution
 * Added group construction in the sceneScraper
 * Added groupScraper support from URL
 * Changed the search URL (with covers) to remove the post-process conversion
 * Restored image "preview" for scenes
 * Improved URL construction via sceneByFragment
 * Standardized output titles
 * Removed capture of tags related to resolution
 * Added group construction in the sceneScraper
 * Added scene URL construction from the cover URL
 * Italian plots
 * Italian plots
@Lc4B (Author) commented Jun 15, 2025

hentaisaturn and hentaisubita have a commonXPaths section that is simply an aggregator; it is not used directly. The object fields are grouped there and taken individually where needed, so even if it contains "wrong" fields they do not cause errors.
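As a sketch of that layout (the commonXPaths key is from the scrapers; the example field and selectors are hypothetical):

    # aggregator only: never consumed as a whole, it just hosts the anchored blocks
    commonXPaths:
      plot: &commonPlot
        selector: //div[@class="desc"]/text()
        postProcess:
          - replace:
              - regex: '\s+'
                with: ' '

    xPathScrapers:
      sceneScraper:
        scene:
          Details: *commonPlot   # each field is pulled in individually where needed
      groupScraper:
        movie:
          Synopsis: *commonPlot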

::fixed

@feederbox826 (Collaborator) commented

That is A Lot. What filename format are you using? I'm unable to replicate it with yt-dlp, or are you just stripping it down to a Least Common (Filename)?

@Lc4B (Author) commented Jun 24, 2025

> That is A Lot. What filename format are you using? I'm unable to replicate it with yt-dlp, or are you just stripping it down to a Least Common (Filename)?

Sorry, but I don't understand what you mean. If you mean the cleaning of the sceneByFragment filename: I simply try to remove "common" fields or symbols that would break the URL, also taking the URL patterns into account; whatever comes after the last number is cut as well, for example:
Name - 02 - Title
[HSUB] Name 02v2 (HD) [3bb935c6]
Name_02_sub_eng
Obviously the name between the beginning and the episode number must be correct or it won't find the scene anyway, but it can avoid renaming a few more files than before.
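To make that concrete: the cleaning happens in sceneByFragment before the URL is built. A minimal sketch of the idea, with an illustrative site URL and simplified regexes (not the exact shipped ones):

    sceneByFragment:
      action: scrapeXPath
      queryURL: "https://example.com/hentai/{filename}"
      queryURLReplace:
        filename:
          - regex: '\.\w+$'                 # drop the file extension
            with: ''
          - regex: '\[[^\]]*\]|\([^)]*\)'   # strip fansub tags, hashes, "(HD)", ...
            with: ''
          - regex: '^(.*\d).*$'             # cut everything after the last number
            with: '$1'
          - regex: '[\s_]+'                 # turn separators into slug dashes
            with: '-'
          - regex: '-{2,}'                  # collapse runs of dashes
            with: '-'
      scraper: sceneScraper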

If instead you mean the output format of the title: I only cut or added to the title obtained from scraping to bring it to a name resembling the template used by hstream; I used that one because it is the most complete scraper, so I assume the most used. I wasn't trying to match a common pattern, only a uniform one across the scrapers.
Maybe I should change it to something more common; series are usually named "Name S01E02", but I have no way to recognize the season, so as a season-less pattern I could opt for "Name ep02".

(I have to change hanime, which uses "Name 2" for new entries and "Name Ep 2" for old ones; I hadn't considered this case.)

Lc4B added 7 commits June 25, 2025 03:30
 * updated title template: Name ep02
 * updated title template: Name ep02
 * updated title template: Name ep02
 * updated title template: Name ep02
 * updated title template: Name ep02
 * updated title with more cases
 * updated sceneByURL
 * added sceneByName/sceneByQueryFragment via python
@Lc4B (Author) commented Jun 25, 2025

  • changed the output title of all scrapers to a better-known pattern; I opted for: Name ep02 (Kodi/Episode Naming/No Season).
  • [hanime] I found out that it uses many more title formats than I knew; I found:
    • Name 1
    • Name Ep 2
    • Name ep 3
    • Name Ep. 4
    • Name Episode 5
    • Name - Episode 6
      so I fixed the scene and group title output to handle all these cases (see the sketch at the end of this comment).
  • [hanime] inserted a more complete link for sceneByURL
  • [hanime] created a Python script to search for scenes! The script needs no configuration and only requires requests to fetch the search results.
    I don't know Python well; I wanted to avoid as many dependencies as possible and not port all the YAML scraper functions to it, so the script DOES NOT RUN any scene scraper: it only performs the search and gives you the title (original, not formatted) and the URL of the chosen scene, and from those you can run the scrapers.
    Hanime has a large amount of media, and with this script you can search for scenes not only by "official name" but also by alias, e.g.:
    • Bible Black
    • Czarna księga
    • 바이블 블랙
    • バイブルブラック
    • Im Banne des Satans
    • (...)
(demo video attached: hanime.search.scene.mp4)

If someone is able to start the scraper directly from the Python script with the obtained URL (instead of exiting), possibly without adding dependencies and without removing the current sceneByURL operation in the YAML, I'd be grateful, because unfortunately I'm not able to do it :/
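As for the title fix above, a postProcess that folds all those hanime variants into the Name ep02 pattern might look like this (a sketch; the regexes are illustrative, not the exact shipped ones):

    Title:
      selector: //h1/text()
      postProcess:
        - replace:
            # "Name 1", "Name Ep 2", "Name ep 3", "Name Ep. 4",
            # "Name Episode 5", "Name - Episode 6" -> "Name ep<N>"
            - regex: '\s*(?:-\s*)?(?:Episode|Ep\.?|ep)?\s*(\d+)\s*$'
              with: ' ep$1'
            # pad single-digit episode numbers: "Name ep1" -> "Name ep01"
            - regex: ' ep(\d)$'
              with: ' ep0$1'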

@feederbox826 (Collaborator) commented

pretty sure it's because you're filtering by views and it's falling back when there aren't close enough results

you're still failing validation and whatever LLM you're using doesn't understand the scraper return format

@Lc4B (Author) commented Jun 25, 2025

hentaisaturn and hentaisubita fail validation because I use anchors like this:

      Date: &date
        selector: //div/b[text()="Data di uscita:"]/following-sibling::text()[1]
        postProcess:
          - replace:
              - regex: 'Gennaio'   
                with: 'January'
              - regex: 'Febbraio'  
                with: 'February'
              - regex: 'Marzo'     
                with: 'March'
              - regex: 'Aprile'    
                with: 'April'
              - regex: 'Maggio'    
                with: 'May'
              - regex: 'Giugno'    
                with: 'June'
              - regex: 'Luglio'    
                with: 'July'
              - regex: 'Agosto'    
                with: 'August'
              - regex: 'Settembre' 
                with: 'September'
              - regex: 'Ottobre'   
                with: 'October'
              - regex: 'Novembre'  
                with: 'November'
              - regex: 'Dicembre'  
                with: 'December'
          - parseDate: 2 January 2006

so with the alias *date I get the whole block (selector and postProcess) and I don't have to rewrite anything.
I tested it and it works in Stash without problems (if you want to test it, create a new group and scrape from a link, for example: https://www.hentaisaturn.tv/hentai/Amai-Ijiwaru, and you will see the date formatted correctly).
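In other words, the anchored block is defined once and the alias is reused wherever the field appears, e.g. (placement illustrative):

    xPathScrapers:
      groupScraper:
        movie:
          # the alias expands to the whole anchored block: selector + postProcess
          Date: *date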

The validation instead expects a value attached directly to the anchor, like:
Date: &date //div/b[text()="Data di uscita:"]/following-sibling::text()[1]
but that way it is no longer a "shortcut", and I would have to rewrite the postProcess for each one. Do I have to change it?

@Maista6969 (Collaborator) commented

> hentaisaturn and hentaisubita fail validation because I use anchors like this:

HentaiSaturn isn't failing validation as far as I can tell, only HentaiSubIta. The validation failure is giving the wrong error message there; your actual error is that the "parseDate" operation expects a string but is receiving the integer 2006:

      Date: &date
        selector: //span[@class="split"][b[text()="Anno:"]]/text()
        postProcess:
          - replace:
              - regex: '\s*(\d{4})'
                with: '$1'
          - parseDate: 2006
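If the year-only Go layout was the intent, presumably quoting it would satisfy the parser, since YAML would then pass "2006" as a string rather than an integer:

          - parseDate: '2006'   # quoted: a Go date-layout string, not the integer 2006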

Lc4B added 2 commits June 25, 2025 14:58
 * removed incorrect parseDate
 * changed the search to sort results by ascending title
@feederbox826 (Collaborator) commented

I do not like the addition of all the filename filtering; it seems to be overdoing it, and I've opened an RFC: https://discourse.stashapp.cc/t/rfc-scraper-queryurlreplace/2375

Also, the conversion into a Python script seems needlessly complicated, and I don't like the direction your LLM of choice has taken, since we natively support JSON

@Lc4B (Author) commented Jul 4, 2025

> I do not like the addition of all the filename filtering; it seems to be overdoing it, and I've opened an RFC: https://discourse.stashapp.cc/t/rfc-scraper-queryurlreplace/2375

I can assure you that it may seem exaggerated, but it is not. If we were talking about porn scrapers there would be no need, because the scene filename is generally either exact or not. But a hentai filename can have both different formatting and various extra information, which also changes depending on where the file was obtained (subtitle language, fansub/site, hash, version, codec, original title, ...), and there is always at least one of these. I have never gotten a hentai file with a completely clean name, never.

So I tried an automatic identification of my library with the "original" scrapers, but they found almost nothing. I studied what the main cases to isolate could be and built this filtering, which got good results for me and let me avoid working manually on each scene or renaming everything by hand.

Also, I just found these examples you could take inspiration from: Scanning files without renaming them
(my files were all like this)

> Also, the conversion into a Python script seems needlessly complicated, and I don't like the direction your LLM of choice has taken, since we natively support JSON

I absolutely don't want to convert everything to Python; as I already said, I want to keep these scrapers working in YAML. But I wanted to add that function to hanime because it didn't have it and its search is really good: being able to use aliases means you can search by a secondary title, different from or in another language than the main one used by the site, which is a great help.
I used Python + requests simply because I have used it on other occasions, so I knew I could get the search results that way and went in that direction. If someone can get them directly without Python, they are welcome to modify it however they prefer; that would be even better 👍
