Hentai update #2396
Conversation
* Improved URL construction via sceneByFragment
* Standardized output titles
* Removed capture of tags related to resolution
* Added group construction in the sceneScraper
* Added groupScraper support from URL
* Changed the search URL (with covers) to remove the post-process change
* Restored image "preview" for scenes
* Improved URL construction via sceneByFragment
* Standardized output titles
* Removed capture of tags related to resolution
* Added group construction in the sceneScraper
* Added scene URL construction from the cover URL
* Italian plots
hentaisaturn and hentaisubita have been fixed
fixed commonXPaths movies fields
That is a lot. What filename format are you using? I'm unable to replicate it with yt-dlp; or are you just stripping it down for the Least Common (Filename) denominator?
Sorry, but I don't understand what you mean. If you mean the cleaning of the sceneByFragment filename: I simply try to remove "common" fields or symbols that would break the URL, and, considering the URL patterns, whatever comes after the last number is also cut, for example:

If instead you mean the output format of the title: I only cut or added to the title obtained from the scraping to bring it to a name resembling the template used by hstream. I used that one because, as a scraper, it is the most complete, so I assume the most used; I wasn't aiming for a common pattern, only a uniform one among the scrapers. (I still have to change hanime, which uses "Name 2" for the new episodes and "Name Ep 2" for the old ones; I hadn't considered this case.)
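The concrete example above didn't survive extraction, but the mechanism being described (strip separators and URL-breaking symbols, then cut everything after the last number) can be sketched roughly like this; this is an illustrative reconstruction, not the PR's actual regexes:

```python
import re

def clean_fragment(filename: str) -> str:
    """Illustrative cleanup in the spirit described above (not the PR's code):
    drop the extension, strip URL-breaking symbols, and cut everything
    after the last number so only 'Title <episode>' survives."""
    name = re.sub(r"\.[A-Za-z0-9]{2,4}$", "", filename)   # drop the extension
    name = re.sub(r"[_.]+", " ", name)                    # separators -> spaces
    name = re.sub(r"[^\w\s-]", "", name)                  # symbols that break URLs
    matches = list(re.finditer(r"\d+", name))
    if matches:
        name = name[: matches[-1].end()]                  # cut after last number
    return re.sub(r"\s+", " ", name).strip()

print(clean_fragment("Hentai.Name.ep.02.[SubITA].mp4"))   # -> "Hentai Name ep 02"
```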
* updated title template: Name ep02
* updated title with more cases
* updated sceneByURL
* added sceneByName/sceneByQueryFragment via python
hanime.search.scene.mp4

If someone is able to launch the scraper directly from the Python script with the obtained URL (instead of just returning it), possibly without adding dependencies and without removing the current sceneByURL operation in YAML, I'd be grateful, because unfortunately I'm not able to do it :/
Pretty sure it's because you're filtering by views and it's falling back when there aren't close enough results. You're still failing validation, and whatever LLM you're using doesn't understand the scraper return format.
hentaisaturn and hentaisubita fail validation because I use anchors like this:

So with the alias *date I pull in the whole block (selector and postProcess) and don't have to rewrite anything, while the validation expects a single value matched to the anchor, like:
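The inline snippets didn't survive extraction, but whole-block reuse via a YAML anchor/alias generally looks like this (section and field names here are illustrative, not the PR's actual code):

```yaml
Hentai:
  Date: &date            # anchor captures the whole mapping
    selector: //span[@class="date"]/text()
    postProcess:
      - parseDate: "2006"
Movie:
  Date: *date            # alias reuses selector + postProcess without rewriting them
```

The validator, by contrast, apparently expects the anchor to resolve to a single scalar value rather than an entire mapping.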
HentaiSaturn isn't failing validation as far as I can tell, only HentaiSubIta: the validation failure is giving the wrong error message for that. Your actual error is that the "parseDate" operation expects a string, but is receiving the integer:

```yaml
Date: &date
  selector: //span[@class="split"][b[text()="Anno:"]]/text()
  postProcess:
    - replace:
        - regex: '\s*(\d{4})'
          with: '$1'
    - parseDate: 2006
```
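For reference (my note, not part of the thread): in YAML an unquoted 2006 is implicitly typed as an integer, so, assuming parseDate takes a Go-style reference layout string as the maintainer's error message suggests, quoting the layout is the usual fix:

```yaml
- parseDate: "2006"   # quoted, so parseDate receives the string it expects
```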
* removed incorrect parseDate
* changed the search for ascending title
I do not like the addition of all the filename filtering; it seems to be overdoing it, and I've opened up an RFC: https://discourse.stashapp.cc/t/rfc-scraper-queryurlreplace/2375. Also, the conversion into a Python script seems needlessly complicated, and I don't like the direction your LLM of choice has taken, since we do natively support JSON.
I can assure you that it may seem exaggerated, but it is not. If we were talking about porn scrapers there would be no need, because generally the scene filename is either exact or close to it. But a hentai filename can have both different formatting and various extra information that also changes depending on where the file was obtained (such as: subtitle language, fansub/site, hash, version, codec, original title, ...), and there is always at least one of these; I have never gotten a hentai file with a completely clean name, never.

So I tried an automatic identification of my library with the "original" scrapers, and they found almost nothing. I then studied the main cases to isolate and built this filtering, which got good results for me and let me avoid working manually on each scene or renaming everything by hand.

Also, I've now found these examples you could take inspiration from: Scanning files without renaming them
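The categories of noise listed above (fansub/site tags, hashes, codecs, subtitle language) usually appear in brackets or as well-known tokens, so stripping them can be sketched as follows; the token list and helper name are my illustrative assumptions, not the PR's actual filtering:

```python
import re

# Illustrative token list only; a real filter would cover far more cases.
NOISE_TOKENS = r"(?:x26[45]|h26[45]|1080p|720p|HEVC|AAC|SubITA|ENG)"

def strip_release_noise(name: str) -> str:
    """Drop bracketed groups like [Fansub] or (A1B2C3), then known tokens."""
    name = re.sub(r"[\[\(][^\]\)]*[\]\)]", " ", name)
    name = re.sub(NOISE_TOKENS, " ", name, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", name).strip()

print(strip_release_noise("[Fansub] Hentai Name 02 (1080p HEVC) [A1B2C3]"))
# -> "Hentai Name 02"
```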
I absolutely don't want to convert everything to Python; I already said that, and I want to keep these scrapers working in YAML. But I wanted to add that function to hanime because it didn't have it and its search is really good: being able to use aliases means being able to search using a secondary title, different from or in another language than the main one used by the site, and that is a great help.
Hentai Update

Updated:

Added:

Details:
* Improved sceneByFragment, with more accurate filename cleaning, especially for older hentai (*Oppai is partially excluded due to the use of symbols in its URLs).
* Standardized titles: variants such as Hentai 2 2 | Hentai 2 - 2 | Hentai 2 Ep 2 | Hentai 2 Episode 02 | ... Now, the episode number in the title will be displayed with a dash and two-digit format: - 02, ensuring a consistent library.
* Added group construction in the sceneScraper, so a manual scrape of an episode can create the corresponding group.
* Added groupScraper support from URL.

(HentaiSaturn and HentaiSubIta are somewhat rough in various aspects, mainly used to retrieve Italian plot summaries, but can be useful for some older hentai.)
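The dash-and-two-digit normalization described above can be sketched like this; the helper is hypothetical and only covers the variants listed in the description, not the PR's full logic:

```python
import re

def normalize_episode(title: str) -> str:
    """Rewrite trailing episode markers ('2', '- 2', 'Ep 2', 'Episode 02', ...)
    into the uniform '<Name> - NN' form described in the PR."""
    m = re.search(r"\s*-?\s*(?:Ep(?:isode)?\s*)?(\d{1,3})\s*$", title, re.IGNORECASE)
    if not m:
        return title                       # no trailing episode number found
    name = title[: m.start()].strip()
    return f"{name} - {int(m.group(1)):02d}"

for t in ["Hentai 2 2", "Hentai 2 - 2", "Hentai 2 Ep 2", "Hentai 2 Episode 02"]:
    print(normalize_episode(t))            # each prints "Hentai 2 - 02"
```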