Revision of AEBN scraper #1291

SirCumAlot1988 · 2023-03-16T11:18:44Z

Hi,

I propose this as an alternative/replacement to the existing AEBN scraper.

It has the same functionality as the existing AEBN scraper with one major additional feature:

It is now possible to scrape the metadata of a specific movie scene. This is helpful if you have a movie saved as split scenes (i.e. one file per movie scene). To scrape a single scene instead of the complete movie, do one of the following:

1. Enter the movie URL and append a separator plus the scene number. The default separators are plus, comma and full stop, but you can define your own in the header of the .py file. Example: to scrape scene 2 of Kendras Obsession, enter https://straight.aebn.com/straight/movies/218523/kendras-obsession+2
2. Enter the name of the movie in the search field at the top of the AEBN site. You will get a list of scenes. If you hover over a scene, you will see the movie title and scene number. Find the scene you want and click on the three dots ("Scene Information"). A popup window will appear. Copy the link at the top of the popup window (in this example the link "Kendras Obsession, scene 2") and pass it to the scraper.

I personally prefer option 1, but it is up to you.
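To illustrate option 1, here is a minimal sketch of how a URL with a trailing separator and scene number could be split. The names `SCENE_SEPARATORS` and `split_scene_url` are hypothetical and not taken from the scraper. The sketch only assumes what is described above: the scene number is appended to the movie URL after one of the configured separators.

```python
import re

# Hypothetical header setting; the real scraper may name or store this differently.
SCENE_SEPARATORS = ["+", ",", "."]


def split_scene_url(url, separators=SCENE_SEPARATORS):
    """Split 'movie-url<separator><scene number>' into (movie_url, scene_number).

    Returns (url, None) when no trailing separator + number is present,
    i.e. when the whole movie should be scraped.
    """
    url = url.strip()
    if not separators:
        return url, None
    # Escape each separator so characters like '+' and '.' are matched literally.
    alternatives = "|".join(re.escape(sep) for sep in separators)
    # Only a separator directly followed by digits at the very end counts,
    # so the dots in the host name are not mistaken for a separator.
    match = re.match(rf"^(?P<movie>.+?)(?:{alternatives})(?P<scene>\d+)$", url)
    if match is None:
        return url, None
    return match.group("movie"), int(match.group("scene"))


movie_url, scene_number = split_scene_url(
    "https://straight.aebn.com/straight/movies/218523/kendras-obsession+2"
)
```

One caveat of this approach: a movie URL that legitimately ends in a separator followed by digits would be read as a scene request, which may be one reason to make the separators configurable.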

This way the scraper loads only the metadata (actors, tags, etc.) of this specific scene. The scraper randomly selects one of the scene thumbnails as the cover image. If the complete movie is scraped, it uses the movie cover.

Apart from that, I added some minor improvements:

- Movie performers are now scraped correctly. If you have a movie with a long list of performers (e.g. Orgy Masters 6), some performers are loaded via ajax calls. Hence, the old scraper did not scrape all performers. This is now handled properly.
- The performer scraper now scrapes aliases, tattoos and piercings, which are "hidden" in the performer biography (see Anna Bell Peaks as an example for tattoos/piercings and Angelika Grays for aliases).
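As a rough illustration of pulling fields out of free-text biography, the sketch below extracts aliases, tattoos and piercings with a regular expression. It assumes the biography contains "Label: value" runs such as "Tattoos: ..."; that layout is an assumption made for this example, not a description of AEBN's actual markup, and `extract_bio_fields` is a hypothetical name.

```python
import re

# Assumed labels; the real biography text may use different wording.
BIO_LABELS = ("Aliases", "Tattoos", "Piercings")
_ANY_LABEL = "|".join(BIO_LABELS)


def extract_bio_fields(bio):
    """Return a dict with the labelled fields found in a biography string."""
    fields = {}
    for label in BIO_LABELS:
        # Capture everything after 'Label:' up to the next known label or the end.
        pattern = rf"\b{label}\s*:\s*(.*?)\s*(?=\b(?:{_ANY_LABEL})\s*:|$)"
        match = re.search(pattern, bio, flags=re.IGNORECASE | re.DOTALL)
        if match and match.group(1):
            # Drop trailing punctuation left over from the surrounding sentence.
            fields[label.lower()] = match.group(1).rstrip(" .;")
    return fields


sample_bio = (
    "Some introductory text. Tattoos: Left arm sleeve; back. "
    "Piercings: Navel. Aliases: Jane Doe"
)
bio_fields = extract_bio_fields(sample_bio)
```

A biography without any of these labels simply yields an empty dict, so the corresponding performer fields would stay unset.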

Finally, as an "experimental" feature, I added the option to invoke the performer scraper from the scene scraper. If you set the option "scrape_performer_details" in the header of the .py file to true, the scene scraper will scrape the details for each performer of the scene. So, if you do not have this performer yet and create it from the scene scraping window, the performer detail fields will already be populated. Similarly, with the option "scrape_performer_images" enabled, you will have the performer image available without the need to rescrape that performer. Of course, you can combine both options. Admittedly, this slows down the scraper, since every performer of the scene has to be scraped as well. So, I am not sure if it is a reasonable option and I am open to discussion on that.
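The trade-off can be sketched as follows. Only the two option names come from the description above; `enrich_performers`, the `fetch_profile` callback and the dict layout are hypothetical stand-ins for whatever the scraper does internally.

```python
# Option names as described above; defaults here are an assumption.
scrape_performer_details = False
scrape_performer_images = False


def enrich_performers(performers, fetch_profile, details=False, images=False):
    """Optionally merge performer profile data into scene performers.

    `fetch_profile` is a hypothetical callable that takes a performer URL
    and returns a dict of profile fields, with the picture under 'image'.
    """
    if not (details or images):
        return performers  # both options off: no extra requests at all
    enriched = []
    for performer in performers:
        # One additional page load per performer: this is the slowdown.
        profile = fetch_profile(performer["url"])
        merged = dict(performer)
        if details:
            merged.update({k: v for k, v in profile.items() if k != "image"})
        if images and "image" in profile:
            merged["image"] = profile["image"]
        enriched.append(merged)
    return enriched
```

With both options off, the scene scrape costs no extra requests; with either one on, a scene with N performers needs N additional profile loads in this sketch.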

Cheers!

Fixes the following problems:
- Performer Image not scraped anymore due to minor changes in the website
- Most of the metadata is not scraped anymore due to minor changes in the website (Birthdate, Country, Ethnicity, Nationality, Eye Color, Height, Weight, Fake Tits, Career Length, Twitter, Instagram)
- Birthdate not scraped properly in some cases
- Hair Color not scraped properly in some cases
- Measurements not scraped properly in some cases
- Gender defaults to female now
- Some cosmetic corrections to career length and details

- Added regex for removing references to further fields (Ethnicity, Eye Color, Fake Tits, Hair Color, Career Length, Aliases)
- Career Length: Maps "Present" and "Current" to empty string
- Country: Maps nationality to country
- Career Length: Maps em dash to hyphen
- Fake Tits: Maps "Enhanced" to "Fake" and "Natural" to "Natural"
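For readers who want to see what these value mappings amount to, here is a small Python sketch of the Career Length and Fake Tits rules. It is an illustration only: the function names are hypothetical, the rules are read as simple replacements, and the actual change may be expressed differently (for example in a YAML scraper definition).

```python
FAKE_TITS_MAP = {"Enhanced": "Fake", "Natural": "Natural"}


def normalize_career_length(value):
    """Apply the Career Length rules: em dash to hyphen, drop Present/Current."""
    value = value.replace("\u2014", "-")  # U+2014 is the em dash
    for word in ("Present", "Current"):
        value = value.replace(word, "")
    return value.strip()


def normalize_fake_tits(value):
    """Map the site's wording to the target values; unknown values pass through."""
    value = value.strip()
    return FAKE_TITS_MAP.get(value, value)


career = normalize_career_length("2015\u2014Present")
breasts = normalize_fake_tits("Enhanced")
```

Under this reading, an open-ended career such as "2015" followed by an em dash and "Present" ends up as "2015-".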


scruffynerf · 2023-12-23T00:27:37Z

Thank you for this code. I've been doing a bunch of AEBN scraping and didn't realize it wasn't getting all the performers.
I actually managed to massage the yml version to grab the full list (among other tweaks I've made to it). The essence of this is good, but it uses too many one-offs and would benefit from a big rewrite using StashAPI.
It's on my radar to do that rewrite (because as much as I love yml, I'm now pushing it to its limit... subscraping every performer works, but...).

SirCumAlot1988 · 2024-01-19T19:51:31Z

Yeah, I also did not realize in the beginning that not all performers are scraped. Actually I came across it when I implemented this scraper.

However, in my eyes the main advantage of my scraper is that it can scrape a single scene of a movie. That's really helpful if you have movies as split scenes. Scraping the metadata of the whole movie doesn't make too much sense in this situation.

AEBN is really good at providing metadata for each scene separately. I realized too late however that hotmovies is even better in this regard. So, maybe I will implement something similar for hotmovies in the future.

SirCumAlot1988 and others added 7 commits December 30, 2022 23:54

Update Boobpedia.yml

a21b3d6

- format - removed fixed Gender as not all performers were female - fixed twitter/instagram selectors - tweaked a couple of regexes

Merge branch 'stashapp:master' into master

dd1b46d

Revised AEBN scraper:

2acc4ee

-Implemented ability to scrape movie scenes -Fixed movie performers not scraped properly -Added functionality to scrape performer tattoos, piercings and aliases

-Added functionality to scrape performer details and/or images during…

5db4721

… scene scraping -Improved handling of tattoos/piercings/aliases during performer scraping -Added handling of transgender performers

Fixed AEBN.yml to include the correct .py file

2f7eb20

SirCumAlot1988 mentioned this pull request Mar 16, 2023

[Bug Report] Movie creation from Scene Scraper stashapp/stash#3551

Closed

bnkai added the labels enhancement (New feature or request) and script (Scraper executes a script) Mar 19, 2023

Maista6969 force-pushed the master branch from 48ab227 to 8e2b818 Compare September 10, 2023 10:18
