Revision of AEBN scraper #1291

Open
wants to merge 7 commits into
base: master


SirCumAlot1988
Contributor

@SirCumAlot1988 SirCumAlot1988 commented Mar 16, 2023

Hi,

I propose this as an alternative/replacement to the existing AEBN scraper.

It has the same functionality as the existing AEBN scraper with one major additional feature:

It is now possible to scrape the metadata of a specific movie scene. This is helpful if you have a movie saved as split scenes (i.e. one file per movie scene). To scrape a single scene instead of the complete movie, do one of the following:

  1. Enter the movie URL and append a separator plus the scene number. The default separators are plus, comma and full stop, but you can define your own in the header of the .py file. Example: to scrape scene 2 of Kendras Obsession, enter https://straight.aebn.com/straight/movies/218523/kendras-obsession+2

  2. Enter the name of the movie in the search field at the top of the AEBN site. You will get a list of scenes. If you hover over a scene, you will see the movie title and scene number. Find the scene you want and click on the three dots ("Scene Information"). A popup window will appear. Copy the link at the top of the popup window (in this example the link "Kendras Obsession, scene 2") and pass it to the scraper.
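Option 1 could be handled roughly as in the sketch below. This is a minimal illustration, not the code from this PR: the function name `split_scene_suffix` and the constant `SCENE_SEPARATORS` are hypothetical, and the only things taken from the description are the default separators and the example URL.

```python
import re
from typing import Optional, Tuple

# Assumed defaults, following the description: plus, comma and full stop.
SCENE_SEPARATORS = "+,."


def split_scene_suffix(
    url: str, separators: str = SCENE_SEPARATORS
) -> Tuple[str, Optional[int]]:
    """Split '<movie url><separator><scene number>' into (movie_url, scene_number).

    Returns (url, None) when the URL carries no scene suffix.
    """
    url = url.strip()
    # A separator counts only if everything after it, up to the end, is digits.
    pattern = r"^(?P<movie>.+?)[" + re.escape(separators) + r"](?P<scene>\d+)$"
    match = re.match(pattern, url)
    if match is None:
        return url, None
    return match.group("movie"), int(match.group("scene"))
```

Because the suffix must be a separator followed only by digits at the very end, the dots in the host name do not trigger a false match.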

I personally prefer option 1, but it is up to you.

This way the scraper loads only the metadata (actors, tags, etc.) of this specific scene. The scraper randomly selects one of the scene thumbnails as the cover image. If the complete movie is scraped, it uses the movie cover.

Besides that, I added some minor improvements:

  • Movie performers are now scraped correctly. If a movie has a long list of performers (e.g. Orgy Masters 6), some performers are loaded via AJAX calls, so the old scraper did not scrape all of them. This is now handled properly.
  • The performer scraper now scrapes aliases, tattoos and piercings, which are "hidden" in the performer biography (see Anna Bell Peaks as an example for tattoos/piercings and Angelika Grays for aliases).
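The first point amounts to combining the performers found on the initial page with those returned by the follow-up AJAX calls. A minimal sketch of that merge step is shown below; `merge_performer_batches` is a hypothetical helper, and how each batch is fetched and parsed is deliberately left out because it depends on the site's markup.

```python
from typing import Iterable, List


def merge_performer_batches(batches: Iterable[Iterable[str]]) -> List[str]:
    """Merge performer names from several batches, keeping first-seen order.

    Each batch stands for one source: the initial page or one AJAX response.
    Names are compared case-insensitively so repeated entries are dropped.
    """
    seen = set()
    merged: List[str] = []
    for batch in batches:
        for name in batch:
            cleaned = name.strip()
            key = cleaned.casefold()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged
```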

Finally, as an "experimental" feature, I added the option to invoke the performer scraper from the scene scraper. If you set the option "scrape_performer_details" in the header of the .py file to true, the scene scraper will scrape the details for each performer of the scene. So, if you do not have a performer yet and create it from the scene scraping window, the performer detail fields will already be populated. Similarly, with the option "scrape_performer_images" enabled, the performer image is available without the need to rescrape that performer. Of course you can combine both options. Admittedly, this slows down the scraper, so I am not sure whether it is a reasonable option, and I am open to discussion on that.
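To make the trade-off concrete, here is a rough sketch of how two such switches might gate the extra work. Only the option names come from the description; `enrich_performers` and the two fetcher callables are hypothetical stand-ins for the performer scraper, not the code in this PR.

```python
from typing import Callable, Dict, List

# Option names as in the description; the values here are only illustrative.
scrape_performer_details = True
scrape_performer_images = False


def enrich_performers(
    performers: List[Dict[str, str]],
    fetch_details: Callable[[str], Dict[str, str]],
    fetch_image: Callable[[str], str],
    with_details: bool = scrape_performer_details,
    with_images: bool = scrape_performer_images,
) -> List[Dict[str, str]]:
    """Return copies of the performers, optionally extended per enabled option."""
    enriched = []
    for performer in performers:
        entry = dict(performer)
        url = entry.get("url")
        if url and with_details:
            # Scraped details fill gaps but never overwrite fields already known.
            for key, value in fetch_details(url).items():
                entry.setdefault(key, value)
        if url and with_images:
            entry["image"] = fetch_image(url)
        enriched.append(entry)
    return enriched
```

Each enabled option costs one additional fetch per performer, which is where the slowdown on scenes with many performers would come from.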

Cheers!

SirCumAlot1988 and others added 7 commits December 30, 2022 23:54
Fixes the following problems:
-Performer Image not scraped anymore due to minor changes in the website
-Most of the metadata is not scraped anymore due to minor changes in the website (Birthdate, Country, Ethnicity, Nationality, Eye Color, Height, Weight, Fake Tits, Career Length, Twitter, Instagram)
-Birthdate not scraped properly in some cases
-Hair Color not scraped properly in some cases
-Measurements not scraped properly in some cases
-Gender defaults to female now
-Some cosmetic corrections to career length and details
-Added regex for removing references to further fields (Ethnicity, Eye Color, Fake Tits, Hair Color, Career Length, Aliases)
-Career Length: Maps "Present" and "Current" to empty string
-Country: Maps nationality to country
-Career Length: Maps em dash to hyphen
-Fake Tits: Maps "Enhanced" to "Fake" and "Natural" to "Natural"
- format
- removed fixed Gender as not all performers were female
- fixed twitter/instagram selectors
- tweaked a couple of regexes
-Implemented ability to scrape movie scenes
-Fixed movie performers not scraped properly
-Added functionality to scrape performer tattoos, piercings and aliases
… scene scraping

-Improved handling of tattoos/piercings/aliases during performer scraping
-Added handling of transgender performers
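
The value mappings named in the commit messages above (career length and "Fake Tits") can be sketched as follows. This is an illustration under assumptions, not the scraper's actual code: the function names and the shape of the raw input strings are hypothetical, and only the mappings themselves come from the commit messages.

```python
from typing import Dict

# "Enhanced" maps to "Fake"; "Natural" stays "Natural".
FAKE_TITS_MAP: Dict[str, str] = {"Enhanced": "Fake", "Natural": "Natural"}


def normalize_career_length(raw: str) -> str:
    """Map the em dash to a hyphen and blank out 'Present'/'Current'."""
    text = raw.replace("\u2014", "-")  # U+2014 is the em dash
    for open_ended in ("Present", "Current"):
        text = text.replace(open_ended, "")
    return text.strip()


def normalize_fake_tits(raw: str) -> str:
    """Translate the site's wording; unknown values pass through unchanged."""
    value = raw.strip()
    return FAKE_TITS_MAP.get(value, value)
```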
@scruffynerf
Contributor

scruffynerf commented Dec 23, 2023

Thank you for this code. I've been doing a bunch of AEBN scraping and didn't realize it wasn't getting all the performers.
I actually managed to massage the yml version to grab the full list (among other tweaks I've made to it), and the essence of this is good, but it's using too many one-offs, and would benefit from a big rewrite using StashAPI.
On my radar to do that rewrite (because as much as I love yml, I'm now pushing it to a limit... subscraping every performer works, but...

@SirCumAlot1988
Contributor Author

Yeah, I also did not realize in the beginning that not all performers are scraped. Actually I came across it when I implemented this scraper.

However, in my eyes the main advantage of my scraper is that it can scrape a single scene of a movie. That's really helpful if you have movies as split scenes. Scraping the metadata of the whole movie doesn't make too much sense in this situation.

AEBN is really good at providing metadata for each scene separately. However, I realized too late that hotmovies is even better in this regard. So, maybe I will implement something similar for hotmovies in the future.

Labels
enhancement New feature or request script Scraper executes a script
3 participants