Update pornolab.py #2329
…ately There are many edits that help you get data from the site page
I'll need to go over it again with a translator and a fine-tooth comb, but it seems to contain a lot of redundant, repetitive code.
My biggest irks and blockers are:
- anonymized variable names
- .rstrip().rstrip().strip() chained all over the code, and inconsistently
- runtime errors, where functions are run before nullish checks and after regex matching
return scraped

# branch for the "Имя актрисы" (actress name) field
raw = self.get_field_text(post_b, ["Имя актрисы"])
I would have preferred keeping meaningful variable names rather than raw over and over again.
scraped.append(ScrapedPerformer(name=name))
return scraped
# if nothing but ":" and <br> follows "В ролях" (cast), return an empty list
return []

def get_image(self):
Not a huge fan of the randomized image tbh
if url.lower().endswith('.gif'):
    continue

if not url:
The not url check has to come before url.lower(), or you'll get a runtime error when url is None.
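A minimal sketch of the ordering this comment asks for, assuming the loop walks over candidate URLs that may be None or empty. The function name filter_image_urls and its input list are hypothetical and not taken from the PR.

```python
def filter_image_urls(urls):
    """Return usable image URLs, skipping missing values and GIFs."""
    kept = []
    for url in urls:
        # Falsy check first: None has no .lower(), so calling it
        # before this guard would raise AttributeError.
        if not url:
            continue
        if url.lower().endswith(".gif"):
            continue
        kept.append(url)
    return kept
```

With the guard in this order, a None entry is skipped instead of crashing the scraper.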
parts = split_pattern.split(raw)
scraped = []
for part in parts:
    m = re.search(r"\((.*?)\)", part)
Again, meaningful variable names have been replaced with an obscure naming scheme.
if isinstance(sib, Tag) and 'post-color-text' in sib.get('class', []):
    raw = sib.get_text(strip=True)
    for tag in raw.rstrip('.').split(','):
        t = tag.strip()
Double strip? You're doing it so often you might as well just handle it in a function.
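One way the repeated strip calls could be folded into a single helper, as a rough sketch. The names clean_text and split_tags are hypothetical. Unlike the snippet above, this version trims a trailing period from every tag rather than only from the end of the whole field, which is an assumption about the intended behavior.

```python
def clean_text(value, trailing="."):
    """Trim whitespace, then trailing punctuation; return "" for None or empty input."""
    if not value:
        return ""
    return value.strip().rstrip(trailing).strip()


def split_tags(raw):
    """Split a comma-separated field into cleaned, non-empty tags."""
    cleaned = (clean_text(part) for part in (raw or "").split(","))
    return [tag for tag in cleaned if tag]
```

Every call site then goes through the same cleanup, so the stripping stays consistent and the empty-value check lives in one place.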
for part in parts:
    u = part.strip()
    if not u:
        continue
check before stripping, unless you're expecting an empty string that also matched your regex rule
if director:
    return director

return director.rstrip('.')
Again with the double lstrip/rstrip and the boolean check.
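Read as written, the snippet above only reaches director.rstrip('.') when director is falsy, which would fail on None. A hedged sketch of the check done once, up front; the function name clean_director is hypothetical, and returning None for a missing value is an assumption.

```python
def clean_director(director):
    """Return the director name without surrounding whitespace or a trailing period."""
    # Guard first, so no string method is ever called on None.
    if not director:
        return None
    # A value that is only punctuation or whitespace also counts as missing.
    return director.strip().rstrip(".").strip() or None
```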

# If there are neither img-left nor img-right images, shuffle the generic ones and take them in random order
if not left_imgs and not right_imgs and generic_imgs:
    random.shuffle(generic_imgs)
This doesn't match the logic described in your PR: it only shuffles the generic images, but then adds the left and right images to an empty candidates list?
If anyone wants to use the PR, they have uploaded it to their own repo: https://github.com/Druidblack/Stash-Scrapers/tree/main
I have made a number of corrections that help the scraper get data from pages with different layouts.
Examples to test
https://pornolab.net/forum/viewtopic.php?t=1908520
https://pornolab.net/forum/viewtopic.php?t=3177748
https://pornolab.net/forum/viewtopic.php?t=2333222
https://pornolab.net/forum/viewtopic.php?t=1582609
https://pornolab.net/forum/viewtopic.php?t=2137727
https://pornolab.net/forum/viewtopic.php?t=2747150
https://pornolab.net/forum/viewtopic.php?t=1984852
https://pornolab.net/forum/viewtopic.php?t=1384220
https://pornolab.net/forum/viewtopic.php?t=2861716
https://pornolab.net/forum/viewtopic.php?t=1580232
Short description