
Conversation


@Druidblack Druidblack commented May 12, 2025

I have made a number of changes that help the scraper get data from pages with different designs.

Examples to test

https://pornolab.net/forum/viewtopic.php?t=1908520
https://pornolab.net/forum/viewtopic.php?t=3177748
https://pornolab.net/forum/viewtopic.php?t=2333222
https://pornolab.net/forum/viewtopic.php?t=1582609
https://pornolab.net/forum/viewtopic.php?t=2137727
https://pornolab.net/forum/viewtopic.php?t=2747150
https://pornolab.net/forum/viewtopic.php?t=1984852
https://pornolab.net/forum/viewtopic.php?t=1384220
https://pornolab.net/forum/viewtopic.php?t=2861716
https://pornolab.net/forum/viewtopic.php?t=1580232

Short description

  1. I changed the logic for getting the name: we now take it from the page header, since the name is not always present in the page body.
  2. I added alternative ways of retrieving every field. Since there is no single page layout, fields with the same meaning can have different labels; the scraper now has a better chance of finding the information.
  3. Added an image-availability check. If an image is not reachable, the scraper skips it and moves on to the next live image.
  4. Changed the logic for getting image links. The scraper first looks for an image placed on the left side of the page, then on the right (if none is found on the left); if neither is found, it falls back to the first image on the page.
  5. If several images share the same placement on the page, the scraper picks one of them at random, so repeating the request may return a different image. This matters when a page has multiple covers, different sides of a cover, or both a logo and a cover.
  6. To avoid unwanted images, the scraper skips GIF files and ignores images placed below the block with the video characteristics (screenshots from the video may appear there, and nothing in that part of the page is suitable as a cover). In other words, the scraper does not look at the second part of the page.
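The image-selection rules above (left first, then right, then any generic image; skip GIFs; random pick among ties) can be sketched roughly as follows. The function name and the three list parameters are hypothetical, not the PR's actual identifiers:

```python
import random

def pick_cover(left_imgs, right_imgs, generic_imgs):
    """Pick a cover URL: prefer left-aligned images, then right-aligned,
    then any generic image; skip GIFs; choose randomly among ties."""
    for candidates in (left_imgs, right_imgs, generic_imgs):
        usable = [u for u in candidates if u and not u.lower().endswith(".gif")]
        if usable:
            # Several images with the same placement: random choice, so a
            # repeated request may return a different one.
            return random.choice(usable)
    return None
```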


There are many edits that help you get data from the site page

@feederbox826 feederbox826 left a comment


I'll need to go over it again with a translator and a fine-tooth comb, but it seems to contain a lot of redundant, repetitive code.

My biggest irks and blockers are:

  • anonymized variable names
  • `.rstrip().rstrip().strip()` scattered all over the code, and inconsistently
  • runtime errors, where functions are run before nullish tests and after regex matching

return scraped

# "Имя актрисы" ("Actress name") branch
raw = self.get_field_text(post_b, ["Имя актрисы"])

Would have liked to preserve meaningful variable names rather than `raw` over and over again.

scraped.append(ScrapedPerformer(name=name))
return scraped
# if nothing but ":" and <br> was found after "В ролях" ("Starring") — return an empty list
return []

def get_image(self):

Not a huge fan of the randomized image, tbh.

if url.lower().endswith('.gif'):
continue

if not url:

`not url` has to come before `url.lower()`, or you'll get a runtime error.
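A minimal illustration of the ordering issue — the emptiness check must come before any method call, since the URL may be `None` at this point. This is a hypothetical helper, not the PR's exact code:

```python
def is_usable_image(url):
    # Guard first: calling .lower() on None raises AttributeError.
    if not url:
        return False
    # Safe to call string methods only after the nullish test.
    return not url.lower().endswith(".gif")
```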

parts = split_pattern.split(raw)
scraped = []
for part in parts:
m = re.search(r"\((.*?)\)", part)

Again, meaningful variable names replaced with an obscure naming scheme.

if isinstance(sib, Tag) and 'post-color-text' in sib.get('class', []):
raw = sib.get_text(strip=True)
for tag in raw.rstrip('.').split(','):
t = tag.strip()

Double strip? You're doing it so often you might as well just handle it in a function.
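One way to centralize the repeated trimming, as suggested — a hypothetical helper, not anything already in the PR:

```python
def clean(text, trailing="."):
    """Trim whitespace and a trailing punctuation character in one place,
    instead of chaining .rstrip().rstrip().strip() at every call site."""
    return text.strip().rstrip(trailing).strip() if text else ""
```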

for part in parts:
u = part.strip()
if not u:
continue

check before stripping, unless you're expecting an empty string that also matched your regex rule

if director:
return director

return director.rstrip('.')

Again with the double `lstrip`/`rstrip` and the boolean check.


# If there is neither img-left nor img-right — shuffle the generic images and take them in random order
if not left_imgs and not right_imgs and generic_imgs:
random.shuffle(generic_imgs)

This doesn't match the logic described in your PR: here only the generic images are shuffled, but then the left and right candidate images are added to an empty candidates list?

@feederbox826

If anyone wants to use the PR, they have uploaded it on their own repo https://github.com/Druidblack/Stash-Scrapers/tree/main
