Add bergamot #1064

mlduff · 2024-04-16T01:26:22Z

This scraper adds support for dashboard.bergamot.app. I would have liked a few more test URLs, but used the following in testing.

Had to implement a legacy test - not sure if I used the right method to specify the origin URL (which is different from the actual request URL). I ended up having a yield for the first URL (the one the user would pass in), with the associated testhtml not actually being used.

Resolves #986

jayaddison · 2024-04-17T15:59:05Z

> Had to implement a legacy test - not sure if I used the right method to specify the origin URL (which is different from the actual request URL). I ended up having a yield for the first URL (the one the user would pass in), with the associated testhtml not actually being used.

That seems OK to me @mlduff - a web browser would load the ~~testhtml~~ HTML page and then load any related JavaScript, and subsequently should make the JSON request you found previously. In addition, people using this library would likely expect to pass in HTML from a webpage, meaning that they've made an HTTP GET request for the relevant dashboard page, and then the scraper will make the subsequent HTTP GET. So the test describes the expected usage and behaviour, which is what we want.

jayaddison · 2024-04-17T16:08:36Z

This generally looks good to me, thank you @mlduff - a few small suggestions/questions and then I'll re-review.

jayaddison · 2024-04-17T16:26:49Z

recipe_scrapers/bergamot.py

+    def author(self):
+        return None


Let's try to find some basic authorship information when we can, even if it is not perfect/ideal.

When a source_url is found, I think we could return the domain name. Not ideal, but better than nothing. I've asked in the bug report (#986) whether we know of any example recipes that contain more author info.
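A minimal sketch of that fallback, assuming the recipe data exposes the origin page as a `source_url` string. The helper name `author_from_source_url` is hypothetical and is not taken from the PR; it only illustrates deriving a domain from a URL with the standard library.

```python
from typing import Optional
from urllib.parse import urlparse


def author_from_source_url(source_url: Optional[str]) -> Optional[str]:
    """Return the host of source_url as a stand-in author, or None."""
    if not source_url:
        return None
    # urlparse only fills in hostname when the URL has a scheme or a leading "//".
    host = urlparse(source_url).hostname
    if not host:
        return None
    # Drop a leading "www." so the credit reads as the bare domain.
    return host[4:] if host.startswith("www.") else host
```

Returning `None` when no host can be parsed keeps the behaviour identical to the current `author()` stub for recipes without a usable source URL.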

mlduff · 2024-04-17T22:42:16Z

I had a look through the JS on the site and I have added cook_time as I believe it should be implemented based on:

{className:"recipe__time_title"},n("Prep")),a.a.createElement("div",{className:"recipe__time_content"},Ce(r.time.prepTime))),!!r.time.cookTime&&a.a.createElement("div",{className:"recipe__time column"},a.a.createElement("div",{className:"recipe__time_title"},n("Cook")),a.a.createElement("div",{className:"recipe__time_content"},Ce(r.time.cookTime))),!!r.time.totalTime&&a.a.createElement("div",{className:"recipe__time column"},a.a.createElement("div",

Couldn't find author using the same method @jayaddison
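For reference, a rough Python equivalent of the guard in that snippet, assuming the JSON response carries a `time` object with `prepTime`, `cookTime` and `totalTime` keys as the JS suggests. The function name `read_time_field` is hypothetical, and the unit of the values is not known from the snippet, so the value is returned unchanged.

```python
from typing import Any, Mapping, Optional


def read_time_field(recipe: Mapping[str, Any], field: str) -> Optional[Any]:
    """Return recipe["time"][field], or None when the block or field is missing or empty."""
    time_block = recipe.get("time")
    if not time_block:
        # Covers both a missing "time" key and an explicit null.
        return None
    # Mirrors the `!!r.time.cookTime` guard in the site's JS:
    # falsy values (null, 0, "") are treated as absent.
    return time_block.get(field) or None
```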

jayaddison · 2024-04-19T11:24:38Z

Ok, thanks @mlduff. I think retaining the authorship info is fairly important, so I'd like to pause until we get some more ideas / suggestions.

mlduff · 2024-04-19T12:25:03Z

> Ok, thanks @mlduff. I think retaining the authorship info is fairly important, so I'd like to pause until we get some more ideas / suggestions.

@jayaddison would it be completely off the cards to do an extra request to the origin website? That's the only way I can see getting any extra information as it seems bergamot does not support author at all.

If you are satisfied that bergamot doesn't support author whatsoever, would you merge the PR? Or does this mean we have to decline it?

jayaddison · 2024-04-19T15:17:54Z

@mlduff hmm, I'm not sure that I'd be entirely happy with making an additional request either; it would seem kinda unexpected for some code that appears to scrape from HTML at a URL to make requests to other domains -- and fairly arbitrarily, based on whatever the response of Bergamot included. I don't expect Bergamot would intentionally link to non-related sites for any reason, but that kind of thing can happen and can cause security/privacy vulnerabilities in web browsers; so I'd prefer to maintain simplicity and avoid that.
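One way to express that constraint in code is a precautionary host check before any follow-up request. This is only a sketch: the allowlist contents and the name `is_expected_url` are assumptions for illustration, not something this PR or the library defines.

```python
from urllib.parse import urlparse

# Assumption: the only host this scraper should ever contact.
ALLOWED_HOSTS = frozenset({"dashboard.bergamot.app"})


def is_expected_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    # Exact host comparison, so lookalikes such as
    # "dashboard.bergamot.app.evil.example" are rejected.
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

A scraper could call this on any URL taken from a response and skip the request when it returns `False`.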

mlduff · 2024-04-20T01:35:31Z

@jayaddison yeah that makes sense. Is there anything I can do to help get this one in, or do we just have to wait for bergamot to reply and see if they will add author? If they say no, can we proceed with just the source domain?

mlduff · 2024-04-25T07:19:13Z

@jayaddison Sorry, just bumping on the above - I have implemented the source domain for the author field - is it okay to proceed with this PR?

jayaddison · 2024-04-25T13:47:23Z

Thanks @mlduff - no, I don't want to proceed with this given that it isn't providing attribution for an author name that we know exists on an origin recipe. Source domain could arguably provide some of that, but I'd prefer to wait until we can retrieve the author credit from the page we're scraping.

mlduff · 2024-05-04T11:58:16Z

Should I close this PR for now @jayaddison? Or leave it up?

jayaddison · 2024-05-06T10:25:16Z

> Should I close this PR for now @jayaddison? Or leave it up?

I've been on the other side of this situation a few times now -- providing a contribution to a project that it is unclear will ever be accepted, sometimes for a reason that they've communicated, or sometimes simply because maintainers are unavailable -- and my usual preference is to keep the pull request open, because it's:

- More likely that other people will notice and not repeat the work.
- More likely that other people could base their own contributions on my work so far.
- More likely that the maintainers may reappear, provide feedback, and/or figure out a way to get the changes into an acceptable state, or reject them.
- Less likely for me to forget and re-implement it myself.

However: it can be frustrating sometimes if your PR doesn't get merged for a long time. Usually I ping the maintainers once every so often (say once per 3 months or so) if I think the PR is still valid -- but sometimes I just close them to reduce my mental overhead if it's been a long time with no progress.

mlduff added 2 commits April 16, 2024 10:48

Add bergamot

e1c610e

Check for time being non null

71c4648

mlduff mentioned this pull request Apr 16, 2024

Madewithlau.com recipe support #1015

Closed

Add to README

d91d392

jayaddison reviewed Apr 17, 2024

recipe_scrapers/__init__.py (outdated, resolved)

jayaddison reviewed Apr 17, 2024

recipe_scrapers/bergamot.py (outdated, resolved)

jayaddison reviewed Apr 17, 2024

recipe_scrapers/bergamot.py (resolved)

jayaddison reviewed Apr 17, 2024

mlduff and others added 4 commits April 18, 2024 07:57

Update recipe_scrapers/__init__.py

d4bc518

Co-authored-by: James Addison <[email protected]>

Update recipe_scrapers/bergamot.py

2dc375f

Co-authored-by: James Addison <[email protected]>

Use source domain as author

48baa93

Add cook time and tests

52058c2

jayaddison mentioned this pull request Apr 19, 2024

MarleySpoon: add precautionary check for unexpected API URLs. #1069

Merged
