-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Add] Plötzblog Scraper #1062
Comments
Ok, that didn't work (makes sense, I suppose). Perhaps issues can only be assigned to people who have interacted with the thread themselves (not by others on their behalf). |
Hopefully now it works @jayaddison |
@jayaddison I have an issue with one of the test cases when implementing this site. The Here is a snapshot of my unit test with the expected results: "ingredients": [
"558 g Weizenmehl 550",
"389 g Wasser",
"90 g Weizenanstellgut TA 200 (weich)",
"13 g Salz"
],
"ingredient_groups": [
{
"ingredients": [
"90 g Wasser",
"90 g Weizenmehl 550",
"90 g Weizenanstellgut TA 200 (weich)"
],
"purpose": "Weizensauerteig"
},
{
"ingredients": [
"13 g Salz",
"298 g Wasser",
"467 g Weizenmehl 550",
"gesamter Weizensauerteig"
],
"purpose": "Hauptteig"
}
], As far as I can tell, the options I have are:
I read https://github.com/hhursev/recipe-scrapers/blob/main/docs/in-depth-guide-ingredient-groups.md, and in particular this excerpt:
I think that in my case if we kept my current output as is, it would still be satisfying this requirement as it is still the same ingredients, just separated out. Therefore, I think the best solution is option 3, however I would like to get your input on if that is the best option/desired output, and also how I could tackle this. Scraped page: https://www.ploetzblog.de/rezepte/mildes-weizensauerteigbrot/id=619f68b528ae7154616ab768 |
Interesting - thanks @mlduff! In a way I'm surprised that we haven't encountered this problem before already. To check: is is this part of the test logic that is failing: recipe-scrapers/tests/__init__.py Lines 113 to 117 in 3273fab
|
That is correct @jayaddison |
Ok, thanks @mlduff. I agree that option three (handling this case within test coverage) would be ideal. However, I can't initially figure out how we'd handle this in out tests. Are you suggesting trying to parse and compare the quantities in the ingredient lines (grouped vs total), or to override the test for this individual case - or pehaps something else? |
@jayaddison I'm not entirely sure - ideally we would be able to parse the quantities (or even if you could just separate the quantities out and check that the same ingredients are mentioned), however I'm not sure of a robust way of doing this without perhaps using an external library as a dev-dependency (e.g. ingredient-parser). If this would be inappropriate/too hard, being able to mark in some way (is it possible to annotate or something like that?) that this particular test shouldn't be run on this particular file that would be good. |
Mmm, ok. I'm continuing to think about it. If we had one Python class per scraper unit test, then perhaps what we would naturally do here would be to override the logic that checks for this ingredient-group-ingredient-list equivalence. My sense is that something similar could be appropriate here. There's definitely possible value in adding a dev-dependency and potentially contributing improvements back there as a result of what we find; but there are also benefits in keeping developer dependencies lightweight. |
@jayaddison I could move this test to the legacy tests so I can override if you like? Otherwise could we look at doing a non-legacy similar approach? I'm also happy to investigate trying to parse the ingredients with the dev-dependencies if you would like me to. |
@mlduff please feel free to explore any possibilities; all of those options seem like they'd provide ways to learn more about the problem and possible ways to resolve it - including the related advantages and disadvantages of each. At the moment I don't have any further recommendations, but I'll continue thinking about it. It's not always the case, but sometimes taking a week or two to consider a problem can help to develop different ways of thinking about it that ultimately help to figure it out properly -- or at least help you to decide what the best option is from those available. |
@jayaddison I have just submitted the PR for this - I went for option 4, and devised a way to exempt this scraper from that particular test. If I get time I'll further investigate ingredients-parser, however this seems to be like there still could be edge cases it falls over in (its a model that spits out confidences and the like), so I thought it probably isn't suitable for a unit test. |
This does seem like a good approach, if possible. It would mean that we don't have to add an override option / subclass the test / provide alternative implementations per-scraper. It seems there are at least two libraries available:
As you say, I do think that these should be dev-dependencies only for this use-case; let's prove the functionality and figure out a library that we like for test/internal purposes, and then perhaps in future we could provide parsed quantities on ingredients. |
A popular german bread baking recipe website with no schema but very nice recipes
https://www.ploetzblog.de/rezepte/rustikales-weizenmischbrot/id=61b843e6f51707630ab85911
https://www.ploetzblog.de/rezepte/hafer-dinkel-brot/id=618a284828ae7154616ab156
https://www.ploetzblog.de/rezepte/mildes-weizensauerteigbrot/id=619f68b528ae7154616ab768
The text was updated successfully, but these errors were encountered: