ottolenghi scraper #1209

Status: Open. Wants to merge 6 commits into base: main.
1 change: 1 addition & 0 deletions README.rst
@@ -310,6 +310,7 @@ Scrapers available for:
- `https://omnivorescookbook.com <https://omnivorescookbook.com>`_
- `https://www.onceuponachef.com <https://www.onceuponachef.com>`_
- `https://onesweetappetite.com/ <https://onesweetappetite.com>`_
- `https://ottolenghi.co.uk/ <https://ottolenghi.co.uk>`_
- `https://owen-han.com/ <https://owen-han.com>`_
- `https://www.paleorunningmomma.com/ <https://www.paleorunningmomma.com>`_
- `https://www.panelinha.com.br/ <https://www.panelinha.com.br>`_
3 changes: 3 additions & 0 deletions recipe_scrapers/__init__.py
@@ -1,5 +1,7 @@
from __future__ import annotations

from .ottolenghi import Ottolenghi

__all__ = (
    "AbstractScraper",
    "ElementNotFoundInHtml",
@@ -505,6 +507,7 @@
    NotEnoughCinnamon.host(): NotEnoughCinnamon,
    NutritionFacts.host(): NutritionFacts,
    OneSweetAppetite.host(): OneSweetAppetite,
    Ottolenghi.host(): Ottolenghi,
    OttolenghiBooks.host(): OttolenghiBooks,
    PeelWithZeal.host(): PeelWithZeal,
    PinchOfYum.host(): PinchOfYum,
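
The mapping above keys each scraper class by the string its host() classmethod returns. A minimal sketch of how a host-keyed registry like this can be queried, assuming a hypothetical ExampleScraper class, REGISTRY dict and scraper_for helper (these names are illustrative only, and the library's real lookup may work differently):

```python
from urllib.parse import urlparse


class ExampleScraper:
    """Hypothetical stand-in for a scraper class such as Ottolenghi."""

    @classmethod
    def host(cls) -> str:
        return "ottolenghi.co.uk"


# Hypothetical registry: host string -> scraper class.
REGISTRY = {ExampleScraper.host(): ExampleScraper}


def scraper_for(url: str):
    """Return the registered class for the URL's host, or None if unknown."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        # Assumption: a leading "www." is ignored when matching hosts.
        host = host[4:]
    return REGISTRY.get(host)
```

Under these assumptions, a recipe URL on ottolenghi.co.uk resolves to the example class, while an unregistered host yields None.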
69 changes: 69 additions & 0 deletions recipe_scrapers/ottolenghi.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
from ._abstract import AbstractScraper
from ._grouping_utils import group_ingredients

Collaborator review comment:

Suggested change
from ._utils import get_minutes, get_yields


class Ottolenghi(AbstractScraper):
    @classmethod
    def host(cls):
        return "ottolenghi.co.uk"

    def author(self):
        return self.schema.author()

    def title(self):
        return self.schema.title()

    def category(self):
        return self.schema.category()

    def total_time(self):
        return self.schema.total_time()

    def image(self):
        return self.soup.find("div", class_="c-recipe-header__gallery").find("img")[
            "src"
        ]

    def ingredients(self):
        return self.schema.ingredients()

    def ingredient_groups(self):
        return group_ingredients(
            self.ingredients(),
            self.soup,
            ".c-recipe-ingredients__heading",
            ".c-recipe-ingredients tr:not(:has(.c-recipe-ingredients__heading))",
        )

    def instructions(self):
        return self.schema.instructions()

    def ratings(self):
        return self.schema.ratings()

    def cuisine(self):
        return self.schema.cuisine()

    def description(self):
        return self.schema.title()

    def yields(self):
        return (
            self.soup.find("div", class_="c-recipe-header__timings")
            .find("span")
            .get_text(strip=True)
        )

    def prep_time(self):
        return (
            self.soup.find("div", class_="c-recipe-header__timings")
            .find_all("span")[1]
            .get_text(strip=True)
        )

    def cook_time(self):
        return (
            self.soup.find("div", class_="c-recipe-header__timings")
            .find_all("span")[2]
            .get_text(strip=True)
        )
Comment on lines +50 to +69
Collaborator review comment:

Suggested change
-    def yields(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find("span")
-            .get_text(strip=True)
-        )
-
-    def prep_time(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find_all("span")[1]
-            .get_text(strip=True)
-        )
-
-    def cook_time(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find_all("span")[2]
-            .get_text(strip=True)
-        )
+    def _extract_timing_elements(self, prefix):
+        timings = self.soup.find("div", class_="c-recipe-header__timings").find_all("span")
+        for timing in timings:
+            if prefix.lower() in timing.get_text().lower():
+                return timing.get_text()
+
+    def yields(self):
+        yield_text = self._extract_timing_elements("serves")
+        return get_yields(yield_text)
+
+    def prep_time(self):
+        prep_text = self._extract_timing_elements("prep")
+        return get_minutes(prep_text)
+
+    def cook_time(self):
+        cook_text = self._extract_timing_elements("cook")
+        if cook_text and '5o' in cook_text:
+            cook_text = cook_text.replace('5o', '50')
+        return get_minutes(cook_text)

What do you think about refactoring this to use a shared helper that matches each timing element on its starting text instead of its position?

I also used two existing utils, get_minutes and get_yields, to normalize the field outputs. This will require some changes to the test JSONs.
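
To illustrate why label matching is more robust than positional indexing, here is a minimal sketch that works on plain lists of span texts rather than on parsed HTML. The function names by_position and by_prefix and the no_yield sample are hypothetical, and by_prefix only mirrors the substring test used in the suggestion above:

```python
from typing import List, Optional


def by_position(spans: List[str], index: int) -> str:
    """Positional lookup, as in the original methods (spans[1], spans[2])."""
    return spans[index]


def by_prefix(spans: List[str], prefix: str) -> Optional[str]:
    """Label-based lookup: first span whose text contains the label."""
    for text in spans:
        if prefix.lower() in text.lower():
            return text
    return None


# Span texts as recorded in the second test JSON of this PR.
full = ["Serves 4", "Prep 5 min", "Cook 25 min"]
# Hypothetical page that omits the "Serves" span.
no_yield = ["Prep 5 min", "Cook 25 min"]

print(by_position(full, 1))        # the prep span on a complete page
print(by_position(no_yield, 1))    # silently returns the cook span instead
print(by_prefix(no_yield, "prep")) # still finds the prep span
```

Because the test is a substring match rather than a strict "starts with" check, a label that happens to appear inside another span's text would also match; str.startswith on the stripped text would be a stricter alternative if that ever becomes a problem.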

@jknndy (Collaborator), Sep 17, 2024:

Also, in cook_time I added handling for an error on the recipe page where 50 is displayed as 5o.
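
A rough sketch of that repair step followed by normalization to minutes. Both fix_digit_typos and parse_minutes are hypothetical stand-ins written for illustration; parse_minutes is deliberately simplified and is not the library's get_minutes. The regex generalizes the literal '5o' replacement to any letter "o" that directly follows a digit, which is an assumption about what other typos might look like:

```python
import re
from typing import Optional


def fix_digit_typos(text: str) -> str:
    """Replace a letter 'o'/'O' that directly follows a digit with a zero."""
    return re.sub(r"(?<=\d)[oO]", "0", text)


def parse_minutes(text: Optional[str]) -> Optional[int]:
    """Simplified parser: handles only 'N min' and 'N hr'/'N hour' forms."""
    if not text:
        return None
    text = fix_digit_typos(text)
    hours = re.search(r"(\d+)\s*(?:hr|hour)", text)
    minutes = re.search(r"(\d+)\s*min", text)
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


print(parse_minutes("Cook 5o min"))  # typo repaired before parsing
print(parse_minutes("Prep 10 min"))
```

The broader pattern would also rewrite a string such as "5oz" to "50z", so the literal replacement in the suggestion is the safer choice if the typo is known to occur only on this one page.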

60 changes: 60 additions & 0 deletions tests/test_data/ottolenghi.co.uk/ottolenghi_1.json
@@ -0,0 +1,60 @@
{
  "author": "Ottolenghi",
  "canonical_url": "https://ottolenghi.co.uk/pages/recipes/jammy-croissant-strawberry-sundae",
  "site_name": "Ottolenghi",
  "host": "ottolenghi.co.uk",
  "language": "en",
  "title": "Jammy croissant strawberry sundae",
  "ingredients": [
    "shop bought strawberry ice cream",
    "unsweetened lightly whipped cream",
    "sumac",
    "400g strawberries, hulled and halved (quartered if large)",
    "50g caster sugar",
    "2 tsp lime juice",
    "150g stale croissants, roughly blitzed in a food processor (or chopped by hand)",
    "60g unsalted butter, melted",
    "60g caster sugar",
    "¾ tsp flaked sea salt"
  ],
  "ingredient_groups": [
    {
      "ingredients": [
        "shop bought strawberry ice cream",
        "unsweetened lightly whipped cream",
        "sumac"
      ],
      "purpose": null
    },
    {
      "ingredients": [
        "400g strawberries, hulled and halved (quartered if large)",
        "50g caster sugar",
        "2 tsp lime juice"
      ],
      "purpose": "QUICK ROASTED STRAWBERRIES"
    },
    {
      "ingredients": [
        "150g stale croissants, roughly blitzed in a food processor (or chopped by hand)",
        "60g unsalted butter, melted",
        "60g caster sugar",
        "¾ tsp flaked sea salt"
      ],
      "purpose": "“CRANKO”"
    }
  ],
  "instructions": "For the strawberries: Preheat oven to 200c and line a roasting tin with parchment paper.\nCombine the strawberries, sugar and lime juice in the roasting tin and bake for 15-18 minutes, stirring halfway through, until the juices are released and the strawberries are super soft but still holding their shape. Set aside to cool. They will keep for up to two days in the fridge.\nFor the Cranko: Preheat the oven to 150c and line a baking tray with parchment paper.\nCombine everything together in a medium bowl and mix to combine. Transfer onto the baking tray and bake for 25-30 minutes until golden brown and crispy, stirring a couple of times throughout the baking time. Set aside to cool. Store in an airtight container for up to two weeks.",
  "instructions_list": [
    "For the strawberries: Preheat oven to 200c and line a roasting tin with parchment paper.",
    "Combine the strawberries, sugar and lime juice in the roasting tin and bake for 15-18 minutes, stirring halfway through, until the juices are released and the strawberries are super soft but still holding their shape. Set aside to cool. They will keep for up to two days in the fridge.",
    "For the Cranko: Preheat the oven to 150c and line a baking tray with parchment paper.",
    "Combine everything together in a medium bowl and mix to combine. Transfer onto the baking tray and bake for 25-30 minutes until golden brown and crispy, stirring a couple of times throughout the baking time. Set aside to cool. Store in an airtight container for up to two weeks."
  ],
  "category": "Desserts",
  "yields": "Serves 4",
  "description": "Jammy croissant strawberry sundae",
  "cook_time": "Cook 5o min",
  "prep_time": "Prep 10 min",
  "image": "//ottolenghi.co.uk/cdn/shop/files/Jammy_croissant_strawberry_sundae.jpg?v=1723375810&width=1000"
}
7,732 changes: 7,732 additions & 0 deletions tests/test_data/ottolenghi.co.uk/ottolenghi_1.testhtml

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions tests/test_data/ottolenghi.co.uk/ottolenghi_2.json
@@ -0,0 +1,56 @@
{
  "author": "Ottolenghi",
  "canonical_url": "https://ottolenghi.co.uk/pages/recipes/polenta-sardines-coriander-salsa",
  "site_name": "Ottolenghi",
  "host": "ottolenghi.co.uk",
  "language": "en",
  "title": "Polenta with sardines and coriander salsa",
  "ingredients": [
    "200g quick cook polenta",
    "195ml olive oil",
    "2 x 120g tins sardines in sunflower oil",
    "90g spring onions, finely chopped",
    "5 cloves garlic, crushed",
    "4 tbsp tomato paste",
    "1 x 340g tin sweetcorn, strained",
    "5g ginger, peeled and finely grated",
    "½ tsp ground cumin",
    "2 tsp aleppo chilli",
    "1 lime juiced to get 1 tbsp",
    "25g coriander leaves, finely chopped",
    "fine sea salt and black pepper"
  ],
  "ingredient_groups": [
    {
      "ingredients": [
        "200g quick cook polenta",
        "195ml olive oil",
        "2 x 120g tins sardines in sunflower oil",
        "90g spring onions, finely chopped",
        "5 cloves garlic, crushed",
        "4 tbsp tomato paste",
        "1 x 340g tin sweetcorn, strained",
        "5g ginger, peeled and finely grated",
        "½ tsp ground cumin",
        "2 tsp aleppo chilli",
        "1 lime juiced to get 1 tbsp",
        "25g coriander leaves, finely chopped",
        "fine sea salt and black pepper"
      ],
      "purpose": null
    }
  ],
  "instructions": "Add 1 ½ litres water and 1 teaspoon of salt to a medium saucepan and bring to the boil on a medium high heat. Pour the polenta into the boiling water, whisking constantly, and cook for 5 minutes, reducing the heat if the polenta starts to spit. Once the mixture has thickened to a loose porridge consistency, turn down to the lowest heat and cover with a lid or a piece of parchment.\nHeat 120 millilitres oil, the sardines, spring onions, garlic, tomato paste and 3/4 teaspoon salt in a medium saute pan over medium high heat for 10 minutes until the onions have softened and the sardines have broken down. Add the sweetcorn, ginger, cumin and chilli and cook for another 5 minutes, until the spices are fragrant and the sauce has visibly darkened in colour.\nMeanwhile, mix the remaining 30 grams of spring onions and 75 millilitres of oil in a small bowl with the coriander, lime, and ⅛ teaspoon salt.\nTo serve, ladle the polenta into four shallow bowls and top with the sardine mixture and the herb oil.",
  "instructions_list": [
    "Add 1 ½ litres water and 1 teaspoon of salt to a medium saucepan and bring to the boil on a medium high heat. Pour the polenta into the boiling water, whisking constantly, and cook for 5 minutes, reducing the heat if the polenta starts to spit. Once the mixture has thickened to a loose porridge consistency, turn down to the lowest heat and cover with a lid or a piece of parchment.",
    "Heat 120 millilitres oil, the sardines, spring onions, garlic, tomato paste and 3/4 teaspoon salt in a medium saute pan over medium high heat for 10 minutes until the onions have softened and the sardines have broken down. Add the sweetcorn, ginger, cumin and chilli and cook for another 5 minutes, until the spices are fragrant and the sauce has visibly darkened in colour.",
    "Meanwhile, mix the remaining 30 grams of spring onions and 75 millilitres of oil in a small bowl with the coriander, lime, and ⅛ teaspoon salt.",
    "To serve, ladle the polenta into four shallow bowls and top with the sardine mixture and the herb oil."
  ],
  "category": "Mains",
  "yields": "Serves 4",
  "description": "Polenta with sardines and coriander salsa",
  "cook_time": "Cook 25 min",
  "prep_time": "Prep 5 min",
  "image": "//ottolenghi.co.uk/cdn/shop/files/Polenta_with_sardines_and_coriander_salsa.jpg?v=1725879288&width=1000"
}
7,800 changes: 7,800 additions & 0 deletions tests/test_data/ottolenghi.co.uk/ottolenghi_2.testhtml

Large diffs are not rendered by default.
