This project is a web scraping pipeline that extracts product data from Leroy Merlin’s website.
It uses a modular approach with dedicated functions to fetch product categories, subcategories, pages, and individual items, then saves the results in CSV format.
## Features

- Scrape product categories, subcategories, and paginated pages
- Extract items from each page using custom parsing functions
- Save results as structured CSV files inside the `output/` folder
- Skip already-scraped pages to avoid duplicates
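The fetch-and-parse step can be sketched with the project's two dependencies. Note that the CSS selectors and the `get_items` signature below are illustrative assumptions, not Leroy Merlin's actual markup or the project's real parser:

```python
from bs4 import BeautifulSoup  # third-party: beautifulsoup4


def get_items(html):
    """Extract (name, price) pairs from a product listing page.

    The class names "product-card", "name", and "price" are placeholders;
    the real site uses different markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.select("div.product-card"):
        name = card.select_one("span.name")
        price = card.select_one("span.price")
        if name and price:  # skip malformed cards
            items.append((name.get_text(strip=True), price.get_text(strip=True)))
    return items
```

In the real pipeline the HTML would come from `requests.get(url).text` before being handed to the parser.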
## Requirements

This project works with Python 3.8+. External dependencies must be installed with pip:
```
pip install requests beautifulsoup4
```

## Project Structure

```
├── main.py                   # Entry point
├── script/
│   ├── leroymerlin.py        # get_products, get_pages, get_items
│   └── util.py               # save_csv, get_last_path_parts
├── credential.example.py     # Example API credential file
├── output/                   # Scraped CSV files (auto-generated)
└── README.md
```
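Judging from the file tree, `main.py` wires these helpers together in nested loops: categories, then pages, then items. A rough sketch of that flow, written with the helpers passed in as parameters so it can be read in isolation (the real signatures in `script/leroymerlin.py` may differ):

```python
def run_pipeline(get_products, get_pages, get_items, save_csv):
    """Hypothetical orchestration mirroring what main.py appears to do.

    Each argument is one of the project's helper functions; their exact
    signatures here are assumptions based on the names in the tree.
    """
    for category_url in get_products():          # top-level categories
        for page_url in get_pages(category_url):  # paginated listing pages
            items = get_items(page_url)           # parsed products
            if items:                             # nothing to save for empty pages
                save_csv(page_url, items)
```

Passing the helpers in makes the control flow testable without network access; the actual entry point presumably imports them from `script/` directly.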
## Setup

Clone the repository:

```
git clone https://github.com/harivonyR/LeroyMerlyn_scraping
cd LeroyMerlyn_scraping
```

Copy the example credential file:

```
copy credential.example.py credential.py
```

Open `credential.py` and paste your PILOTERR API key:

```python
x_api_key = "paste your API key here!"
```

Install the dependencies and run the scraper:

```
pip install requests beautifulsoup4
python main.py
```

## Notes

- The scraper automatically skips files that already exist in `output/`.
- If a subcategory has no pagination, the scraper moves on.
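The skip-already-scraped behavior amounts to checking for the output CSV before fetching. A minimal sketch (the helper name `should_scrape` is hypothetical; the real check lives inside the project's scraping loop):

```python
import os


def should_scrape(csv_path):
    """Return True only if no CSV has been written for this page yet.

    Illustrative helper: the project performs an equivalent existence
    check against output/ before re-scraping a page.
    """
    return not os.path.exists(csv_path)
```

Checking for the file up front means an interrupted run can simply be restarted and will resume where it left off.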