Scraper de Discours en Français - French Speech Scraper

Purpose

The purpose of discours_scraper.py is to gather in the form of a CSV file the recent speeches delivered by the French government from the following link: speech

The result is extracted on the basis of the following columns: titre,date,discours

You can choose the number of pages to be scraped by filling the class variables pages_begin and pages_end. It is quite pratical when your internet connection breaks during the scrapping.

Update

08/06/2023: I remark that some of the speeches contains <br> and not <p/>. Thus I take all the inner html contents and remove the tags afterwards.

Required Dependencies

Selenium 4
ChromeDriverManager to avoid managing incompatibilities between the current version (114, on 8th June 2023) of Chrome and the driver version.

Warning

You may complicate your life if you run the notebook with colab. I have only tested it with JupyterLab :-)
The script is based, like all scrapers, on the architecture of a website. It is possible that this architecture changes, or that the css selectors need to be updated. Adjustments may therefore be necessary to collect the data.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
readme.md		readme.md
scrape_discours.ipynb		scrape_discours.ipynb
scraped_discours_page1,4-10.csv		scraped_discours_page1,4-10.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Scraper de Discours en Français - French Speech Scraper

Purpose

Update

Required Dependencies

Warning

About

Releases

Packages

Languages

sufianj/french_speech_scraper

Folders and files

Latest commit

History

Repository files navigation

Scraper de Discours en Français - French Speech Scraper

Purpose

Update

Required Dependencies

Warning

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages