WebScraping is a Python-based tool that scrapes movie and series metadata (titles, posters, release dates, genres, IMDb ratings, descriptions, countries, trailer links) from uflix.to and stores it in a SQLite database. It is aimed at developers and data enthusiasts who want to catalog or analyze media data.
- Extracts movie/series data: titles, posters, years, IMDb ratings, release dates, descriptions, countries, genres, trailers.
- Stores data in movies_series.db.
- Downloads posters to data/images/.
- Handles errors and logs to logs/movies_series.log.
- Modular code with automated setup.
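
The error-handling and logging behaviour listed above could look roughly like the following. This is a minimal sketch, not the project's logger.py: the helper names `build_logger` and `safe_call` and the log format are assumptions made for illustration; only the log path comes from this README.

```python
import logging
from pathlib import Path


def build_logger(log_path="logs/movies_series.log", name="movies_series"):
    """Return a logger that appends to log_path (hypothetical helper)."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ if missing

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def safe_call(logger, func, *args):
    """Run func(*args); on any exception, log the traceback and return None."""
    try:
        return func(*args)
    except Exception:
        logger.exception("%s failed", getattr(func, "__name__", "call"))
        return None
```

Wrapping each scrape step in something like `safe_call` lets one failed page be recorded in the log without stopping the rest of the run, which is one plausible reading of "handles errors and logs".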
- Python 3.8+
- Git
- Virtual environment (recommended)
- Clone the repo:
  git clone https://github.com/ngir0003/webScraping.git
  cd webScraping
- Set up virtual environment (optional):
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Run setup script:
  python setup_project.py
  Installs dependencies, creates directories (data/database/, data/images/, logs/), and initializes movies_series.db.
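
To make the setup step more concrete, here is a rough sketch of the directory and database part of it. It is not the contents of setup_project.py: the function name `prepare_project` and the `titles` table are hypothetical, the real schema is not documented in this README, and dependency installation is left out.

```python
import sqlite3
from pathlib import Path

# Directory names are taken from this README; everything else is illustrative.
DIRECTORIES = ("data/database", "data/images", "logs")


def prepare_project(root="."):
    """Create the working directories and an SQLite file; return the DB path."""
    root = Path(root)
    for rel in DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)  # safe to re-run

    db_path = root / "data" / "database" / "movies_series.db"
    conn = sqlite3.connect(str(db_path))
    try:
        # Placeholder table (assumption): the project's actual columns may differ.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS titles ("
            "id INTEGER PRIMARY KEY, "
            "kind TEXT NOT NULL, "
            "title TEXT NOT NULL, "
            "imdb_rating REAL)"
        )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Using `exist_ok=True` and `CREATE TABLE IF NOT EXISTS` keeps the sketch idempotent, so running it twice does not fail or wipe existing data.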
- Scrape data:
  python get_movies_series.py
  Scrapes data, stores it in movies_series.db, downloads posters, and logs to logs/movies_series.log.
- View database: Use an SQLite client (e.g., DB Browser for SQLite) to query data/database/movies_series.db.
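
If no graphical client is at hand, the database can also be inspected from Python with the standard sqlite3 module. The sketch below makes no assumption about the schema: it reads the table names from the database itself and counts the rows in each. The function name `summarize_tables` is made up for this example.

```python
import sqlite3


def summarize_tables(db_path="data/database/movies_series.db"):
    """Return {table_name: row_count} for every user table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `summarize_tables()` from the project root after a scrape gives a quick check that rows were actually written before opening the file in a full client.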
webScraping/
├── movies.py # Movie data scraper
├── series.py # Series data scraper
├── get_movies_series.py # Main script
├── db.py # Database management
├── utils.py # Utilities (e.g., image downloading)
├── logger.py # Logging setup
├── config.py # Configuration
├── setup_project.py # Setup script
├── requirements.txt # Dependencies
├── README.md # Documentation
└── data/
    ├── database/
    │   └── movies_series.db
    └── images/
        ├── movies/
        └── series/
requests==2.32.3
beautifulsoup4==4.12.3
Install:
pip install -r requirements.txt
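
Because both dependencies are pinned to exact versions, it can be useful to confirm that the active environment matches them. The following is a small sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper names `parse_pins` and `find_mismatches` are hypothetical and not part of the project.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Passing the text of requirements.txt to `parse_pins` and the result to `find_mismatches` yields an empty dict when the environment matches the pins.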
- Fork: https://github.com/ngir0003/webScraping.git
- Create branch:
git checkout -b feature/your-feature
- Commit:
git commit -m "Add feature"
- Push:
git push origin feature/your-feature
- Open a Pull Request.
Report bugs/features via GitHub Issues.
Licensed under the Apache License, Version 2.0.
Nicholas Girdlestone (ngir0003)
⭐ Star this repo if you find it useful!