This Python script uses Selenium WebDriver to capture screenshots of the web pages listed in a CSV file. It processes URLs in parallel, so it can handle large lists efficiently, which makes it suitable for tasks such as website monitoring, testing, or generating previews.
- Concurrent Execution: Utilizes Python's ThreadPoolExecutor for concurrent execution of screenshot capture tasks, enhancing performance.
- Flexible Configuration: Configured through a CSV file that lists the URLs to capture.
- Headless Mode: Supports headless execution, allowing it to run in the background without launching a browser window.
- Customizable Naming: Screenshots are named based on the domain name and timestamp, ensuring uniqueness and easy identification.
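
The concurrent execution described above can be sketched with `ThreadPoolExecutor` as follows. This is a minimal illustration, not the script's actual code: `capture_stub`, `capture_all`, and the `max_workers` default are hypothetical names and values, and the stub stands in for the real Selenium capture step so the example runs without a browser.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def capture_stub(url):
    """Hypothetical stand-in for the real Selenium capture step."""
    return f"captured {url}"


def capture_all(urls, capture=capture_stub, max_workers=4):
    """Run capture(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the
    exception that the capture raised for that URL.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be labeled.
        futures = {pool.submit(capture, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing page should not stop the rest of the batch.
                results[url] = exc
    return results


if __name__ == "__main__":
    demo = capture_all(["https://example.com", "https://example.org"])
    for url, outcome in demo.items():
        print(url, "->", outcome)
```

Threads suit this workload because each task spends most of its time waiting on the browser and the network rather than running Python code.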
- Python 3.x
- Chrome WebDriver
- Selenium
- Pandas
- Webdriver Manager
- Clone the repository:
  git clone https://github.com/pushkarsingh32/url_to_screenshot.git
- Install the required dependencies:
  pip install -r requirements.txt
- Make sure Chrome WebDriver is installed. If it is not, Webdriver Manager can install it automatically.
- Prepare a CSV file containing a list of URLs under a column named "URL". Ensure there are no duplicate or empty URLs.
- Run the script by executing the following command:
  python screenshot_tool.py
- Screenshots will be saved in the specified directory (./data/screenshots/) with filenames in the format <domain_name>_screenshot_at_<timestamp>.png.
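
For readers who want to see how a single headless capture and the filename format fit together, here is a hedged sketch. It is not taken from screenshot_tool.py: the function names `screenshot_path` and `capture`, the timestamp layout `%Y%m%d_%H%M%S`, the window size, and the use of the URL's network location as the domain name are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

OUTPUT_DIR = Path("./data/screenshots")


def screenshot_path(url: str, now: Optional[datetime] = None,
                    output_dir: Path = OUTPUT_DIR) -> Path:
    """Build <domain_name>_screenshot_at_<timestamp>.png for a URL.

    The timestamp layout is an assumption, not the script's known format.
    """
    domain = urlparse(url).netloc.replace(":", "_")  # keep ports filename-safe
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return output_dir / f"{domain}_screenshot_at_{stamp}.png"


def capture(url: str) -> Path:
    """Open the URL in headless Chrome and save a screenshot."""
    # Imported lazily so the naming helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")          # no visible browser window
    options.add_argument("--window-size=1280,800")  # assumed viewport size
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=options,
    )
    try:
        driver.get(url)
        target = screenshot_path(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        driver.save_screenshot(str(target))
        return target
    finally:
        driver.quit()  # always release the browser process
```

Under these assumptions, `screenshot_path("https://example.com/page", now=datetime(2024, 1, 2, 3, 4, 5))` yields a file named `example.com_screenshot_at_20240102_030405.png` inside the screenshots directory.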
To demonstrate how to use this tool, an example CSV file containing URLs (urls_data.csv) is provided in the data/urls/ directory. You can use this file to run the script and capture screenshots of the listed web pages.
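
The input rules (a "URL" column with no empty or duplicate entries) can be checked before a run. The sketch below is an illustration only: the project lists Pandas as a dependency, but this example swaps in Python's standard `csv` module to stay dependency-free, and `load_urls` is a hypothetical helper name.

```python
import csv
import io
from typing import List


def load_urls(csv_file) -> List[str]:
    """Read the "URL" column from an open CSV file.

    Blank entries and repeated URLs are dropped; first-seen order is kept.
    """
    reader = csv.DictReader(csv_file)
    if "URL" not in (reader.fieldnames or []):
        raise ValueError('CSV file must have a column named "URL"')
    seen = set()
    urls = []
    for row in reader:
        url = (row.get("URL") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = io.StringIO(
        "URL\nhttps://example.com\nhttps://example.com\nhttps://example.org\n"
    )
    print(load_urls(sample))
```

To apply it to the bundled example, open the file with `open("data/urls/urls_data.csv", newline="")` and pass the file object to `load_urls`.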
This project is licensed under the MIT License; see the LICENSE file for details.