Note: Beginner learning project focused on web scraping skills and foundational scraping logic.
- Python 3.11
- Selenium
- WebDriver Manager
- BeautifulSoup4
- Pandas
This project is an initial iteration of a Python-based web scraper designed to gather data on used car and motorcycle listings from OLX Portugal (OLX.pt). It leverages modern web scraping techniques to collect structured listing data.
- Multi-Page Scraping: Effectively navigates through multiple search result pages by manipulating the `?page=N` URL parameter, demonstrating pagination handling.
- Efficient Browser Automation: Utilizes Selenium and WebDriver Manager to control a Chrome instance, crucially reusing a single browser session across multiple page requests for improved efficiency.
- Robust Waiting Strategy: Implements Selenium's `WebDriverWait` to wait intelligently for essential page elements (like the main listing grid) to load, minimizing errors caused by fixed delays and slow-loading dynamic content.
- Adaptive Data Extraction: Employs BeautifulSoup4 with CSS selectors (prioritizing `data-testid` attributes where available, with class-based fallbacks) to parse the rendered HTML and extract key listing fields.
- Structured Output: Organizes scraped data using Pandas DataFrames and saves the results to a CSV file for easy access and further analysis.
- Forward-Thinking Design: Includes analysis of OLX.pt's URL structure, identifying parameters for filtering (price, year, fuel type, etc.) and sorting (price, date). This lays the groundwork for dynamic, user-configurable search URLs.
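The pagination and waiting strategy described above can be sketched as follows. This is an illustrative sketch, not the project's actual code: the category URL and the `listing-grid` test id are assumptions, and the `?page=N` handling mirrors the URL-parameter approach the README describes.

```python
# Sketch: one reused Chrome session, page-by-page URLs, explicit waits.
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse

# Assumed OLX.pt cars category URL (illustrative).
BASE_URL = "https://www.olx.pt/carros-motos-e-barcos/carros/"

def build_page_url(base: str, page: int) -> str:
    """Set or overwrite the ?page=N query parameter on a search URL."""
    parts = urlparse(base)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def scrape_pages(max_pages: int = 3) -> list[str]:
    # Selenium imports kept local so build_page_url works without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from webdriver_manager.chrome import ChromeDriverManager

    # Start Chrome once and reuse the same session for every page request.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    pages_html = []
    try:
        for page in range(1, max_pages + 1):
            driver.get(build_page_url(BASE_URL, page))
            # Wait for the listing grid instead of sleeping a fixed time.
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located(
                    # Assumed data-testid; the real attribute may differ.
                    (By.CSS_SELECTOR, '[data-testid="listing-grid"]')
                )
            )
            pages_html.append(driver.page_source)
    finally:
        driver.quit()
    return pages_html
```

Keeping the browser alive across iterations avoids the startup cost of launching Chrome for every page, which is the main efficiency win mentioned above.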
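The extraction and output steps might look like the sketch below. The selectors (`l-card`, `ad-price`, the class fallbacks) and field names are assumptions for illustration; the real OLX.pt markup should be checked before relying on them.

```python
# Sketch: data-testid-first parsing with class-based fallbacks, saved via pandas.
from bs4 import BeautifulSoup
import pandas as pd

def extract_listings(html: str) -> list[dict]:
    """Parse one rendered results page into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer stable data-testid hooks; fall back to a class selector (assumed names).
    cards = soup.select('[data-testid="l-card"]') or soup.select("div.listing-card")
    rows = []
    for card in cards:
        title_el = card.select_one("h6") or card.select_one(".title")
        price_el = card.select_one('[data-testid="ad-price"]') or card.select_one(".price")
        rows.append({
            "title": title_el.get_text(strip=True) if title_el else None,
            "price": price_el.get_text(strip=True) if price_el else None,
        })
    return rows

def save_listings(rows: list[dict], path: str = "listings.csv") -> pd.DataFrame:
    """Collect rows into a DataFrame and write them to CSV."""
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df
```

The `or` fallback pattern keeps extraction working when an ad variant lacks the preferred `data-testid` attribute, which is the adaptivity the feature list refers to.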
Developing this scraper involved overcoming challenges common to scraping large commercial platforms:
- Initial Tooling/Environment Compatibility: Early exploration involved evaluating different browser automation tools (like Playwright) within specific server environments on Windows. Encountered compatibility issues that ultimately led to adopting Selenium with WebDriver Manager.
- Dynamic Content & Structure Variations: OLX.pt required browser automation (Selenium) to ensure all listing content was loaded correctly. Variations in HTML structure between different ad types were handled with prioritized `data-testid` selectors and class-based fallbacks.
- Efficient Pagination: Dynamically detecting the total number of pages proved unreliable due to how the pagination controls were rendered. Solution: Adopted a pragmatic approach using a configurable page limit (`MAX_PAGES_TO_SCRAPE`).
- Status: Initial working version. Successfully scrapes core data fields from a defined number of pages within the OLX.pt cars category. Saves combined data to CSV.
- Next Steps:
- Implement dynamic URL generation based on user-defined filters and sorting options.
- Add comprehensive data cleaning and parsing logic (e.g., numeric price, year/km extraction, date parsing).
- Refine selectors further for edge cases.
- Integrate database storage for persistent data and tracking changes.
- Develop logic for more efficient scraping runs (e.g., only fetching newest ads).
- Clone the repository.
- Navigate into the project directory (`autoscannerpt`).
- Create and activate a Python virtual environment (e.g., `python -m venv myenv`, then `myenv\Scripts\activate` on Windows).
- Install dependencies: `pip install -r requirements.txt`
- Run the scraper: `python src/autoscannerpt/scraper.py`
- (Modify configuration constants like `MAX_PAGES_TO_SCRAPE` directly in `src/autoscannerpt/scraper.py` for now.)