Open Parliament Romania

Automated scraper for collecting data from the Romanian Parliament. It currently covers only the Chamber of Deputies (Camera Deputaților); coverage of the Senate is planned.

Features

Automated data collection for deputies, speeches, motions, interpellations, and proposals
Caching to minimize server load
Flexible CLI for running specific scrapers or all at once
Progress tracking for long-running jobs
Automated updates via GitHub Actions, so no dedicated server is needed
Data packaging that follows the Data Package standard (datapackage.org) for better interoperability
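A Data Package is described by a `datapackage.json` descriptor that lists the dataset's resources and licenses. The sketch below builds a minimal descriptor for one legislature's directory. It models only a subset of the spec, and the helper `buildDescriptor` and the package naming scheme are hypothetical; the project's real descriptor may carry different fields.

```typescript
// Minimal shapes based on the Data Package spec; only a subset of fields is modelled.
interface DataResource {
  name: string; // lowercase; may contain "-", "_" and "."
  path: string; // relative to datapackage.json
  format?: string;
}

interface DataPackage {
  name: string;
  title?: string;
  licenses: { name: string; title?: string; path?: string }[];
  resources: DataResource[];
}

// Hypothetical helper: builds a descriptor for the data of one legislature.
function buildDescriptor(year: number, files: string[]): DataPackage {
  return {
    name: `open-parliament-ro-${year}`, // hypothetical naming scheme
    title: `Romanian Chamber of Deputies, legislature elected in ${year}`,
    licenses: [
      {
        name: "CC-BY-4.0",
        title: "Creative Commons Attribution 4.0",
        path: "https://creativecommons.org/licenses/by/4.0/",
      },
    ],
    resources: files.map((file) => ({
      // Derive a spec-compliant resource name from the relative file path.
      name: file
        .replace(/\.json$/, "")
        .toLowerCase()
        .replace(/[^a-z0-9._-]/g, "-"),
      path: file,
      format: "json",
    })),
  };
}

const descriptor = buildDescriptor(2024, ["deputies.json"]);
console.log(JSON.stringify(descriptor, null, 2));
```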

Installation

bun install

Usage

# Run all scrapers
bun scrape --all

# Run specific scrapers
bun scrape --deputies --deputies_detail

# Enable verbose logging
bun scrape --verbose --all

# View available scrapers
bun scrape
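The flags above select which scrapers run. As a rough illustration of that mapping, the sketch below turns command-line flags into a list of job names. The job list `AVAILABLE_JOBS` and the `parseArgs` function are hypothetical (apart from `deputies` and `deputies_detail`, the names are guesses based on the feature list), and the real CLI may parse its arguments differently.

```typescript
// Hypothetical list of registered jobs; the real project discovers jobs from src/jobs/.
const AVAILABLE_JOBS = ["deputies", "deputies_detail", "speeches", "motions", "interpellations", "proposals"];

interface CliOptions {
  verbose: boolean;
  jobs: string[]; // empty means no scraper was selected: just list the available ones
}

function parseArgs(argv: string[]): CliOptions {
  const flags = argv.filter((arg) => arg.startsWith("--")).map((arg) => arg.slice(2));
  const verbose = flags.includes("verbose");
  if (flags.includes("all")) {
    return { verbose, jobs: [...AVAILABLE_JOBS] };
  }
  // Reject flags that are neither options nor known scrapers.
  const unknown = flags.filter((flag) => flag !== "verbose" && !AVAILABLE_JOBS.includes(flag));
  if (unknown.length > 0) {
    throw new Error(`Unknown scraper(s): ${unknown.join(", ")}`);
  }
  return { verbose, jobs: flags.filter((flag) => AVAILABLE_JOBS.includes(flag)) };
}

// In a real entry point this would be parseArgs(process.argv.slice(2)).
const options = parseArgs(["--verbose", "--deputies", "--deputies_detail"]);
if (options.jobs.length === 0) {
  console.log(`Available scrapers: ${AVAILABLE_JOBS.join(", ")}`);
} else {
  console.log(`Running: ${options.jobs.join(", ")} (verbose: ${options.verbose})`);
}
```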

Data structure

All scraped data is stored in the data/ directory, organized by the year in which each legislature was elected:

data/
├── 2024/                          # Year when the parliament was elected
│   ├── deputies.json              # List of all deputies
│   ├── full-deputies/             # Detailed deputy profiles
│   ├── speeches/                  # Deputy speeches
│   ├── motions/                   # Deputy motions
│   ├── interpellations/           # Deputy interpellations
│   └── proposals/                 # Deputy proposals
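Because each legislature has its own directory, a consumer can locate a resource from the election year and the resource name alone. The helper `resourcePath` below is a hypothetical sketch based on the tree above; since the file naming inside the per-item directories is not documented here, it resolves only the directory for those resources.

```typescript
// Resources from the data/ tree: deputies is a single file, the rest are directories.
type Resource = "deputies" | "full-deputies" | "speeches" | "motions" | "interpellations" | "proposals";

// Hypothetical helper: resolves a resource for the legislature elected in `year`.
function resourcePath(year: number, resource: Resource): string {
  if (!Number.isInteger(year)) {
    throw new Error(`Invalid election year: ${year}`);
  }
  return resource === "deputies" ? `data/${year}/deputies.json` : `data/${year}/${resource}/`;
}

console.log(resourcePath(2024, "deputies")); // data/2024/deputies.json
console.log(resourcePath(2024, "speeches")); // data/2024/speeches/
```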

Automated updates

The scraper runs every 3 hours via GitHub Actions. Safe changes (additions and modifications) are committed directly to main, while potentially destructive changes (deletions) are opened as pull requests for review. Because all data is stored in git, you can revert to any previous state if a website change breaks a scraper.
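The commit-or-pull-request rule can be expressed as a classification of the working tree's changes. The sketch below assumes the workflow inspects `git status --porcelain` (v1) output, where each line starts with a two-character status code; `classifyChanges` is hypothetical, and the actual workflow may implement this check differently (for example, directly in shell inside the Actions YAML).

```typescript
type Decision = "commit-to-main" | "open-pull-request" | "nothing-to-do";

// Classifies `git status --porcelain` (v1) output. Each line is a two-character
// status code "XY" (index column, worktree column), a space, and the path.
function classifyChanges(porcelain: string): Decision {
  const lines = porcelain.split("\n").filter((line) => line.trim().length > 0);
  if (lines.length === 0) return "nothing-to-do";
  // A "D" in either status column means a tracked file was deleted.
  const hasDeletion = lines.some((line) => line.slice(0, 2).includes("D"));
  return hasDeletion ? "open-pull-request" : "commit-to-main";
}

console.log(classifyChanges(" M data/2024/deputies.json\n?? data/2024/speeches/42.json")); // commit-to-main
console.log(classifyChanges(" D data/2024/motions/7.json")); // open-pull-request
```

Treating any deletion as destructive is deliberately conservative: a scraper that silently returns fewer records after a site redesign would show up as deletions, which is exactly the case that benefits from human review.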

Development

Project structure

src/
├── jobs/                          # Scraper job definitions
├── lib/
│   ├── scrapers/                  # Core scraping logic
│   ├── cache.ts                   # Caching utilities
│   ├── log.ts                     # Logging configuration
│   └── runScraper.ts              # Scraper execution
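The API of cache.ts is not described in this README. As a hypothetical illustration of the idea, the sketch below wraps a page fetcher with an in-memory cache keyed by URL, so each page is requested from the server at most once per run. The real utilities likely persist to disk; `Fetcher` and `withCache` are assumed names, and the fetcher is injected so the example runs without network access.

```typescript
// Hypothetical fetcher signature; injected so the sketch can run offline.
type Fetcher = (url: string) => Promise<string>;

// Wraps a fetcher with an in-memory cache keyed by URL. Concurrent requests for the
// same URL share one in-flight promise, so the server is hit at most once per URL.
function withCache(fetcher: Fetcher): Fetcher {
  const cache = new Map<string, Promise<string>>();
  return (url: string) => {
    let entry = cache.get(url);
    if (!entry) {
      entry = fetcher(url).catch((error) => {
        cache.delete(url); // do not cache failures, so a later call can retry
        throw error;
      });
      cache.set(url, entry);
    }
    return entry;
  };
}

// Usage with a fake fetcher that counts how often it is called.
let calls = 0;
const fakeFetch: Fetcher = async (url) => {
  calls += 1;
  return `<html>${url}</html>`;
};
const cachedFetch = withCache(fakeFetch);
const first = cachedFetch("https://example.org/deputies");
const second = cachedFetch("https://example.org/deputies");
console.log(first === second, calls); // true 1
```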

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

Adding new scrapers

Create a new job file in src/jobs/
Implement the ScraperJob interface
The scraper is discovered automatically and becomes available via the CLI
Important: If you change a data format, increment the scraper's version number
Important: If you scrape new data, document it in README.md and DATA_LICENSE.md
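The ScraperJob interface is not reproduced in this README, so the shape below is a hypothetical sketch. It illustrates why the version number matters: if each output records the version of the job that produced it, consumers can detect a format change instead of misreading the data. The fields `name`, `version`, and `run`, the `Output` envelope, and the `execute` and `assertVersion` helpers are all assumptions.

```typescript
// Hypothetical shape of a scraper job; the real ScraperJob interface may differ.
interface ScraperJob<T> {
  name: string; // e.g. the CLI flag name, such as "motions"
  version: number; // bump whenever the output format changes
  run: () => Promise<T[]>;
}

// Hypothetical envelope recording which job version produced the data.
interface Output<T> {
  scraper: string;
  version: number;
  items: T[];
}

async function execute<T>(job: ScraperJob<T>): Promise<Output<T>> {
  const items = await job.run();
  return { scraper: job.name, version: job.version, items };
}

// A consumer can refuse data produced with an unexpected format version.
function assertVersion<T>(output: Output<T>, expected: number): Output<T> {
  if (output.version !== expected) {
    throw new Error(`${output.scraper}: expected format v${expected}, got v${output.version}`);
  }
  return output;
}

// Example job that returns static data instead of scraping a website.
const motionsJob: ScraperJob<{ id: number; title: string }> = {
  name: "motions",
  version: 2,
  run: async () => [{ id: 1, title: "Example motion" }],
};

execute(motionsJob).then((output) => console.log(JSON.stringify(assertVersion(output, 2))));
```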

Usage notes

The scraper uses delays and caching to avoid overloading the target websites
Always verify critical information against official sources
Website changes may break scrapers; the git history allows an easy rollback if needed
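Delays between requests and progress reporting for long-running jobs can be combined in a single sequential loop. The sketch below is a hypothetical helper, not the project's implementation: `processPolitely` handles one item at a time, waits a fixed `delayMs` between consecutive items, and calls an optional progress callback after each one.

```typescript
// Resolves after the given number of milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Processes items one at a time, waiting `delayMs` between consecutive items so the
// target site never receives parallel bursts, and reports progress after each item.
async function processPolitely<T, R>(
  items: T[],
  handler: (item: T) => Promise<R>,
  delayMs: number,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs);
    results.push(await handler(items[i]));
    onProgress(i + 1, items.length);
  }
  return results;
}

processPolitely([1, 2, 3], async (id) => `deputy-${id}`, 100, (done, total) =>
  console.log(`${done}/${total}`),
).then((results) => console.log(results));
```

A fixed delay is the simplest policy; a real scraper might instead honour server hints or back off on errors, which this sketch does not attempt.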

Roadmap

Essential features for first release candidate

Data completeness - Include Senate data from senat.ro
Include votes - Parliamentary voting records and outcomes
Include other entities - Parliamentary friendship groups (Grupuri de prietenie) and other parliamentary bodies
Potentially include transcripts - Full session transcripts (pending size and storage considerations)
Standardize data format - Define JSON schemas and eliminate ID recycling (deputies are currently assigned IDs 1..N, which are reused after each election)
Parliamentary profile completeness - Contact information, resumes, and other important biographical data
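One way to remove ID collisions is to namespace each local ID with the chamber and the election year, so that deputy 1 from 2020 and deputy 1 from 2024 get different identifiers. This is a hypothetical scheme for illustration, not a format the roadmap has settled on; the chamber codes `cdep` and `senat` and the functions `stableId` and `parseStableId` are assumptions.

```typescript
type Chamber = "cdep" | "senat"; // hypothetical chamber codes

// Hypothetical stable identifier: "<chamber>-<electionYear>-<localId>".
function stableId(chamber: Chamber, electionYear: number, localId: number): string {
  if (!Number.isInteger(localId) || localId < 1) {
    throw new Error(`Invalid local ID: ${localId}`);
  }
  return `${chamber}-${electionYear}-${localId}`;
}

// Parses an identifier produced by stableId back into its parts.
function parseStableId(id: string): { chamber: Chamber; electionYear: number; localId: number } {
  const match = /^(cdep|senat)-(\d{4})-(\d+)$/.exec(id);
  if (!match) throw new Error(`Malformed ID: ${id}`);
  return { chamber: match[1] as Chamber, electionYear: Number(match[2]), localId: Number(match[3]) };
}

console.log(stableId("cdep", 2020, 1)); // cdep-2020-1
console.log(stableId("cdep", 2024, 1)); // cdep-2024-1
```

Namespacing alone prevents collisions, but it does not recognise the same person serving in several legislatures; that would require a separate person-level identifier.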

Out of scope for first release

Data enrichment and cleanup - Using LLMs or other AI tools for data enhancement
Standardizing complex resources - Some resources, such as legislative proposals, have complex structures that are out of scope for the first release
Extracting data from scanned documents - OCR and document parsing capabilities

License

This project uses dual licensing:

Code: MIT License - Use, modify, and distribute the scraper code freely
Data: CC BY 4.0 - Use the scraped data with attribution

The scraped data comes with important disclaimers about accuracy and completeness. See DATA_LICENSE.md for full details.

Made with ❤️ in Romania

ClaudiuCeia/open-parliament-ro

Folders and files

Latest commit

History

Repository files navigation

Open Parliament Romania

Features

Installation

Usage

Data structure

Automated updates

Development

Project structure

Contributing

Adding new scrapers

Usage notes

Roadmap

Essential features for first release candidate

Out of scope for first release

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages