Data Engineering Platform for Pass Culture on Google Cloud Platform (GCP)
This repository contains the core components of our data platform:
- Airflow DAGs for workflow orchestration
- DBT models for data transformation
- ML models for machine learning services
- ETL jobs for data processing
- Project Overview - Main data models, glossary, and technical references
- Orchestration Guide - Airflow DAGs documentation
- CI/CD Documentation - Deployment and pipeline details
+-- orchestration
| +-- dags
| +-- dependencies
| +-- jobs
| +-- data_gcp_dbt
+-- jobs
| +-- etl_jobs
|   +-- external
|     +-- ...
|   +-- internal
|     +-- ...
| +-- ml_jobs
|   +-- ...
- Google Cloud CLI
- Access to our GCP service accounts
- Make installed
  - Linux:
    sudo apt install make
  - macOS:
    brew install make
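Before installing, it can help to confirm that the command-line prerequisites above are on your PATH. The snippet below is a minimal sketch and not part of this repository: the helper name `missing_tools` is hypothetical, and it only checks that the `gcloud` and `make` executables can be found. It does not verify access to the GCP service accounts.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # `gcloud` is the Google Cloud CLI executable; `make` runs the install targets.
    missing = missing_tools(["gcloud", "make"])
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If the script reports a missing tool, install it as described above and run the check again.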
- Install the prerequisites
  - Linux:
    make install_ubuntu_libs
  - macOS:
    make install_macos_libs
- Clone the repository
  git clone git@github.com:pass-culture/data-gcp.git
  cd data-gcp
- Install the project
  make install
This installation includes all necessary requirements for the orchestration part in a single virtual environment and sets up pre-commit hooks for code quality.
If you have MySQL client related issues when installing dependencies, you might need to set the following environment variables.
Add to your ~/.zshrc:
export MYSQLCLIENT_LDFLAGS="-L/opt/homebrew/opt/mysql-client/lib -lmysqlclient -rpath /usr/local/mysql/lib"
export MYSQLCLIENT_CFLAGS="-I/opt/homebrew/opt/mysql-client/include -I/opt/homebrew/opt/mysql-client/include/mysql"
To create a new microservice, set MS_NAME and run the target for the kind of job you need (ML, internal ETL, or external ETL):
MS_NAME=my_microservice make create_microservice_ml
MS_NAME=my_microservice make create_microservice_etl_internal
MS_NAME=my_microservice make create_microservice_etl_external
To install a specific dependency group:
uv sync --group <airflow|dbt|dev|docs>
Linting and formatting targets:
make ruff_fix / ruff_check / sqlfluff_fix / sqlfluff_check / sqlfmt_fix / sqlfmt_check
uv manages dependencies with a lock file, but the lock file is not easy to read. You can generate a human-readable file from uv.lock with:
python automations/export_requirements.py export-requirements
or with a prefix:
python automations/export_requirements.py export-requirements --prefix "new_"
To compare the requirements of two branches:
python automations/export_requirements.py diff-requirements --branch1 {first_branch} --branch2 {second_branch}
or, to write the output to a file named package_versions.diff:
python automations/export_requirements.py diff-requirements --branch1 {first_branch} --branch2 {second_branch} --write-to-file
Example:
python automations/export_requirements.py diff-requirements --branch1 master --branch2 refactor/remove-hardcoded-deps-in-pyproject.toml --write-to-file
This generates a file package_versions.diff containing the diff of the requirements between the two branches.
Our CI/CD pipelines are managed through GitHub Actions. See the workflows documentation for details.
- Create a new branch for your feature
- Make your changes
- Submit a pull request
This project is licensed under the Mozilla Public License Version 2.0 - see the LICENSE file for details.