A simplified dbt project for transforming raw flight data into a ready-to-query data warehouse. It ingests XLSX/JSON files from S3, cleans and models them with dbt, and outputs analytics-friendly tables. The project is version-controlled on GitHub and is designed as a scalable, testable data pipeline.
- Data Sources: XLSX/JSON in S3
- Destination: CSV/Parquet in S3
- Requirements: GitHub, AWS S3 Bucket, AWS Keys
Consistent schema and reference constraints ensure reliable data. Each dataset is validated before analytics.
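To make the reference-constraint idea concrete, here is a minimal Python sketch of the kind of check a dbt `relationships` test performs: every non-null foreign key in the fact table should exist in the referenced dimension. The table and column names (`fact_flights`, `dim_airport`, `airport_id`) are hypothetical and are not taken from this project.

```python
def find_orphan_keys(fact_rows, dim_rows, fk, pk):
    """Return foreign-key values in fact_rows that have no match in dim_rows.

    Null foreign keys are skipped, mirroring how a relationships-style
    check only looks at rows that actually reference something.
    """
    known = {row[pk] for row in dim_rows}
    orphans = {
        row[fk]
        for row in fact_rows
        if row[fk] is not None and row[fk] not in known
    }
    return sorted(orphans)


# Hypothetical sample data, for illustration only
dim_airport = [{"airport_id": "FRA"}, {"airport_id": "JFK"}]
fact_flights = [
    {"flight_id": 1, "airport_id": "FRA"},
    {"flight_id": 2, "airport_id": "XXX"},  # no matching dimension row
    {"flight_id": 3, "airport_id": None},   # null key, ignored
]

orphans = find_orphan_keys(fact_flights, dim_airport, fk="airport_id", pk="airport_id")
print(orphans)  # ['XXX']
```

An empty result means the constraint holds; any returned value points at fact rows that would break joins against the dimension.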
- Fig 1: Star schema with fact table and dimensions
transform/
├── macros/ # Reusable Jinja macros
├── models/ # Core dbt transformation logic
│ ├── staging/ # Initial data cleaning
│ └── marts/ # Final presentation tables
├── tests/ # Data quality tests
├── dbt_project.yml # dbt configuration
└── profiles.yml # dbt connection profiles
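The staging layer in the layout above is described as initial data cleaning. In this project that logic lives in dbt SQL models; the Python below is only a rough sketch of what such cleaning typically involves (normalizing column names, trimming whitespace, dropping duplicates). The field names (`Flight No`, `Origin`) and both helper functions are hypothetical.

```python
def clean_record(raw: dict) -> dict:
    """Normalize keys to snake_case and strip whitespace from string values."""
    cleaned = {}
    for key, value in raw.items():
        name = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip() or None  # treat empty strings as missing
        cleaned[name] = value
    return cleaned


def stage(records):
    """Clean every record and drop exact duplicates, keeping the first one."""
    seen = set()
    staged = []
    for raw in records:
        row = clean_record(raw)
        fingerprint = tuple(sorted(row.items()))  # hashable identity of the row
        if fingerprint not in seen:
            seen.add(fingerprint)
            staged.append(row)
    return staged


# Hypothetical raw rows, for illustration only
raw_rows = [
    {"Flight No": " LH400 ", "Origin": "FRA"},
    {"Flight No": "LH400", "Origin": "FRA "},  # duplicate once cleaned
    {"Flight No": "BA117", "Origin": ""},      # empty origin becomes None
]

staged_rows = stage(raw_rows)
print(staged_rows)
```

Mart models would then build on rows in this cleaned shape rather than on the raw files.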
- Prerequisites: Python 3.12, S3 bucket, S3 keys
git clone https://github.com/e-espootin/flightflow-dbt-analytics
cd flightflow-dbt-analytics
uv venv --python 3.12
source .venv/bin/activate
cp .env.template .env
make build
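Before running the build, it can help to confirm that the AWS credentials are actually visible to the process. The sketch below checks for the standard AWS credential variables; whether `.env.template` uses exactly these names is an assumption, and `missing_env_vars` is a hypothetical helper, not part of this repository. It only inspects the process environment, so values in `.env` need to be loaded or exported first.

```python
import os

# Standard AWS credential variable names. That this project's .env.template
# defines exactly these is an assumption; adjust the tuple to match the file.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.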
A sample Tableau Public analysis: Tableau Public