
🛒 Grocery Sales Data Pipeline

🚨 NOTE: This project is hosted on a temporary GCP environment. The dashboards may become unavailable after the cloud credits expire. 🚨


📦 Overview

This project is an end-to-end data pipeline that ingests, transforms, and visualizes grocery sales data using:

  • Apache Airflow for orchestration
  • Apache Spark for transformations
  • Google Cloud Platform (GCS & BigQuery) for storage and querying
  • Looker Studio for visualization

📁 Project Structure

.
├── airflow/
│   └── dags/
│       ├── data_ingestion_gcs_dag.py
│       └── spark_grocery_transformations_dag.py
├── credentials/
│   └── google_credentials.json
├── terraform/
│   ├── main.tf
│   └── variables.tf
├── .env.example
├── docker-compose.yml
├── Dockerfile
└── README.md

⚙️ Setup Instructions

1. Prerequisites

  • Docker and Docker Compose installed
  • Terraform installed
  • GCP project with:
    • BigQuery and Cloud Storage (GCS) APIs enabled
    • A service account with the following roles:
      • BigQuery Admin
      • Storage Admin

2. Configure Environment Variables

🔐 Create .env File

cp .env.example .env

Then edit .env and populate it with your actual values:

# Terraform
TF_VAR_project=your-gcp-project-id
TF_VAR_region=us-central1
TF_VAR_location=US
TF_VAR_zone=us-central1-a
TF_VAR_bq_dataset_name=grocery_sales_pipeline
TF_VAR_gcs_storage=your-gcs-bucket-name

# Airflow
AIRFLOW_PROJ_DIR=./airflow
# Set AIRFLOW_UID to your host user id (run `id -u`) so mounted files stay writable
AIRFLOW_UID=501

GCP_PROJECT_ID=your-gcp-project-id
GCP_GCS_BUCKET=your-gcs-bucket-name
BIGQUERY_DATASET=grocery_sales_pipeline

# Keep the JSON on one line; most .env parsers do not support multi-line values
AIRFLOW_CONN_SPARK='{"conn_type": "spark", "host": "spark://spark-master", "port": 7077}'
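
Airflow derives the connection id from the variable's suffix, so this registers a connection named spark. As a rough, hypothetical illustration (the repo's actual DAG code may differ), a task inside a DAG could reference it like this:

# Hypothetical fragment: how a task can use the "spark" connection that
# AIRFLOW_CONN_SPARK registers (the id is the suffix after AIRFLOW_CONN_).
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(dag_id="spark_conn_demo", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    submit_job = SparkSubmitOperator(
        task_id="submit_grocery_job",
        application="/opt/airflow/jobs/grocery_transformations.py",  # hypothetical path
        conn_id="spark",  # matches AIRFLOW_CONN_SPARK above
    )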

🔐 Add GCP Credentials

Place your GCP service account key in:

credentials/google_credentials.json
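
To sanity-check the key before provisioning anything, a minimal script like this works (assuming the google-cloud-storage Python package is installed):

# Quick check that the service account key is readable and authenticates.
import os

from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/google_credentials.json"

client = storage.Client()
print(f"Authenticated against project: {client.project}")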

3. Provision GCP Resources (Terraform)

Navigate to the terraform folder and run:

cd terraform
terraform init
terraform apply

This will create:

  • GCS bucket for raw and processed data
  • BigQuery dataset for storing external and transformed tables
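
To confirm the apply succeeded, a short check like the following verifies both resources by name (a sketch, assuming the google-cloud-storage and google-cloud-bigquery packages are installed and the same environment variables as .env are set):

# Confirm the Terraform-provisioned bucket and dataset exist.
import os

from google.cloud import bigquery, storage

os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", "credentials/google_credentials.json")

bucket_name = os.environ["GCP_GCS_BUCKET"]
dataset_name = os.environ["BIGQUERY_DATASET"]

storage.Client().get_bucket(bucket_name)      # raises NotFound if the bucket is missing
bigquery.Client().get_dataset(dataset_name)   # raises NotFound if the dataset is missing
print(f"OK: gs://{bucket_name} and dataset {dataset_name} both exist")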

4. Launch Airflow + Spark (Docker)

docker-compose up

Once the services are running, open the Airflow UI (http://localhost:8080 with the standard Airflow docker-compose setup) and log in with the default credentials:

Username: airflow
Password: airflow

🚦 DAG Execution Order

  1. data_ingestion_gcs_dag (see the sketch after this list)

    • Downloads the raw CSVs
    • Uploads them to GCS
    • Creates BigQuery external tables
  2. spark_grocery_transformations_dag (see the PySpark sketch at the end of this section)

    • Reads from GCS
    • Transforms the data with PySpark
    • Writes Parquet output and transformed tables into BigQuery
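
A minimal sketch of the shape of the ingestion DAG, for orientation only; the task ids, source URL, file names, and schema autodetection here are illustrative assumptions, not the repo's actual code:

# Illustrative sketch of the ingestion DAG's structure; names and the
# source URL are placeholders, not the repository's actual values.
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateExternalTableOperator
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator

BUCKET = os.environ["GCP_GCS_BUCKET"]

with DAG(
    dag_id="data_ingestion_gcs_dag_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # 1. Download a raw CSV (placeholder URL).
    download = BashOperator(
        task_id="download_csv",
        bash_command="curl -sSL https://example.com/sales.csv -o /tmp/sales.csv",
    )

    # 2. Upload the raw file into the GCS bucket provisioned by Terraform.
    upload = LocalFilesystemToGCSOperator(
        task_id="upload_to_gcs",
        src="/tmp/sales.csv",
        dst="raw/sales.csv",
        bucket=BUCKET,
    )

    # 3. Expose the raw file as a BigQuery external table.
    create_external_table = BigQueryCreateExternalTableOperator(
        task_id="create_external_table",
        table_resource={
            "tableReference": {
                "projectId": os.environ["GCP_PROJECT_ID"],
                "datasetId": os.environ["BIGQUERY_DATASET"],
                "tableId": "raw_sales",
            },
            "externalDataConfiguration": {
                "sourceFormat": "CSV",
                "sourceUris": [f"gs://{BUCKET}/raw/sales.csv"],
                "autodetect": True,
            },
        },
    )

    download >> upload >> create_external_table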

🛑 hello_spark_dag is for testing only — not part of the production pipeline.
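
For the second DAG, here is a hedged sketch of the kind of PySpark job it submits; the column names and the fact_sales aggregation are assumptions about the data, and reading gs:// paths requires the GCS connector on the Spark classpath:

# Illustrative PySpark job for the transformation step; column names and
# the aggregation are assumptions, not the repository's actual logic.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("grocery_transformations").getOrCreate()

# Read the raw CSVs that the ingestion DAG uploaded to GCS.
raw = spark.read.option("header", True).csv("gs://your-gcs-bucket-name/raw/")

# Example transformation: daily revenue per product.
fact_sales = (
    raw.withColumn(
        "revenue",
        F.col("quantity").cast("double") * F.col("unit_price").cast("double"),
    )
    .groupBy("sale_date", "product_id")
    .agg(F.sum("revenue").alias("total_revenue"))
)

# Persist as Parquet; a downstream task loads the result into BigQuery.
fact_sales.write.mode("overwrite").parquet("gs://your-gcs-bucket-name/processed/fact_sales/")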


📊 Looker Studio Dashboards

Connect Looker Studio to the following BigQuery table for visualization:

  • fact_sales

Suggested Charts

📈 Sales Dashboard

https://lookerstudio.google.com/s/vzCIMs1DR7k

📌 Dashboard screenshots: Sales_Chart_1, Sales_Chart_2


📜 License

MIT License


🙌 Credits

Made with ❤️ using open data, open source, and cloud tools.
