Beverage Sales Data Engineering Project

Overview

This project is a cloud-based data engineering pipeline for analyzing beverage sales. The pipeline ingests raw sales data, transforms it using dbt, and orchestrates workflows with Kestra. The processed data is stored in Google BigQuery, and insights are visualized through Looker Studio dashboards.

Tech Stack

  • Google Cloud Storage (GCS) – raw data storage
  • Google BigQuery – data warehouse
  • dbt – data transformations
  • Kestra – workflow orchestration
  • Terraform – infrastructure as code
  • Docker – containerized services
  • Looker Studio – dashboards

Project Structure

.
├── README.md           # Project documentation
├── .env                # Environment variables (update as needed)
├── .env.example        # Example environment file
├── dashboard           # Looker Studio dashboards
├── dbt                 # dbt transformation logic
│   └── beverage_sales  # dbt project directory
│       ├── models      # Core, marts, and staging models
│       ├── macros      # Custom dbt macros
│       ├── tests       # dbt tests
│       ├── seeds       # Seed data
│       ├── snapshots   # Snapshot tables
│       └── dbt_project.yml # dbt project config
├── docker              # Docker configuration
│   └── docker-compose.yml # Services setup
├── kestra              # Kestra workflows for orchestration
│   ├── config.yml      # Kestra configuration
│   ├── flows           # Workflow definitions
│   └── data            # Ingested data
└── terraform           # Infrastructure as code
    ├── main.tf         # Terraform configuration
    └── variables.tf    # Terraform variables

Data Pipeline Workflow

  1. Data Ingestion
    • Raw CSV data is stored in Google Cloud Storage (GCS).
    • Kestra orchestrates ingestion into an external BigQuery table (a command sketch follows this list).
  2. Transformations with dbt
    • The data is cleaned, modeled, and enriched into core fact and dimension tables.
    • Analytical tables include customer insights, sales performance, seasonal trends, and top products.
  3. Orchestration with Kestra
    • Automates data loading, transformations, and scheduled runs.
  4. Visualization in Looker Studio
    • Data is presented in interactive dashboards for analysis.
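
For reference, an external BigQuery table over raw CSV files in GCS can be created with the bq CLI. This is a hedged sketch, not necessarily how the Kestra flow does it; the bucket, dataset, and table names are placeholders:

    # Hedged sketch: replace the placeholders with your own values. The
    # Kestra flow may create the external table differently.
    bq mkdef --autodetect --source_format=CSV \
        "gs://<YOUR_BUCKET>/*.csv" > /tmp/table_def.json
    bq mk --table \
        --external_table_definition=/tmp/table_def.json \
        <YOUR_GCP_DATASET>.<YOUR_DATA_TABLE_NAME>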

Looker Studio Dashboards

  • customer_analysis
  • sales_performance
  • top_products
  • seasonal_trends

Deployment & Setup

Prerequisites

  • Google Cloud project with BigQuery and Cloud Storage enabled (enable commands are sketched after this list).
  • Terraform installed (brew install terraform or download).
  • Docker installed (brew install docker or Docker Desktop).
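
If the APIs are not yet enabled on your project, the gcloud CLI can turn them on. A minimal sketch, assuming gcloud is installed and authenticated:

    gcloud config set project <YOUR_GCP_PROJECT_ID>
    gcloud services enable bigquery.googleapis.com storage.googleapis.com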

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/Sharonsyra/beverage-sales-data-engineering-project
    cd beverage-sales-data-engineering-project
  2. Make a copy of the .env.example file and rename it to .env:

    cp .env.example .env
  3. Fill .env with your Google Cloud credentials.

  4. Run this script to set up secrets for Kestra. Kestra reads secrets from environment variables that are prefixed with SECRET_ and base64-encoded, which is exactly what this loop appends to .env (a decode sanity check is sketched after these setup steps):

    x=23  # First line of .env that holds secret values; adjust to your file

    # Write the encoded entries to a temp file first: appending to .env
    # while awk is still reading it can feed the new lines back into the loop.
    awk "NR >= $x" .env | while IFS='=' read -r key value; do
        echo "SECRET_$key=$(echo -n "$value" | base64)"
    done > /tmp/env.secrets
    cat /tmp/env.secrets >> .env
  5. Set Environment Variables:

    set -o allexport; source .env; set +o allexport
  6. Deploy infrastructure with Terraform:

    cd terraform
    terraform init
    terraform apply
  7. Start Docker Services:

    cd docker
    docker-compose --env-file ../.env up --build
  8. Update staging/schema.yml

    Before running dbt, you must edit dbt/beverage_sales/models/staging/schema.yml with your GCP details. Update the following section:

    sources:
      - name: staging
        database: <YOUR_GCP_PROJECT_ID>  # Replace with your actual GCP project ID
        schema: <YOUR_GCP_DATASET>        # Replace with your actual BigQuery dataset
    
        tables:
          - name: <YOUR_DATA_TABLE_NAME>  # Replace with your actual table name

    Make sure the values match those in your .env file.
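
    Alternatively, the placeholders can be filled in from the shell. A hedged sketch: GCP_PROJECT_ID, GCP_DATASET, and DATA_TABLE_NAME are hypothetical variable names, so use whatever your .env actually defines:

    # Hypothetical variable names; align them with your .env.
    sed -e "s/<YOUR_GCP_PROJECT_ID>/${GCP_PROJECT_ID}/" \
        -e "s/<YOUR_GCP_DATASET>/${GCP_DATASET}/" \
        -e "s/<YOUR_DATA_TABLE_NAME>/${DATA_TABLE_NAME}/" \
        dbt/beverage_sales/models/staging/schema.yml > /tmp/schema.yml \
      && mv /tmp/schema.yml dbt/beverage_sales/models/staging/schema.yml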

  9. Run dbt transformations:

    cd dbt/beverage_sales
    dbt compile
    dbt run
  10. Access dashboards in Looker Studio (links above).
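
As a quick sanity check after setup, you can confirm that the secrets from step 4 decode correctly and that dbt can reach BigQuery. SECRET_GCP_PROJECT_ID is a hypothetical key name; use one that actually exists in your .env:

    # Decode one SECRET_ value to confirm the base64 round trip
    # (use base64 -D instead of -d on older macOS).
    grep '^SECRET_GCP_PROJECT_ID=' .env | cut -d= -f2- | base64 -d; echo

    # dbt debug validates the profile and the BigQuery connection.
    cd dbt/beverage_sales && dbt debug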

Adding GCP Credentials to Kestra's KV Store

To securely store your Google Cloud Service Account JSON, follow these steps:

  1. Open the Kestra UI at http://localhost:8080.
  2. Navigate to Namespaces.
  3. Select the zoomcamp namespace.
  4. Click KV Store and enter the following:
    • Key: GCP_CREDS
    • Type: JSON
    • Value: Paste the contents of your service-account.json file.
  5. Click Save.

Once added, the service account credentials will be securely accessible inside Kestra workflows.
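
The same key can also be set without the UI through Kestra's KV REST API. A hedged sketch; verify the endpoint against the API reference for your Kestra version:

    # The KV endpoint shape may differ between Kestra versions.
    curl -X PUT "http://localhost:8080/api/v1/namespaces/zoomcamp/kv/GCP_CREDS" \
        -H "Content-Type: application/json" \
        --data-binary @service-account.json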

Running Kestra Flows

  1. Load Kestra KV Store

    • Open Kestra UI at http://localhost:8080.
    • Navigate to Flows.
    • Select zoomcamp.gcp_kv.
    • Click the Execute button (top-right).
    • Ensure GCP_CREDS is set as described in the instructions above.
  2. Ingest Data

    • Open Kestra UI at http://localhost:8080.
    • Navigate to Flows.
    • Select zoomcamp.gcp_ingest_and_load.
    • Click the Execute button (top-right).
  3. Run dbt Transformations

    • Open Kestra UI at http://localhost:8080.
    • Navigate to Flows.
    • Select zoomcamp.gcp_dbt.
    • Click the Execute button (top-right).
    • Note: this flow syncs the project from GitHub and only runs on my GCP instance; it is included as an extra flow.
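
Flows can also be triggered without the UI through Kestra's executions API. A sketch, assuming default settings; check the API docs for your Kestra version:

    # Trigger the ingestion flow; swap in gcp_kv or gcp_dbt as needed.
    curl -X POST "http://localhost:8080/api/v1/executions/zoomcamp/gcp_ingest_and_load"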

Kestra Images

Screenshots of the Kestra flows and KV store:

  • gcp_dbt_flow
  • gcp_ingest_and_load
  • gcp_kv
  • zoomcamp_kv_store

Future Improvements

  • Implement data quality checks using Great Expectations.
  • Optimize cost efficiency in BigQuery storage.
  • Add real-time streaming with Pub/Sub and Dataflow.

Contributors
