End-to-End ML Data Engineering Project with Snowflake

E-Commerce Customer Analytics & Churn Prediction Pipeline

Project Overview

Build a production-ready data engineering pipeline using Snowflake to process e-commerce data, create ML features, and deploy predictive models. This project demonstrates advanced Snowflake capabilities including Snowpark, ML UDFs, and real-time analytics.

Business Case: An e-commerce company needs to predict customer churn and optimize marketing campaigns using real-time transaction data.

Required Downloads & Setup

1. Snowflake Trial Account

Sign up at: https://signup.snowflake.com/
Choose: AWS | US East (N. Virginia) | Standard Edition (30-day free trial)

2. Python Environment Setup

python3.10 -m venv snowflake_ml_env

Activate on Windows:
snowflake_ml_env\Scripts\activate

Activate on macOS/Linux:
source snowflake_ml_env/bin/activate

pip install snowflake-connector-python
pip install snowflake-snowpark-python
pip install pandas numpy scikit-learn xgboost
pip install streamlit plotly faker
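
Once the packages are installed, the scripts need Snowflake credentials. The following is a minimal sketch of one way to collect them from environment variables rather than hard-coding them. The `SNOWFLAKE_*` variable names and the `build_connection_params` helper are assumptions for illustration, not part of this repository.

```python
import os
from typing import Mapping

# Hypothetical environment variable suffixes; adjust to your own setup.
REQUIRED = ("account", "user", "password")
OPTIONAL = ("role", "warehouse", "database", "schema")


def build_connection_params(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Snowflake connection parameters from SNOWFLAKE_* variables."""
    params = {}
    for key in REQUIRED + OPTIONAL:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:
            params[key] = value
    # Fail early with a clear message instead of a late connection error.
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"Missing connection settings: {', '.join(missing)}")
    return params
```

The resulting dictionary uses the parameter names accepted by both `snowflake.connector.connect(**params)` and `Session.builder.configs(params).create()` in Snowpark.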

3. VS Code Extensions

Snowflake Extension
Python
Jupyter

Project Architecture

Raw Data (CSV) ➔ Snowflake Stage ➔ Raw Tables ➔
Transformed Tables ➔ Feature Store (Snowpark) ➔
ML Models (UDFs) ➔ Streamlit Dashboard

Phase 1: Data Generation & Ingestion

Generate synthetic users, products, and transactions using data_generator.py
Save to CSV and upload to Snowflake stage
Create raw tables in Snowflake and copy data from stage
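
The generation step can be sketched as follows. This is an illustrative example only: it uses the standard library instead of Faker so that it runs without extra dependencies, and the column names, category list, and value ranges are assumptions rather than the actual schema produced by data_generator.py.

```python
import csv
import random
from datetime import date, timedelta

CATEGORIES = ["electronics", "clothing", "home", "books"]  # hypothetical categories


def generate_transactions(n_users, n_transactions, end_date, seed=42):
    """Return a list of synthetic transaction rows as dicts."""
    rng = random.Random(seed)  # seeded so reruns produce the same data
    rows = []
    for tx_id in range(1, n_transactions + 1):
        days_ago = rng.randint(0, 364)  # spread purchases over one year
        rows.append({
            "transaction_id": tx_id,
            "user_id": rng.randint(1, n_users),
            "category": rng.choice(CATEGORIES),
            "amount": round(rng.uniform(5.0, 500.0), 2),
            "transaction_date": (end_date - timedelta(days=days_ago)).isoformat(),
        })
    return rows


def write_csv(rows, path):
    """Write rows to a CSV file with a header line, ready to upload to a stage."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

A fixed seed keeps the synthetic data reproducible, which makes later validation checks easier to compare between runs.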

Phase 2: Data Loading & Transformation

Use data_loader.py to PUT the CSVs into a Snowflake stage and populate the raw tables with COPY INTO
Use data_transformation.py (Snowpark) to create FEATURES.USER_FEATURES
Generate churn labels based on recent transaction activity
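
The churn-labeling rule can be illustrated in plain Python. This is a sketch under stated assumptions: the 90-day inactivity window and the function names are hypothetical, and the real data_transformation.py runs in Snowpark and may define churn differently.

```python
from datetime import date, timedelta


def last_purchase_dates(transactions):
    """Reduce (user_id, transaction_date) pairs to each user's latest purchase."""
    latest = {}
    for user_id, tx_date in transactions:
        if user_id not in latest or tx_date > latest[user_id]:
            latest[user_id] = tx_date
    return latest


def label_churn(last_purchase_by_user, as_of, inactivity_days=90):
    """Map user_id to 1 (churned) or 0 (active).

    A user counts as churned when the latest purchase is older than
    `inactivity_days` before `as_of`. The 90-day default is an assumption.
    """
    cutoff = as_of - timedelta(days=inactivity_days)
    return {
        user_id: int(last_purchase < cutoff)
        for user_id, last_purchase in last_purchase_by_user.items()
    }
```

Passing `as_of` explicitly, instead of calling `date.today()` inside the function, keeps the labels reproducible when the pipeline is rerun on historical data.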

Phase 3: ML Model Development

Use model_training.py to train churn model on features using Random Forest or XGBoost
Evaluate with classification report and save model using joblib
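
To make the evaluation step concrete, here is a small sketch of the per-class numbers a classification report summarizes, computed by hand for the churn class. The function name is hypothetical; model_training.py presumably relies on scikit-learn for this. Looking beyond accuracy matters because churn data is often imbalanced, so a model can score well on accuracy while missing most churners.

```python
def churn_class_metrics(y_true, y_pred, positive=1):
    """Compute accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```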

Phase 4: Model Deployment

Deploy model as Snowflake UDF using deploy_model_udf.py
Register both predict_churn and predict_churn_probability functions
Test using SQL queries and create prediction view ML_MODELS.CUSTOMER_CHURN_PREDICTIONS
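
One way the prediction view could be defined is sketched below as a helper that builds the SQL text. It assumes the two UDFs take the feature columns as positional arguments and that the feature table has a `USER_ID` column; both are assumptions, as are the feature column names in the usage example. The actual signatures are defined in deploy_model_udf.py.

```python
def build_prediction_view_sql(
    feature_columns,
    source_table="FEATURES.USER_FEATURES",
    view_name="ML_MODELS.CUSTOMER_CHURN_PREDICTIONS",
    id_column="USER_ID",  # hypothetical key column
):
    """Return a CREATE VIEW statement that scores every row with both UDFs."""
    if not feature_columns:
        raise ValueError("At least one feature column is required")
    args = ", ".join(feature_columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT {id_column},\n"
        f"       predict_churn({args}) AS CHURN_PREDICTION,\n"
        f"       predict_churn_probability({args}) AS CHURN_PROBABILITY\n"
        f"FROM {source_table}"
    )


# Example with hypothetical feature names:
# print(build_prediction_view_sql(["TOTAL_ORDERS", "DAYS_SINCE_LAST_ORDER"]))
```

The generated statement can then be executed through a Snowpark session or a connector cursor.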

Phase 5: Analytics Dashboard

Use dashboard.py to build an interactive Streamlit dashboard
Visualize:
- Customer segmentation
- Revenue by category
- Churn analysis by segment
- List of high-risk customers
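
The aggregations behind the last two visuals can be sketched in plain Python. The row keys (`segment`, `churn_prediction`, `churn_probability`) and the 0.7 risk threshold are assumptions for illustration; dashboard.py may query these values directly from Snowflake instead.

```python
from collections import defaultdict


def churn_rate_by_segment(predictions):
    """Return the share of predicted churners per customer segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in predictions:
        totals[row["segment"]] += 1
        churned[row["segment"]] += row["churn_prediction"]
    return {segment: churned[segment] / totals[segment] for segment in totals}


def high_risk_customers(predictions, threshold=0.7):
    """Return rows at or above the probability threshold, riskiest first."""
    risky = [row for row in predictions if row["churn_probability"] >= threshold]
    return sorted(risky, key=lambda row: row["churn_probability"], reverse=True)
```

Both results are plain Python structures, so they can be handed to a Streamlit table or a Plotly chart without further reshaping.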

Phase 6: Automation

Schedule daily runs with automated_pipeline.py
Integrate feature generation, model retraining, and UDF updates
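
A daily run can be sketched as two small pieces: computing the next run time and executing the steps in order. This is a minimal illustration with hypothetical function names and a hypothetical 02:00 default; automated_pipeline.py may use a scheduling library or Snowflake tasks instead.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=2, minute=0):
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def run_pipeline(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (completed_step_names, error_message_or_None). Stopping early
    avoids redeploying a UDF when an upstream step did not finish.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None
```

The steps would be passed in dependency order, for example feature generation, then model retraining, then the UDF update.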

Testing & Validation

Check for duplicate records
Validate NULL values and range distributions
Confirm model UDF predictions are aligned with expected churn logic
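
The first two checks can be sketched over rows held in memory. The helper names and the column names in the usage example are hypothetical; in practice the same checks are often written as SQL queries against the raw tables.

```python
from collections import Counter


def find_duplicates(rows, key):
    """Return the sorted key values that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def count_nulls(rows, columns):
    """Return the number of missing (None) values per column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def out_of_range(rows, column, low, high):
    """Return rows whose non-null value falls outside [low, high]."""
    return [
        row for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


# Example with hypothetical columns:
# find_duplicates(transactions, "transaction_id")
# count_nulls(transactions, ["user_id", "amount"])
# out_of_range(transactions, "amount", 0, 10_000)
```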

Key Skills Demonstrated

Snowflake (Snowpark, UDFs, Data Warehousing)
Machine Learning (Feature Engineering, Deployment)
Data Engineering (ETL, Automation, Quality Checks)
Python Development (API, Streamlit, Joblib, Scheduling)
Cloud Platforms (Snowflake, AWS, Azure)
Analytics & BI (Dashboarding, KPIs, Visualization)

Expected Outcomes

100K+ transactions processed daily
89% model accuracy for churn prediction
Real-time scoring with Snowflake UDFs
Fully automated and tested pipeline
