The Superstore Marketing Campaign Analysis is a comprehensive machine learning solution designed to evaluate customer behavior, analyze spending patterns, and predict responses to marketing campaigns. This project leverages data science and MLOps best practices to create a robust, scalable system for marketing optimization.
- Customer Segmentation: Identify distinct customer groups based on purchasing behaviors
- Response Prediction: Build models to determine which customers are likely to respond to campaigns
- Marketing Optimization: Provide data-driven strategies to enhance engagement and sales
- MLOps Implementation: Establish a production-ready ML pipeline with monitoring capabilities
The dataset consists of 2,240 customer records with information on:
- Demographics (age, marital status, education)
- Spending patterns across product categories
- Campaign response history
- Digital engagement metrics
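To get a feel for the data, a quick pandas inspection can look like the sketch below; the file path is an assumption, so point it at wherever your copy of the dataset lives.

```python
import pandas as pd

# Path is an assumption; adjust to your local copy of the dataset
df = pd.read_csv("artifacts/superstore_data.csv")

# Expect ~2,240 rows spanning demographics, spending patterns,
# campaign response history, and digital engagement metrics
print(df.shape)
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False).head())
```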
The project follows a modular architecture with several key components:
1. Data Preprocessing
   - Implementation: `src/preprocess.py`
   - Configuration: `config.yaml` (preprocessing section)
   - Features:
     - Missing value handling with configurable thresholds
     - Feature engineering (spending ratios, age features, etc.)
     - K-means clustering for customer segmentation
     - Categorical variable recategorization
     - Skewness detection and transformation
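As an illustration of the last feature above, a minimal sketch of skewness detection and transformation follows. It is not the exact implementation in `src/preprocess.py`, and the 0.75 threshold is an assumption (the real pipeline reads its settings from `config.yaml`).

```python
import numpy as np
import pandas as pd

def transform_skewed(df: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
    """Log-transform numeric columns whose absolute skewness exceeds a threshold."""
    out = df.copy()
    for col in out.select_dtypes(include=np.number).columns:
        # log1p keeps zeros valid and compresses long right tails
        if abs(out[col].skew()) > threshold and (out[col] >= 0).all():
            out[col] = np.log1p(out[col])
    return out
```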
2. Model Training
   - Implementation: `src/train_model.py`
   - Configuration: `config.yaml` (models section)
   - Models:
     - XGBoost Classifier
     - LightGBM Classifier
     - CatBoost Classifier
     - Random Forest Classifier
     - Stacking Ensemble
   - Evaluation Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
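A minimal sketch of how the stacking ensemble could be assembled with scikit-learn; the base learners mirror the model list above, but the exact composition and parameters live in `config.yaml` and `src/train_model.py`, and the logistic-regression meta-learner is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Base learners mirror the supported model list; parameters are placeholders,
# not the tuned values from config.yaml
estimators = [
    ("xgb", XGBClassifier(eval_metric="logloss")),
    ("lgbm", LGBMClassifier()),
    ("cat", CatBoostClassifier(verbose=0)),
    ("rf", RandomForestClassifier(n_estimators=200)),
]

# cv=5 generates out-of-fold predictions to train the meta-learner
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
# stack.fit(X_train, y_train)  # X_train/y_train come from the preprocessing step
```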
3. Hyperparameter Tuning
   - Implementation: `src/hyperparameter_tuning.py`
   - Framework: Optuna
   - Features:
     - Bayesian optimization
     - Model-specific parameter spaces
     - F1 score optimization
     - SQLite storage for persisting optimization runs
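The sketch below shows the general Optuna pattern these features imply: TPE-based Bayesian optimization of an F1 objective with SQLite-backed persistence. The search space, study name, and database path are illustrative, not the project's exact values.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Toy data so the sketch runs standalone; the real pipeline uses the
# preprocessed dataset
X_train, y_train = make_classification(n_samples=500, random_state=42)

def objective(trial):
    # Illustrative XGBoost space; the real spaces are model-specific
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBClassifier(**params, eval_metric="logloss")
    # Maximize cross-validated F1, matching the optimization target above
    return cross_val_score(model, X_train, y_train, scoring="f1", cv=5).mean()

# SQLite storage persists runs across invocations; path and name are assumptions
study = optuna.create_study(
    direction="maximize",
    storage="sqlite:///optuna.db",
    study_name="xgboost_f1",
    load_if_exists=True,
)
study.optimize(objective, n_trials=50)
```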
4. Experiment Tracking
   - Implementation: integrated into the training modules
   - Framework: MLflow
   - Features:
     - Metric logging
     - Parameter tracking
     - Model versioning
     - Artifact storage
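A minimal sketch of the MLflow usage these features imply; the experiment name, run name, and toy model are illustrative stand-ins for what the training modules actually log.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Toy model so the sketch runs standalone; the real pipeline logs the
# classifiers trained in src/train_model.py
X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

mlflow.set_experiment("superstore-campaign")  # experiment name is an assumption

with mlflow.start_run(run_name="rf_baseline"):
    mlflow.log_params({"model": "random_forest", "n_estimators": 100})  # parameter tracking
    mlflow.log_metric("f1", f1_score(y, model.predict(X)))              # metric logging
    mlflow.sklearn.log_model(model, artifact_path="model")              # versioned model artifact
```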
5. Model Serving API
   - Implementation: `src/app.py`
   - Framework: FastAPI
   - Endpoints:
     - `/predict` for single predictions
     - `/predict-batch` for multiple predictions
     - `/health` for service health checks
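The endpoint shapes suggest a FastAPI app along these lines; the request schema below is a simplified stand-in, since the real Pydantic models in `src/app.py` define the actual feature fields.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    # Simplified stand-in for the real feature schema
    features: dict

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictionRequest):
    # In the real app, the trained model transforms and scores the features;
    # a fixed response keeps this sketch self-contained
    return {"prediction": 0, "probability": 0.5}
```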
6. Model Monitoring
   - Implementation: `src/monitoring/`
   - Features:
     - Feature drift detection (Wasserstein distance)
     - Target drift detection
     - Prediction drift detection
     - Concept drift detection (performance degradation)
     - Scheduled monitoring with configurable thresholds
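Feature drift detection via the Wasserstein distance can be sketched as below; the 0.1 threshold is an assumption, while the real detector in `src/monitoring/` reads its thresholds from `config_monitoring.yaml`.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def detect_feature_drift(reference: np.ndarray, current: np.ndarray,
                         threshold: float = 0.1) -> bool:
    """Flag drift when the Wasserstein distance between the reference and
    current distributions of a feature exceeds a threshold (assumed value)."""
    return wasserstein_distance(reference, current) > threshold

# Example: a mean shift large enough to trip the (assumed) threshold
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)
current = rng.normal(0.5, 1.0, size=1000)
print(detect_feature_drift(reference, current))  # True
```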
7. Containerization
   - Implementation: `dockerfile` and `dockerfile.api`
   - Features:
     - Separate containers for training and serving
     - Reproducible environments
```
├── artifacts/                    # Stored model artifacts and datasets
├── src/                          # Source code
│   ├── monitoring/               # Model monitoring components
│   ├── utils/                    # Utility functions
│   ├── app.py                    # FastAPI application
│   ├── preprocess.py             # Data preprocessing pipeline
│   ├── train_model.py            # Model training pipeline
│   └── hyperparameter_tuning.py  # Hyperparameter optimization
├── tests/                        # Test suite
├── config.yaml                   # Main configuration
├── config_monitoring.yaml        # Monitoring configuration
├── dockerfile                    # Docker configuration for training
├── dockerfile.api                # Docker configuration for API
├── pyproject.toml                # Poetry dependency management
├── poetry.lock                   # Poetry lock file
├── .pre-commit-config.yaml       # Pre-commit hooks configuration
└── README.md                     # Project documentation
```
- Python 3.8+
- Docker
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/superstore-marketing-campaign.git
   cd superstore-marketing-campaign
   ```

2. Set up the environment with Poetry

   ```bash
   # Install Poetry if you don't have it
   pip install poetry
   # Create the virtual environment and install dependencies
   poetry install
   ```

3. Alternative: set up with pip

   ```bash
   # Create a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   # Install dependencies
   pip install -e .
   ```

4. Set up pre-commit hooks

   ```bash
   pre-commit install
   ```

5. Build the training container

   ```bash
   docker build -t superstore-training -f dockerfile .
   ```

6. Build the API container

   ```bash
   docker build -t superstore-api -f dockerfile.api .
   ```
```bash
# Run the preprocessing pipeline
poetry run python -m src.preprocess

# Train the model with default configuration
poetry run python -m src.train_model

# Train a specific model
poetry run python -m src.train_model --model xgboost

# Run hyperparameter optimization
poetry run python -m src.hyperparameter_tuning --model xgboost --trials 50

# Start the FastAPI server
poetry run uvicorn src.app:app --reload

# With Docker
docker run -p 8000:8000 superstore-api

# Run drift detection
poetry run python -m src.monitoring.drift_detector

# Set up scheduled monitoring
poetry run bash setup_monitoring.sh
```
The project currently supports local deployment via Docker containers:
1. Training Pipeline Deployment

   ```bash
   docker run --rm -v $(pwd)/artifacts:/app/artifacts superstore-training
   ```

2. API Deployment

   ```bash
   docker run -d -p 8000:8000 --name superstore-api superstore-api
   ```

3. Testing the API

   ```bash
   curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{"features": {...}}'
   ```
The project is designed for future deployment to AWS with the following components:
1. MLflow Integration & Versioning
   - Remote tracking server on AWS (containerized on Fargate)
   - Artifact store in Amazon S3
   - Backend store in Amazon RDS
   - Environment-driven configuration: `MLFLOW_TRACKING_URI`, the S3 bucket name, and database credentials supplied via environment variables (see the sketch after this list)
   - Local MLflow tracking with experiment lifecycle management

2. Model Training
   - AWS Batch for scalable training jobs
   - S3 for artifact storage
   - ECR for the container registry

3. Model Serving
   - AWS Lambda functions for serverless prediction
   - API Gateway for REST API access
   - Elastic Container Service (ECS) for containerized model serving

4. Monitoring
   - CloudWatch for logs and metrics
   - Custom monitoring using the existing drift detection implemented in `src/monitoring/`
   - EventBridge for scheduling monitoring tasks

5. CI/CD Pipeline
   - CodePipeline for automated deployments
   - CodeBuild for container building
   - GitHub Actions for testing and quality checks
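How the environment-driven MLflow configuration might look in code; `MLFLOW_TRACKING_URI` comes from the plan above, while the other variable name and the fallback URI are assumptions.

```python
import os
import mlflow

# Point MLflow at the remote tracking server when MLFLOW_TRACKING_URI is set;
# the localhost fallback preserves the current local workflow
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000"))

# Other settings would come from the environment the same way
# (variable name below is an assumption, not an established project variable)
artifact_bucket = os.getenv("MLFLOW_ARTIFACT_BUCKET", "s3://example-mlflow-artifacts")
```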
1. Create a feature branch

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make your changes and run the tests

   ```bash
   # Run tests
   pytest tests/
   # Run linters
   pre-commit run --all-files
   ```

3. Commit your changes with meaningful messages

   ```bash
   git commit -m "Add feature: description of the change"
   ```

4. Push your branch and create a pull request

   ```bash
   git push origin feature/your-feature-name
   ```
- Follow PEP 8 naming conventions
- Add docstrings to all functions and classes
- Ensure code passes linting with pre-commit hooks
- Write unit tests for new functionality
1. Configuration Changes
   - Update `config.yaml` for model parameters, preprocessing settings, etc.
   - Update `config_monitoring.yaml` for monitoring thresholds

2. Adding New Models (see the sketch after this list)
   - Add the model implementation in `src/train_model.py`
   - Define the hyperparameter search space in `src/hyperparameter_tuning.py`
   - Update `config.yaml` with the model configuration

3. Modifying Preprocessing
   - Update the transformation logic in `src/preprocess.py`
   - Add new feature engineering functions as needed

4. API Changes
   - Modify the endpoint logic in `src/app.py`
   - Update input/output schemas with Pydantic models

5. Monitoring Changes
   - Enhance drift detection in `src/monitoring/drift_detector.py`
   - Add new monitoring metrics as needed
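For "Adding New Models", one common pattern is a name-to-constructor registry. The sketch below is hypothetical, showing how `src/train_model.py` might expose models to `config.yaml`; it is not the module's actual structure.

```python
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Hypothetical registry mapping config names to model constructors;
# the real wiring in src/train_model.py may differ
MODEL_REGISTRY = {
    "random_forest": RandomForestClassifier,
    "xgboost": XGBClassifier,
}

def build_model(name: str, params: dict):
    """Instantiate a registered model from a config-driven name and params."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model: {name}")
    return MODEL_REGISTRY[name](**params)

# A new model then needs only a registry entry, a matching Optuna
# search space, and a config.yaml section
```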