The Superstore Marketing Campaign Analysis is a comprehensive machine learning solution designed to evaluate customer behavior, analyze spending patterns, and predict responses to marketing campaigns. This project leverages data science and MLOps best practices to create a robust, scalable system for marketing optimization.
- Customer Segmentation: Identify distinct customer groups based on purchasing behaviors
- Response Prediction: Build models to determine which customers are likely to respond to campaigns
- Marketing Optimization: Provide data-driven strategies to enhance engagement and sales
- MLOps Implementation: Establish a production-ready ML pipeline with monitoring capabilities
The dataset consists of 2,240 customer records with information on:
- Demographics (age, marital status, education)
- Spending patterns across product categories
- Campaign response history
- Digital engagement metrics
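To get a feel for the data, a quick pandas inspection can look like the sketch below; the file path is an assumption, so point it at wherever your copy of the dataset lives.

```python
import pandas as pd

# Path is an assumption; adjust to your local copy of the dataset
df = pd.read_csv("artifacts/superstore_data.csv")

# Expect ~2,240 rows spanning demographics, spending patterns,
# campaign response history, and digital engagement metrics
print(df.shape)
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False).head())
```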
The project follows a modular architecture with several key components:
1. Data Preprocessing
   - Implementation: `src/preprocess.py`
   - Configuration: `config.yaml` (preprocessing section)
   - Features:
     - Missing value handling with configurable thresholds
     - Feature engineering (spending ratios, age features, etc.)
     - K-means clustering for customer segmentation
     - Categorical variable recategorization
     - Skewness detection and transformation
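As an illustration of the last feature above, a minimal sketch of skewness detection and transformation follows. It is not the exact implementation in `src/preprocess.py`, and the 0.75 threshold is an assumption (the real pipeline reads its settings from `config.yaml`).

```python
import numpy as np
import pandas as pd

def transform_skewed(df: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
    """Log-transform numeric columns whose absolute skewness exceeds a threshold."""
    out = df.copy()
    for col in out.select_dtypes(include=np.number).columns:
        # log1p keeps zeros valid and compresses long right tails
        if abs(out[col].skew()) > threshold and (out[col] >= 0).all():
            out[col] = np.log1p(out[col])
    return out
```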
2. Model Training
   - Implementation: `src/train_model.py`
   - Configuration: `config.yaml` (models section)
   - Models:
     - XGBoost Classifier
     - LightGBM Classifier
     - CatBoost Classifier
     - Random Forest Classifier
     - Stacking Ensemble
   - Evaluation Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
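A minimal sketch of how the stacking ensemble could be assembled with scikit-learn; the base learners mirror the model list above, but the exact composition and parameters live in `config.yaml` and `src/train_model.py`, and the logistic-regression meta-learner is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Base learners mirror the supported model list; parameters are placeholders,
# not the tuned values from config.yaml
estimators = [
    ("xgb", XGBClassifier(eval_metric="logloss")),
    ("lgbm", LGBMClassifier()),
    ("cat", CatBoostClassifier(verbose=0)),
    ("rf", RandomForestClassifier(n_estimators=200)),
]

# cv=5 generates out-of-fold predictions to train the meta-learner
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
# stack.fit(X_train, y_train)  # X_train/y_train come from the preprocessing step
```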
3. Hyperparameter Tuning
   - Implementation: `src/hyperparameter_tuning.py`
   - Framework: Optuna
   - Features:
     - Bayesian optimization
     - Model-specific parameter spaces
     - F1 score optimization
     - SQLite storage for persisting optimization runs
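The sketch below shows the general Optuna pattern these features imply: TPE-based Bayesian optimization of an F1 objective with SQLite-backed persistence. The search space, study name, and database path are illustrative, not the project's exact values.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Toy data so the sketch runs standalone; the real pipeline uses the
# preprocessed dataset
X_train, y_train = make_classification(n_samples=500, random_state=42)

def objective(trial):
    # Illustrative XGBoost space; the real spaces are model-specific
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBClassifier(**params, eval_metric="logloss")
    # Maximize cross-validated F1, matching the optimization target above
    return cross_val_score(model, X_train, y_train, scoring="f1", cv=5).mean()

# SQLite storage persists runs across invocations; path and name are assumptions
study = optuna.create_study(
    direction="maximize",
    storage="sqlite:///optuna.db",
    study_name="xgboost_f1",
    load_if_exists=True,
)
study.optimize(objective, n_trials=50)
```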
4. Experiment Tracking
   - Implementation: integrated into the training modules
   - Framework: MLflow
   - Features:
     - Metric logging
     - Parameter tracking
     - Model versioning
     - Artifact storage
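A minimal sketch of the MLflow usage these features imply; the experiment name, run name, and toy model are illustrative stand-ins for what the training modules actually log.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Toy model so the sketch runs standalone; the real pipeline logs the
# classifiers trained in src/train_model.py
X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

mlflow.set_experiment("superstore-campaign")  # experiment name is an assumption

with mlflow.start_run(run_name="rf_baseline"):
    mlflow.log_params({"model": "random_forest", "n_estimators": 100})  # parameter tracking
    mlflow.log_metric("f1", f1_score(y, model.predict(X)))              # metric logging
    mlflow.sklearn.log_model(model, artifact_path="model")              # versioned model artifact
```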
5. Model Serving API
   - Implementation: `src/app.py`
   - Framework: FastAPI
   - Endpoints:
     - `/predict` for single predictions
     - `/predict-batch` for multiple predictions
     - `/health` for service health checks
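The endpoint shapes suggest a FastAPI app along these lines; the request schema below is a simplified stand-in, since the real Pydantic models in `src/app.py` define the actual feature fields.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    # Simplified stand-in for the real feature schema
    features: dict

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictionRequest):
    # In the real app, the trained model transforms and scores the features;
    # a fixed response keeps this sketch self-contained
    return {"prediction": 0, "probability": 0.5}
```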
6. Model Monitoring
   - Implementation: `src/monitoring/`
   - Features:
     - Feature drift detection (Wasserstein distance)
     - Target drift detection
     - Prediction drift detection
     - Concept drift detection (performance degradation)
     - Scheduled monitoring with configurable thresholds
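Feature drift detection via the Wasserstein distance can be sketched as below; the 0.1 threshold is an assumption, while the real detector in `src/monitoring/` reads its thresholds from `config_monitoring.yaml`.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def detect_feature_drift(reference: np.ndarray, current: np.ndarray,
                         threshold: float = 0.1) -> bool:
    """Flag drift when the Wasserstein distance between the reference and
    current distributions of a feature exceeds a threshold (assumed value)."""
    return wasserstein_distance(reference, current) > threshold

# Example: a mean shift large enough to trip the (assumed) threshold
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)
current = rng.normal(0.5, 1.0, size=1000)
print(detect_feature_drift(reference, current))  # True
```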
7. Containerization
   - Implementation: `dockerfile` and `dockerfile.api`
   - Features:
     - Separate containers for training and serving
     - Reproducible environments
```
├── artifacts/                    # Stored model artifacts and datasets
├── src/                          # Source code
│   ├── monitoring/               # Model monitoring components
│   ├── utils/                    # Utility functions
│   ├── app.py                    # FastAPI application
│   ├── preprocess.py             # Data preprocessing pipeline
│   ├── train_model.py            # Model training pipeline
│   └── hyperparameter_tuning.py  # Hyperparameter optimization
├── tests/                        # Test suite
├── config.yaml                   # Main configuration
├── config_monitoring.yaml        # Monitoring configuration
├── dockerfile                    # Docker configuration for training
├── dockerfile.api                # Docker configuration for API
├── pyproject.toml                # Poetry dependency management
├── poetry.lock                   # Poetry lock file
├── .pre-commit-config.yaml       # Pre-commit hooks configuration
└── README.md                     # Project documentation
```
- Python 3.8+
- Docker
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/superstore-marketing-campaign.git
   cd superstore-marketing-campaign
   ```

2. Set up the environment with Poetry

   ```bash
   # Install Poetry if you don't have it
   pip install poetry
   # Create the virtual environment and install dependencies
   poetry install
   ```

3. Alternative: set up with pip

   ```bash
   # Create a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   # Install dependencies
   pip install -e .
   ```

4. Set up pre-commit hooks

   ```bash
   pre-commit install
   ```

5. Build the training container

   ```bash
   docker build -t superstore-training -f dockerfile .
   ```

6. Build the API container

   ```bash
   docker build -t superstore-api -f dockerfile.api .
   ```
```bash
# Run the preprocessing pipeline
poetry run python -m src.preprocess

# Train the model with default configuration
poetry run python -m src.train_model

# Train a specific model
poetry run python -m src.train_model --model xgboost

# Run hyperparameter optimization
poetry run python -m src.hyperparameter_tuning --model xgboost --trials 50

# Start the FastAPI server
poetry run uvicorn src.app:app --reload

# With Docker
docker run -p 8000:8000 superstore-api

# Run drift detection
poetry run python -m src.monitoring.drift_detector

# Set up scheduled monitoring
poetry run bash setup_monitoring.sh
```
The project currently supports local deployment via Docker containers:
1. Training Pipeline Deployment

   ```bash
   docker run --rm -v $(pwd)/artifacts:/app/artifacts superstore-training
   ```

2. API Deployment

   ```bash
   docker run -d -p 8000:8000 --name superstore-api superstore-api
   ```

3. Testing the API

   ```bash
   curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{"features": {...}}'
   ```
The project is designed for future deployment to AWS with the following components:
1. MLflow Integration & Versioning
   - Remote tracking server on AWS (containerized on Fargate)
   - Artifact store in Amazon S3
   - Backend store in Amazon RDS
   - Environment-driven configuration: `MLFLOW_TRACKING_URI`, the S3 bucket name, and database credentials supplied via environment variables (see the sketch after this list)
   - Local MLflow tracking with experiment lifecycle management

2. Model Training
   - AWS Batch for scalable training jobs
   - S3 for artifact storage
   - ECR for the container registry

3. Model Serving
   - AWS Lambda functions for serverless prediction
   - API Gateway for REST API access
   - Elastic Container Service (ECS) for containerized model serving

4. Monitoring
   - CloudWatch for logs and metrics
   - Custom monitoring using the existing drift detection implemented in `src/monitoring/`
   - EventBridge for scheduling monitoring tasks

5. CI/CD Pipeline
   - CodePipeline for automated deployments
   - CodeBuild for container building
   - GitHub Actions for testing and quality checks
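How the environment-driven MLflow configuration might look in code; `MLFLOW_TRACKING_URI` comes from the plan above, while the other variable name and the fallback URI are assumptions.

```python
import os
import mlflow

# Point MLflow at the remote tracking server when MLFLOW_TRACKING_URI is set;
# the localhost fallback preserves the current local workflow
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000"))

# Other settings would come from the environment the same way
# (variable name below is an assumption, not an established project variable)
artifact_bucket = os.getenv("MLFLOW_ARTIFACT_BUCKET", "s3://example-mlflow-artifacts")
```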
1. Create a feature branch

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make your changes and run the tests

   ```bash
   # Run tests
   pytest tests/
   # Run linters
   pre-commit run --all-files
   ```

3. Commit your changes with meaningful messages

   ```bash
   git commit -m "Add feature: description of the change"
   ```

4. Push your branch and create a pull request

   ```bash
   git push origin feature/your-feature-name
   ```
- Follow PEP 8 naming conventions
- Add docstrings to all functions and classes
- Ensure code passes linting with pre-commit hooks
- Write unit tests for new functionality
1. Configuration Changes
   - Update `config.yaml` for model parameters, preprocessing settings, etc.
   - Update `config_monitoring.yaml` for monitoring thresholds

2. Adding New Models (see the sketch after this list)
   - Add the model implementation in `src/train_model.py`
   - Define the hyperparameter search space in `src/hyperparameter_tuning.py`
   - Update `config.yaml` with the model configuration

3. Modifying Preprocessing
   - Update the transformation logic in `src/preprocess.py`
   - Add new feature engineering functions as needed

4. API Changes
   - Modify the endpoint logic in `src/app.py`
   - Update input/output schemas with Pydantic models

5. Monitoring Changes
   - Enhance drift detection in `src/monitoring/drift_detector.py`
   - Add new monitoring metrics as needed
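For "Adding New Models", one common pattern is a name-to-constructor registry. The sketch below is hypothetical, showing how `src/train_model.py` might expose models to `config.yaml`; it is not the module's actual structure.

```python
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Hypothetical registry mapping config names to model constructors;
# the real wiring in src/train_model.py may differ
MODEL_REGISTRY = {
    "random_forest": RandomForestClassifier,
    "xgboost": XGBClassifier,
}

def build_model(name: str, params: dict):
    """Instantiate a registered model from a config-driven name and params."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model: {name}")
    return MODEL_REGISTRY[name](**params)

# A new model then needs only a registry entry, a matching Optuna
# search space, and a config.yaml section
```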