📌 Technologies Used: Python, Pandas, NumPy, Statsmodels, Seaborn, Matplotlib
This project investigates the impact of different marketing strategies on sales performance. Using Market Mix Modeling (MMM), the analysis distinguishes causation from correlation in marketing investments, helping businesses optimize resource allocation.
The study leverages regression analysis, correlation analysis, and exploratory data analysis (EDA) to derive insights from historical campaign data.
- Introduction
- Data Preprocessing
- Feature Engineering
- Exploratory Data Analysis (EDA) & Statistical Analysis
- Correlation & Causal Inference - Regression Analysis
- Recommendations & Strategies
- How to Run
Marketing campaigns influence customer acquisition, but which strategies truly impact revenue? This project aims to determine:
- Which marketing channels drive the most revenue?
- Are campaigns causally linked to sales growth or just correlated?
- How do different customer types respond to campaigns?
This project uses OLS Regression to quantify marketing effectiveness.
The dataset contains campaign-related variables such as:
- `Campaign_Email`, `Campaign_Flyer`, `Campaign_Phone`
- `Sales_Contact_1`, `Sales_Contact_2`, ..., `Sales_Contact_5`
- `Amount_Collected` (Target variable)
- `Client_Type` (Large, Medium, Small, Private Facility)
✔ Handling missing values
✔ Standardizing column names
✔ Creating time-based features (Month, Year)
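The three preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the raw column names, the `Calendardate` field, and the choice to fill missing numeric values with 0 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract; column names and values are illustrative only.
raw = pd.DataFrame({
    "Client ID": ["A1", "A1", "B2", "B2"],
    "Calendardate": ["2015-01-16", "2015-02-16", "2015-01-16", None],
    "Amount Collected": [1200.0, np.nan, 830.0, 410.0],
    "Campaign (Email)": [0.0, 150.0, np.nan, 90.0],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Standardize column names: trim, drop parentheses, spaces -> underscores.
    out.columns = (
        out.columns.str.strip()
        .str.replace(r"[()]", "", regex=True)
        .str.replace(r"\s+", "_", regex=True)
    )
    # Parse the date column; unparseable or missing dates become NaT.
    out["Calendardate"] = pd.to_datetime(out["Calendardate"], errors="coerce")
    # Assumption: a missing numeric value means "no spend / no revenue recorded".
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(0)
    # Rows without a usable date cannot be placed on the timeline, so drop them.
    out = out.dropna(subset=["Calendardate"]).reset_index(drop=True)
    # Time-based features.
    return out.assign(
        Calendar_Month=out["Calendardate"].dt.month,
        Calendar_Year=out["Calendardate"].dt.year,
    )

clean = preprocess(raw)
print(clean)
```

Whether zero-filling is appropriate depends on how the data was collected; dropping or imputing rows are equally plausible choices.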
- Time-based aggregation: `Calendar_Month`, `Calendar_Year`
- Customer segmentation analysis: Grouping by `Client_Type`
- Sales impact analysis: Summarizing key revenue drivers
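As a rough sketch of the aggregations listed above, the snippet below groups a small made-up frame by month and by `Client_Type`. The data values and the output column names (`total_revenue`, `mean_revenue`, `n_records`) are hypothetical.

```python
import pandas as pd

# Toy data for illustration; not taken from the project dataset.
df = pd.DataFrame({
    "Client_Type": ["Large", "Large", "Small", "Small", "Medium"],
    "Calendar_Year": [2015, 2015, 2015, 2015, 2015],
    "Calendar_Month": [1, 2, 1, 1, 2],
    "Amount_Collected": [1000.0, 1500.0, 200.0, 300.0, 700.0],
})

# Time-based aggregation: total revenue per calendar month.
monthly = df.groupby(
    ["Calendar_Year", "Calendar_Month"], as_index=False
)["Amount_Collected"].sum()

# Customer segmentation: revenue summary per client type, largest first.
by_client = (
    df.groupby("Client_Type")
    .agg(
        total_revenue=("Amount_Collected", "sum"),
        mean_revenue=("Amount_Collected", "mean"),
        n_records=("Amount_Collected", "size"),
    )
    .sort_values("total_revenue", ascending=False)
)

print(monthly)
print(by_client)
```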
- Distribution of revenue (`Amount_Collected`) across client types
- Correlation of campaigns with sales revenue
- Visualization of marketing effectiveness
- Sales distribution per `Client_Type`
- Time-series trend of `Amount_Collected`
- Correlation heatmap of campaign effectiveness
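The correlation step behind the heatmap can be sketched as below. The data is synthetic (by construction, only `Campaign_Flyer` drives revenue here), so the numbers say nothing about the real dataset; the point is only how the correlation matrix and the ranking against `Amount_Collected` are obtained.

```python
import numpy as np
import pandas as pd

# Synthetic data: an assumption for illustration, not the project data.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Campaign_Flyer": rng.uniform(0, 100, n),
    "Campaign_Email": rng.uniform(0, 100, n),
})
df["Amount_Collected"] = 3.0 * df["Campaign_Flyer"] + rng.normal(0, 20, n)

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank campaign variables by their correlation with revenue.
target_corr = (
    corr["Amount_Collected"]
    .drop("Amount_Collected")
    .sort_values(ascending=False)
)
print(target_corr)
```

Passing `corr` to `sns.heatmap(corr, annot=True)` would render it as a heatmap. A high correlation on its own does not show that a campaign caused the revenue.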
We performed OLS Regression to measure the impact of marketing campaigns on revenue.
R-squared = 0.480 # 48% variance explained
F-statistic = 342.1 (p-value = 0.00) # Model is statistically significant
Durbin-Watson = 0.624 # Well below 2, suggests positive autocorrelation in the residuals
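For reference, the statistics above can be read off a fitted statsmodels result as sketched here. The data is synthetic and the formula uses only three of the predictors, so the printed values will not match the figures reported above. The HAC refit at the end is not part of the original analysis; it is one common way to get standard errors that tolerate autocorrelated residuals, and `maxlags=1` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

# Synthetic data (assumption): flyer and contact 4 drive revenue, email does not.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "Campaign_Flyer": rng.uniform(0, 100, n),
    "Campaign_Email": rng.uniform(0, 100, n),
    "Sales_Contact_4": rng.uniform(0, 50, n),
})
df["Amount_Collected"] = (
    3.0 * df["Campaign_Flyer"]
    + 10.0 * df["Sales_Contact_4"]
    + rng.normal(0, 100, n)
)

formula = "Amount_Collected ~ Campaign_Flyer + Campaign_Email + Sales_Contact_4"
model = smf.ols(formula, data=df).fit()

print(f"R-squared:     {model.rsquared:.3f}")
print(f"F-statistic:   {model.fvalue:.1f} (p = {model.f_pvalue:.3g})")

# Durbin-Watson ranges from 0 to 4; values near 2 indicate no first-order
# autocorrelation, values well below 2 indicate positive autocorrelation.
dw = durbin_watson(model.resid)
print(f"Durbin-Watson: {dw:.3f}")

# Optional: same coefficients, but HAC (Newey-West) standard errors.
robust = smf.ols(formula, data=df).fit(cov_type="HAC", cov_kwds={"maxlags": 1})
print(robust.bse)
```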
Feature | Coefficient | p-value | Impact on Revenue |
---|---|---|---|
Campaign_Flyer | 3.34 | 0.000 | 🚀 Positive |
Sales_Contact_1 | 4.24 | 0.000 | 🚀 Positive |
Sales_Contact_2 | 3.64 | 0.000 | 🚀 Positive |
Sales_Contact_3 | 2.34 | 0.000 | 🚀 Positive |
Sales_Contact_4 | 10.95 | 0.000 | 🚀 Strongest Impact |
❌ Not significant predictors: `Campaign_Email`, `Campaign_Phone`, `Sales_Contact_5`
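A table like the one above, with significant and non-significant predictors separated, can be built from a fitted model roughly as follows. The data is synthetic and the 0.05 threshold (`ALPHA`) is an assumption; the source does not state which significance level was used.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumption): only Sales_Contact_1 affects revenue.
rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({
    "Sales_Contact_1": rng.uniform(0, 100, n),
    "Campaign_Phone": rng.uniform(0, 100, n),
})
df["Amount_Collected"] = 4.0 * df["Sales_Contact_1"] + rng.normal(0, 50, n)

model = smf.ols(
    "Amount_Collected ~ Sales_Contact_1 + Campaign_Phone", data=df
).fit()

ALPHA = 0.05  # hypothetical significance threshold

# One row per predictor; the intercept is dropped because it is not a channel.
coef_table = pd.DataFrame(
    {"coefficient": model.params, "p_value": model.pvalues}
).drop(index="Intercept")
coef_table["significant"] = coef_table["p_value"] < ALPHA

print(coef_table.round(3))
```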
📌 Actionable Insights:
- Increase investment in flyer campaigns and Sales Contacts 1-4 (significant positive coefficients).
- Campaign phone calls show no statistically significant effect on revenue; consider reallocating that budget.
- Sales Contact 4 has the largest coefficient (10.95); focus on optimizing this interaction.
- Different client types respond differently—customized strategies needed.
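One way to check how client types differ is to fit the same regression separately for each `Client_Type` and compare coefficients. The sketch below does this on synthetic data in which the flyer effect is deliberately larger for "Large" clients; the slopes, sample sizes, and single-predictor formula are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumption): the flyer effect differs by client type.
rng = np.random.default_rng(1)
true_slopes = {"Large": 5.0, "Small": 1.0}
frames = []
for client_type, slope in true_slopes.items():
    flyer = rng.uniform(0, 100, 200)
    frames.append(pd.DataFrame({
        "Client_Type": client_type,
        "Campaign_Flyer": flyer,
        "Amount_Collected": slope * flyer + rng.normal(0, 20, 200),
    }))
df = pd.concat(frames, ignore_index=True)

# Fit one OLS model per segment and collect the flyer coefficient.
slopes = {
    client_type: smf.ols("Amount_Collected ~ Campaign_Flyer", data=group)
    .fit()
    .params["Campaign_Flyer"]
    for client_type, group in df.groupby("Client_Type")
}

for client_type, slope in slopes.items():
    print(f"{client_type}: estimated flyer coefficient = {slope:.2f}")
```

An interaction term in a single model (for example `Campaign_Flyer * C(Client_Type)` in the formula) is an alternative that also tests whether the difference between segments is significant.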
Ensure you have the required Python libraries installed:
pip install pandas numpy seaborn statsmodels matplotlib
Launch Jupyter Notebook or Google Colab and execute:
!pip install pytimetk -q
import pandas as pd, numpy as np, seaborn as sns, statsmodels.api as sm
import statsmodels.formula.api as smf
- Run all cells to process the data and generate insights.
- Analyze the regression summary and visualizations.
This project provides a data-driven strategy to optimize marketing investments by identifying causal relationships, not just correlations. Future improvements include:
- Implementing A/B testing for campaign effectiveness.
- Exploring machine learning models (e.g., Random Forest, XGBoost) for better predictions.
📢 Feel free to contribute, raise issues, or suggest improvements! 🚀