Rossmann operates over 3,000 drug stores in 7 European countries. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of results can be quite varied.
To learn more about the dataset, see https://www.kaggle.com/c/rossmann-store-sales
The original dataset does not include store locations or weather. Store locations were therefore inferred from the holiday data, and a weather dataset was then merged in according to the inferred location.
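The location inference and weather merge can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the holiday calendars, state labels, the `State` and `MaxTempC` columns, and the helper `infer_state` are all assumptions made up for the example, and Jaccard overlap is just one plausible way to match a store's holidays to a state.

```python
import pandas as pd

# Toy holiday calendars per state. Dates and state labels are made up
# for illustration; a real run needs an actual holiday calendar.
STATE_HOLIDAYS = {
    "StateA": {"2015-01-01", "2015-01-06", "2015-06-04"},
    "StateB": {"2015-01-01", "2015-06-04"},
    "StateC": {"2015-01-01", "2015-10-31"},
}


def infer_state(store_holidays, calendars=STATE_HOLIDAYS):
    """Return the state whose calendar best overlaps the store's holidays.

    Overlap is scored with Jaccard similarity (intersection over union).
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a or b) else 0.0

    return max(calendars, key=lambda s: jaccard(store_holidays, calendars[s]))


# Hypothetical holiday dates observed per store in the sales data.
observed = {
    1: {"2015-01-01", "2015-01-06", "2015-06-04"},
    2: {"2015-01-01", "2015-10-31"},
}

# One row per store with its inferred state.
location = pd.DataFrame({
    "Store": list(observed),
    "State": [infer_state(days) for days in observed.values()],
})

# Hypothetical sales and weather rows.
sales = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2015-01-07", "2015-01-07"],
    "Sales": [5200, 4800],
})
weather = pd.DataFrame({
    "State": ["StateA", "StateC"],
    "Date": ["2015-01-07", "2015-01-07"],
    "MaxTempC": [3, -1],
})

# Attach the inferred state to each sales row, then join weather on
# (State, Date). Left joins keep every sales row even without weather.
merged = (
    sales.merge(location, on="Store", how="left")
         .merge(weather, on=["State", "Date"], how="left")
)
print(merged)
```

Left joins are used so that a store with no matching weather record keeps its sales row with missing weather values rather than being dropped.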
Link to the datasets used: https://drive.google.com/drive/folders/1XC2Q6fZ58DclicGXP1ajgZD_nW0C9Dyp?usp=sharing
This project is split into 3 phases:
Phase 01 of the project dealt with data cleaning, EDA and feature engineering.
Phase 02 of the project dealt with using various ML models (Multiple Linear Regression, Lasso Regression, Gradient Boosted Trees, RNN) to predict the sales of the Rossmann stores. Among the models tried, the gradient boosted trees model (LGBM) showed the most promise, with a score of 98%.
Phase 03 dealt with using the data produced in the previous phases to create a simple business dashboard in Tableau.
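As a rough illustration of the Phase 02 modelling step, the sketch below fits a gradient boosted trees regressor on synthetic data. It is not the project's code: scikit-learn's `GradientBoostingRegressor` stands in for the LGBM model, the features and the sales formula are invented, and R² is used as the score only as an assumption, since the metric behind the reported figure is not stated here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: sales depend on a promo flag and the weekday.
rng = np.random.default_rng(0)
n = 2000
promo = rng.integers(0, 2, size=n)        # 0 or 1
day_of_week = rng.integers(1, 8, size=n)  # 1..7
sales = 100 + 50 * promo + 10 * day_of_week + rng.normal(0, 5, size=n)

X = np.column_stack([promo, day_of_week])
X_train, X_test, y_train, y_test = train_test_split(
    X, sales, test_size=0.25, random_state=0
)

# Stand-in for the LGBM model: scikit-learn's gradient boosted trees.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Score on rows the model has not seen during training.
score = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out rows: {score:.3f}")
```

A random split keeps the example short; for real daily sales data, a chronological split may give a more honest estimate of forecasting performance.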
- Python 3.x
- Jupyter
- Required ML libraries & visualisation libraries (scikit-learn, keras, tensorflow, numpy, pandas, seaborn, matplotlib)
- Tableau Desktop
- Download the IPython notebooks from the code folder of the repo
- Make sure to change the paths used for reading the datasets accordingly
- Run all the cells of each Jupyter notebook
Note that the first notebook creates three .csv files: location.csv, cleaned_weather.csv and final_RossmannSales.csv. The second notebook takes final_RossmannSales.csv as input; alternatively, you can download this file from the provided drive link.
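One way to handle the path changes is to keep a single data directory variable and check which of the files named above are present before running the second notebook. The sketch below is only a suggestion: the `data` directory and the helper `resolve_data_files` are hypothetical and not part of the repo.

```python
from pathlib import Path

# File names taken from the note above; the directory is an assumption.
PIPELINE_FILES = ("location.csv", "cleaned_weather.csv", "final_RossmannSales.csv")


def resolve_data_files(data_dir, names=PIPELINE_FILES):
    """Map each file name to its path under data_dir and list missing ones."""
    base = Path(data_dir).expanduser()
    paths = {name: base / name for name in names}
    missing = [name for name, path in paths.items() if not path.is_file()]
    return paths, missing


if __name__ == "__main__":
    DATA_DIR = "data"  # hypothetical location; change to where your CSVs live
    paths, missing = resolve_data_files(DATA_DIR)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All pipeline files found under", DATA_DIR)
```

The returned paths can then be passed to `pandas.read_csv`, so only `DATA_DIR` needs editing when the files move.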
- Download the Tableau workbook from the tableau folder of the repo and final_RossmannSales.csv from the provided drive link.
- Establish a live data source connection & run
Alternatively, you can check it out on Tableau Online