This project analyzes Global Superstore retail data using SQL (PostgreSQL) and Python.
🚀 Tools: PostgreSQL · SQL · Python · Pandas · Seaborn · VS Code
- data/ → Contains the raw CSV files
- scripts/ → Python & SQL files
  - load_to_db.py → Loads data into PostgreSQL
  - data_cleaning.sql → Cleans the dataset
  - business_insights.sql → Insight-generating SQL
  - visualization1.py → Generates charts
- visuals/ → Output images and charts
- notebooks/ → Optional Jupyter notebooks
- cleaning_log.md → Cleaning steps documentation
- requirements.txt → Python dependencies
- README.md → Project documentation
- Clean raw sales data using SQL
- Extract actionable insights through SQL queries
- Visualize insights using Python (Pandas, Seaborn)
- Summarize findings with business relevance
- Checked for nulls and standardized casing in columns
- Checked for duplicate orders and invalid dates
- Cleaned discount and profit anomalies
- Logged all cleaning steps in `cleaning_log.md`
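
The checks listed above can be sketched as a handful of counting queries. This is only an illustration, not the contents of `data_cleaning.sql`: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere, and the `orders` table, its column names, and the sample rows are all assumptions.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database.
# The table name, column names, and rows below are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id TEXT, order_date TEXT, ship_date TEXT,
        city TEXT, discount REAL, profit REAL
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A-1", "2014-01-03", "2014-01-07", "berlin", 0.10, 25.0),
        ("A-1", "2014-01-03", "2014-01-07", "berlin", 0.10, 25.0),  # exact duplicate
        ("A-2", "2014-02-10", "2014-02-08", "Munich", 0.00, 40.0),  # ships before order
        ("A-3", "2014-03-01", "2014-03-04", None, 1.50, -5.0),      # null city, discount > 1
    ],
)

# One counting query per cleaning check.
checks = {
    "null_city": "SELECT COUNT(*) FROM orders WHERE city IS NULL",
    "duplicate_rows": """
        SELECT COALESCE(SUM(n - 1), 0) FROM (
            SELECT COUNT(*) AS n FROM orders
            GROUP BY order_id, order_date, ship_date, city, discount, profit
            HAVING COUNT(*) > 1
        ) AS dupes
    """,
    # ISO-formatted date strings compare correctly as text in SQLite.
    "ship_before_order": "SELECT COUNT(*) FROM orders WHERE ship_date < order_date",
    "discount_out_of_range": "SELECT COUNT(*) FROM orders WHERE discount < 0 OR discount > 1",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
print(results)
```

Each query returns a single count, so a non-zero value points at rows worth inspecting before they are fixed and noted in the cleaning log.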
Do repeat customers bring higher profit?
Where are we giving too much discount but getting low returns?
How does shipping mode affect order performance?
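
As a rough sketch of how the first question might be approached, the query below splits customers into one-time and repeat segments and compares average profit per order. It is not taken from `business_insights.sql`; the `orders` table, the `customer_id`, `order_id`, and `profit` columns, and the sample rows are assumptions, and `sqlite3` stands in for PostgreSQL.

```python
import sqlite3

# In-memory stand-in for PostgreSQL; schema and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, customer_id TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("O-1", "C-1", 30.0),
        ("O-2", "C-1", 50.0),  # C-1 ordered twice, so it counts as a repeat customer
        ("O-3", "C-2", 20.0),
        ("O-4", "C-3", 10.0),
    ],
)

query = """
    WITH customer_orders AS (
        SELECT customer_id,
               COUNT(DISTINCT order_id) AS n_orders,
               SUM(profit) AS total_profit
        FROM orders
        GROUP BY customer_id
    )
    SELECT CASE WHEN n_orders > 1 THEN 'repeat' ELSE 'one-time' END AS segment,
           COUNT(*) AS customers,
           SUM(total_profit) / SUM(n_orders) AS avg_profit_per_order
    FROM customer_orders
    GROUP BY segment
    ORDER BY segment
"""

# Map each segment to its average profit per order.
segments = {row[0]: row[2] for row in conn.execute(query)}
print(segments)
```

Counting distinct order IDs per customer matters when an order spans several line items, since a plain row count would label multi-item buyers as repeat customers.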
| Business Area | Insight Example |
|---|---|
| Regional Performance | Central and West regions lead in profit |
| Customer Behavior | Repeat customers generate higher average profit |
| Product Strategy | Some high-selling products have low profitability |
| Discount Optimization | Certain cities have high discounts but low returns |
| Logistics & Delivery | Standard shipping is the most used mode, but not always the most profitable |
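
A discount-optimization insight like the one in the table could come from a grouped query along these lines. This is a hedged sketch only: the `orders` schema, the sample rows, and both thresholds (`min_discount`, `max_margin`) are hypothetical, and `sqlite3` is used in place of PostgreSQL.

```python
import sqlite3

# In-memory stand-in for PostgreSQL; schema and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, sales REAL, discount REAL, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("Berlin", 100.0, 0.40, -10.0),
        ("Berlin", 200.0, 0.30, 4.0),
        ("Munich", 100.0, 0.00, 30.0),
        ("Munich", 100.0, 0.10, 20.0),
    ],
)

# Flag cities whose average discount is high while profit margin stays low.
# The :name placeholders are sqlite3 syntax; psycopg2 would use %(name)s instead.
query = """
    SELECT city,
           AVG(discount) AS avg_discount,
           SUM(profit) / SUM(sales) AS profit_margin
    FROM orders
    GROUP BY city
    HAVING AVG(discount) >= :min_discount
       AND SUM(profit) / SUM(sales) < :max_margin
    ORDER BY avg_discount DESC
"""

# Hypothetical thresholds: at least 20% average discount, under 5% margin.
flagged = conn.execute(query, {"min_discount": 0.2, "max_margin": 0.05}).fetchall()
flagged_cities = [row[0] for row in flagged]
print(flagged_cities)
```

Passing the thresholds as parameters keeps the cut-offs easy to adjust without editing the SQL text.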
- Clone the repo: `git clone https://github.com/darshanr-c/sql_superstore_analysis.git`
- Activate the virtual environment: `source .venv/bin/activate`
- Install Python dependencies: `pip install -r requirements.txt`
- Load data into PostgreSQL: `python scripts/load_to_db.py`
- Run cleaning & analysis:
  - SQL: `scripts/data_cleaning.sql` & `scripts/business_insights.sql`
  - Python: `scripts/visualization1.py`
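
For orientation, the load step amounts to reading CSV rows and bulk-inserting them into a table. The sketch below is not the actual `load_to_db.py` (which, judging by the dependencies, likely relies on pandas and SQLAlchemy); it uses only the standard library, with `sqlite3` as a stand-in for PostgreSQL. The `load_csv` helper, the column names, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Tiny in-memory stand-in for a CSV file under data/; columns are assumed.
sample_csv = io.StringIO(
    "order_id,order_date,sales,profit\n"
    "O-1,2014-01-03,100.5,20.1\n"
    "O-2,2014-01-04,50.0,-3.2\n"
)

def load_csv(conn, handle, table="orders"):
    """Create `table` if needed, insert every CSV row, and return the row count."""
    reader = csv.DictReader(handle)
    # `table` is interpolated directly, so it must be a trusted identifier.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(order_id TEXT, order_date TEXT, sales REAL, profit REAL)"
    )
    rows = [
        (r["order_id"], r["order_date"], float(r["sales"]), float(r["profit"]))
        for r in reader
    ]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, sample_csv)
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(loaded, total)
```

Comparing the number of rows read with the number present in the table afterwards is a cheap sanity check that nothing was dropped during loading.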
- PostgreSQL 14+ installed locally
- SQL queries run using:
  - VS Code + PostgreSQL extension (recommended)
  - Or DBeaver / psql CLI
- Python 3.9+ with the following libraries:
  - pandas, sqlalchemy, psycopg2-binary, seaborn, matplotlib
Darshan Chaudhari
Master’s in Data Science · Germany