Retail Data Clustering Project

Description

This project clusters retail transaction data to segment customers by purchasing behavior. The analysis combines RFM analysis (Recency, Frequency, Monetary value) with K-Means clustering. The code is adapted from the tutorial "Clustering Retail Data" and extended with a reusable Python package and a command-line interface.
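
As a rough illustration of the RFM step, the sketch below aggregates line-item transactions into one row per customer. It is not the package's actual code: the function name `compute_rfm`, the `snapshot_date` parameter, and the column names (`Customer ID`, `Invoice`, `InvoiceDate`, `Quantity`, `Price`) are assumptions chosen for the example and may differ from what the package expects.

```python
import pandas as pd


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate line-item transactions into one RFM row per customer.

    Hypothetical helper; the column names used here are assumptions.
    """
    df = transactions.copy()
    # Monetary value of each line item.
    df["Amount"] = df["Quantity"] * df["Price"]
    grouped = df.groupby("Customer ID")
    return pd.DataFrame({
        # Days between the snapshot date and the customer's latest purchase.
        "Recency": (snapshot_date - grouped["InvoiceDate"].max()).dt.days,
        # Number of distinct invoices per customer.
        "Frequency": grouped["Invoice"].nunique(),
        # Total amount spent per customer.
        "Monetary": grouped["Amount"].sum(),
    })


# Tiny made-up sample, only to show the shape of the input and output.
transactions = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "Invoice": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-01-01", "2011-01-10", "2011-01-05"]),
    "Quantity": [2, 1, 5],
    "Price": [10.0, 20.0, 3.0],
})
rfm = compute_rfm(transactions, snapshot_date=pd.Timestamp("2011-01-11"))
print(rfm)
```

The resulting table has one row per customer and three numeric columns, which is the form K-Means needs as input.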

Usage

Install dependencies (preferably in a virtual environment):
```
pip install pandas scikit-learn matplotlib seaborn openpyxl
```
Run the pipeline on your retail Excel file:
```
python -m retail_clustering.pipeline data/online_retail_II.xlsx --out results.csv
```
The script will automatically determine the optimal number of clusters using the silhouette score.
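
One common way to pick the number of clusters with the silhouette score is to fit K-Means for several candidate values and keep the one with the highest score. The sketch below shows that idea; the pipeline's actual implementation may differ. The function name `choose_k`, the candidate range 2 to 6, and the standardization step are assumptions made for this example, and the input data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(features, k_range=range(2, 7), random_state=0):
    """Return the candidate k with the highest silhouette score.

    Hypothetical helper; the candidate range and scaling are assumptions.
    """
    # Standardize so that no single feature dominates the distance metric.
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three well-separated groups in a 3-column feature space.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [10, 10, 10], [-10, 10, 0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])

best_k, scores = choose_k(features)
print(best_k, scores)
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. Real RFM features are often heavily skewed, so a log transform before scaling may be worth trying.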

Development

Unit tests are provided to validate the RFM calculations and clustering logic:

pytest -q
