This code runs K-Means clustering on a customer dataset read from a CSV file. It first loads the data into a Pandas DataFrame, selects the columns that represent customer features, and converts them into a NumPy array called X. It then fits K-Means in a loop for each cluster count from 2 to 10 and records the Within-Cluster Sum of Squares (WCSS) for each fit. Plotting WCSS against the number of clusters shows an "elbow point": the cluster count beyond which adding more clusters yields only small reductions in WCSS. That point is taken as the optimal number of clusters, since it balances tight clusters against over-segmenting the data. Finally, K-Means is run once more with the chosen number of clusters (in this case, 5), and the resulting cluster assignments are stored in y_kmeans. The code ends with a scatter plot that shows each cluster in a distinct color, giving a view of how customers group by their characteristics. This kind of clustering analysis is useful for customer segmentation and tailored marketing strategies.
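The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the CSV file and its column names are not given here, so the sketch substitutes a synthetic two-feature matrix for X, and the helper `plot_elbow` and the `KMeans` settings (`init`, `n_init`, `random_state`) are assumptions. It relies on scikit-learn's `KMeans`, whose `inertia_` attribute holds the WCSS of a fitted model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the customer feature matrix X (assumption: two
# numeric features per customer; the real CSV columns are not known here).
rng = np.random.default_rng(0)
centers = np.array([[20, 80], [25, 20], [55, 50], [85, 80], [90, 15]])
X = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in centers])

# Fit K-Means for each cluster count from 2 to 10 and record the WCSS.
ks = list(range(2, 11))
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)  # inertia_ is the within-cluster sum of squares

# Refit with the cluster count read off the elbow (5, as in the description).
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)  # one cluster label per row of X


def plot_elbow(ks, wcss):
    """Hypothetical helper: plot WCSS against the number of clusters."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.plot(ks, wcss, marker="o")
    plt.xlabel("Number of clusters")
    plt.ylabel("WCSS")
    plt.title("Elbow method")
    plt.show()
```

Calling `plot_elbow(ks, wcss)` draws the elbow curve. With real data, X would instead come from the selected DataFrame columns, and the elbow may not fall at 5.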
Rkarande1/K-mean-Clustering