IncomeDataVisualization

CENG 574 Data Visualization Project

This project analyzes the U.S. Census Bureau Income Dataset. The main goal is to project the data onto two dimensions to better visualize the multivariate dataset and to identify clusters using unsupervised machine learning algorithms. It was submitted as the Final Project for the Fall 2020 METU CENG 574: Statistical Data Analysis course.

The code used to generate the results for the Final Paper is in Final_Paper_Script.R. Final_Paper_Extra_Plots.pdf, the PDF generated from the original R Markdown file, contains all of the extra plots that were mentioned, but not shown, in the Final Paper. These plots are also available individually in the Plots directory.

Projection Methods Used:

Principal Component Analysis (PCA)
Multidimensional Scaling (MDS), in four variants: classical (Torgerson's), Sammon's, Kruskal's nonlinear mapping, and symmetric SMACOF
Uniform Manifold Approximation and Projection (UMAP)
t-distributed Stochastic Neighbor Embedding (t-SNE)
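
To illustrate what a linear projection such as PCA does, here is a minimal Python sketch that finds the first principal component by power iteration and projects the data onto it. It is not the repository's R code: the function name `first_principal_component` and the toy data are hypothetical, and the project itself projects onto two dimensions rather than one.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) for the direction of maximum variance."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero variance: no direction to find
        v = [x / norm for x in w]
    # One-dimensional coordinates of each centered row along v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data lying exactly on the line y = 2x.
direction, scores = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
```

For the toy data the recovered direction is parallel to (1, 2), so a single coordinate describes every point without loss.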

Clustering Algorithms Used:

Agglomerative Nesting (AGNES) hierarchical clustering (with 6 different linkages)
Divisive Analysis (DIANA) clustering
k-means clustering
k-medoids clustering
k-means clustering applied to a Self-Organizing Map (SOM)
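
As a rough illustration of the k-means entry above, the following Python sketch implements Lloyd's algorithm with caller-supplied starting centers. It is an assumption-laden toy, not the repository's R code: the function name `kmeans`, the fixed initialization, and the sample points are all hypothetical.

```python
import math

def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers

# Two well-separated toy groups.
labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)],
                         centers=[(0, 0), (10, 10)])
```

Real uses typically try several random initializations, because the result depends on the starting centers.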

Cluster Validation Tests Used:

Stability: Nonparametric Bootstrap, Average Proportion of Non-overlap (APN), Average Distance (AD), Average Distance between Means (ADM), and Figure of Merit (FOM)
Internal Validation: Connectivity, Silhouette Width, and Dunn Index
External Validation: Rand Index
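
For readers unfamiliar with the external validation measure listed above, the Rand Index is the fraction of point pairs on which two labelings agree, meaning both place the pair in the same cluster or both place it in different clusters. The Python sketch below is illustrative only and is not the repository's R code; the function name `rand_index` and the toy labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair together in labeling A?
        same_b = labels_b[i] == labels_b[j]  # pair together in labeling B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Identical partitions with swapped label names still agree on every pair.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because only pair membership is compared, the index does not depend on how the clusters are named.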

