Programming Languages/Software: R, RStudio
Skills Used:
Neural Networks
SVM
kNN
Cluster Analysis
Exploratory Analysis
Handwriting is unique to every individual. Systematic recognition of handwritten letters has made great strides; however, the individuality in how each of us writes a letter or word still makes the task complex. This project uses machine learning and cluster analysis to recognize handwritten alphanumeric characters.
The data contains 11,000 rows and 3,136 predictor variables (Pixel.1 to Pixel.3136), all of which are binary, taking only the values 0 or 1. Taken together, the predictor variables in a row form a 56 x 56 image of the represented character. There are no missing values in this dataset.
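To make the pixel layout concrete, here is a minimal sketch of how one row of 3,136 binary values could be reshaped into a 56 x 56 image. The data below is randomly generated as a stand-in, the helper names `to_image` and `plot_character` are hypothetical, and the row-by-row pixel ordering is an assumption, since the dataset description does not state it.

```r
# Hypothetical stand-in for one row of the dataset: 3,136 binary pixel values.
set.seed(1)
pixels <- rbinom(56 * 56, size = 1, prob = 0.2)

# Reshape a pixel vector into a 56 x 56 matrix.
# Assumption: pixels are stored row by row; use byrow = FALSE if the
# plotted character comes out transposed.
to_image <- function(pixels, side = 56, byrow = TRUE) {
  stopifnot(length(pixels) == side * side)
  matrix(pixels, nrow = side, ncol = side, byrow = byrow)
}

# image() draws matrix rows along the x-axis, so reverse the row order and
# transpose to show the matrix the way it prints (row 1 at the top).
plot_character <- function(img) {
  image(t(img[nrow(img):1, ]), col = c("white", "black"), axes = FALSE, asp = 1)
}

img <- to_image(pixels)
dim(img)
```

Calling `plot_character(img)` then draws the character; with real data, `pixels` would be the Pixel.1 to Pixel.3136 values of a single row converted to a numeric vector.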
Samples of plotted letters and numbers are provided below.
The distribution for each character shows little variability, on average, in how people draw it. 1's and K's seem to have the widest distributions, while 9's and J's have the narrowest. Furthermore, there are few outliers in this dataset, and some characters have no outliers at all.
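The write-up does not say which quantity these distributions describe. As one possible illustration, the sketch below assumes the summary is the number of "on" pixels per image and compares spread and outlier counts per character using the standard boxplot (1.5 x IQR) rule. The data, labels, and variable names are synthetic stand-ins, not the project's data.

```r
set.seed(42)

# Synthetic stand-in: 300 images of 3,136 binary pixels with hypothetical labels.
n <- 300
X <- matrix(rbinom(n * 3136, size = 1, prob = 0.15), nrow = n)
label <- factor(sample(c("1", "9", "J", "K"), n, replace = TRUE))

# Assumed summary statistic: number of "on" pixels in each image.
ink <- rowSums(X)

# Spread (interquartile range) and outlier count for each character.
by_char <- split(ink, label)
spread <- sapply(by_char, IQR)
n_outliers <- sapply(by_char, function(v) length(boxplot.stats(v)$out))

print(data.frame(iqr = spread, outliers = n_outliers))
```

A call such as `boxplot(ink ~ label)` would show the same comparison graphically.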
Model | Test Error Rate (%) |
---|---|
kNN | 31.3 |
NNET | 33.9 |
SVM | 20.0 |
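
As a rough sketch of how a test error rate like those in the table can be computed, the example below fits kNN with `knn()` from the `class` package on synthetic binary data and reports the percentage of misclassified held-out rows. The 80/20 split, `k = 5`, and the toy data are assumptions for illustration; the project's actual split and tuning are not stated here.

```r
library(class)  # provides knn(); a recommended package bundled with most R installs

set.seed(123)

# Synthetic stand-in: two classes of 64-pixel binary "images" that differ
# in how densely their pixels are switched on.
n <- 200
p <- 64
y <- factor(rep(c("A", "B"), each = n / 2))
prob <- ifelse(y == "A", 0.2, 0.6)
X <- matrix(rbinom(n * p, size = 1, prob = rep(prob, times = p)), nrow = n)

# Assumed 80/20 train/test split.
train_idx <- sample(n, 0.8 * n)

# Classify the held-out rows with k = 5 nearest neighbours (assumed value).
pred <- knn(train = X[train_idx, ], test = X[-train_idx, ],
            cl = y[train_idx], k = 5)

# Test error rate in percent: share of held-out rows that were misclassified.
test_error <- mean(pred != y[-train_idx]) * 100
print(test_error)
```

The same `mean(pred != truth) * 100` calculation applies to predictions from any of the other models.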