- Project definition
- Analysis
- Detecting a human face
- Detecting a dog
- Determining the most resembling dog breed
- Conclusion
- Libraries used
- How to run the application locally
This is a project for my Udacity Data Science Nanodegree.
It is a web app that lets you upload an image of a dog or a human and tells you which dog breed it resembles most.
This is realized with a pretrained convolutional neural network and transfer learning.
Additionally, for every uploaded image the app detects whether it contains a dog or a human face.
To test the quality of the face detection method, we chose 100 random images from the set of human face images and 100 random images from the set of dog images.
In the subset of 100 human faces, the classifier detected a human face in every image. That is what we expected.
In the subset of 100 dog images, the Haar cascade classifier detected a human face in 11 images. Here we would have wished that none were detected, but a false positive rate of 11% is sufficient for our purposes.
To test the quality of the dog detection method, we used the same procedure as for the human face detection.
In the subset of 100 human faces, a dog was detected in no image.
In the subset of 100 dog images, a dog was detected in every image.
That is a perfect result.
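The evaluation itself is a simple loop over the two random samples. A minimal sketch, assuming arrays of file paths `human_files` and `dog_files` and the detector functions `face_detector` and `dog_detector` shown further below (all these names are our assumptions):

```python
import numpy as np

np.random.seed(42)  # hypothetical seed, just to make the sampling reproducible
human_sample = np.random.choice(human_files, 100, replace=False)
dog_sample = np.random.choice(dog_files, 100, replace=False)

# with a sample size of 100, the raw count equals the percentage
print(sum(face_detector(p) for p in human_sample), 'of 100 human images contain a face')
print(sum(face_detector(p) for p in dog_sample), 'of 100 dog images contain a face')
print(sum(dog_detector(p) for p in human_sample), 'of 100 human images contain a dog')
print(sum(dog_detector(p) for p in dog_sample), 'of 100 dog images contain a dog')
```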
We used accuracy to measure the performance of the dog breed prediction model. It fits multiclass classification, and its interpretation is quite intuitive: the percentage of all images classified correctly.
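Expressed as code (a minimal sketch, with hypothetical label arrays `y_true` and `y_pred`):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # fraction of images whose predicted breed equals the true breed
    return np.mean(np.array(y_true) == np.array(y_pred))
```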
The training, validation, and testing data was provided by Udacity.
It contained 8351 images of dogs categorized into 133 dog breeds.
These images were split into:
- 6680 images for training
- 835 images for validation
- 836 images for testing
The distribution of images across the dog breed categories was roughly similar in all three sets:
The x axis simply lists the indices of the 133 dog breed categories, ranging from 0 to 132. The y axis shows what percentage of an image set belongs to the dog breed category with a given index. As you can see, the curves for all three sets (training, validation, test) run roughly parallel, so the category distributions are not unbalanced between the three image sets.
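This comparison can be reproduced with a few lines, assuming one-hot encoded target arrays `train_targets`, `valid_targets`, and `test_targets` (hypothetical names):

```python
import numpy as np
import matplotlib.pyplot as plt

for name, targets in [('train', train_targets), ('valid', valid_targets), ('test', test_targets)]:
    # percentage of the image set falling into each of the 133 categories
    percentages = 100 * targets.sum(axis=0) / len(targets)
    plt.plot(np.arange(133), percentages, label=name)

plt.xlabel('dog breed category index')
plt.ylabel('percentage of image set')
plt.legend()
plt.show()
```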
The models used in this app were developed in a Udacity workspace environment with a GPU.
A Keras CNN with TensorFlow backend expects a tensor with the shape:
`(n_images, n_rows, n_columns, n_channels)`
Where:
- n_images is the number of images
- n_rows is the number of pixel rows in each image
- n_columns is the number of pixel columns in each image
- n_channels is the number of channels in each image
So each image has to be:
- loaded from disk
- scaled to a size of 224x224 pixels
- converted to a tensor of the required shape
This was done by the following code:
```python
from keras.preprocessing import image
import numpy as np

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return it
    return np.expand_dims(x, axis=0)
```
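For a whole image set, the resulting single-image tensors can be stacked into one 4D tensor. A short sketch (the helper name `paths_to_tensor` is our choice):

```python
def paths_to_tensor(img_paths):
    # stack the (1, 224, 224, 3) tensors into one (n_images, 224, 224, 3) tensor
    return np.vstack([path_to_tensor(p) for p in img_paths])
```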
For detecting human faces, OpenCV's implementation of Haar feature-based cascade classifiers was used.
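A minimal sketch of such a face detector, assuming a locally available pretrained cascade file (the file path and the function name are our assumptions):

```python
import cv2

face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

def face_detector(img_path):
    # Haar cascades operate on grayscale images
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    # True if at least one face was found
    return len(faces) > 0
```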
For detecting dogs, we used the ResNet50 model with weights trained on the ImageNet data set.
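A sketch of how such a dog detector can be built on top of `path_to_tensor` from above: in the ImageNet category list, the dog breeds occupy the indices 151 to 268, so an image counts as a dog if ResNet50's top prediction falls into that range (the function name `dog_detector` is our choice):

```python
from keras.applications.resnet50 import ResNet50, preprocess_input
import numpy as np

resnet50_model = ResNet50(weights='imagenet')

def dog_detector(img_path):
    # preprocess the image the way ResNet50 expects and take the top prediction
    img = preprocess_input(path_to_tensor(img_path))
    prediction = np.argmax(resnet50_model.predict(img))
    # ImageNet indices 151..268 are the dog breed categories
    return 151 <= prediction <= 268
```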
For the prediction of the dog breed we tried several approaches.
The first try was to build a CNN from scratch.
The model architecture was as follows:
```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential()
model.add(Conv2D(16, (2, 2), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(GlobalAveragePooling2D())
model.add(Dense(133, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```
Training this model for 10 epochs with a batch size of 20 resulted in a test accuracy of 4.1%. That is pretty bad and not much better than the dumbest method possible, choosing one of the 133 dog breed categories at random, which would already yield an expected accuracy of about 0.75% (1/133).
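The training call for these numbers might look like this (a sketch; the tensor and target variable names are assumptions):

```python
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=10, batch_size=20)
```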
The second try was to compare 5 pretrained models with transfer learning. On top of each pretrained model, we added a global average pooling layer and two dense layers with 133 units each.
```python
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# train_features holds the bottleneck features of the training set (see the sketch below)
model = Sequential()
model.add(GlobalAveragePooling2D(input_shape=train_features.shape[1:]))
model.add(Dense(133, activation='relu'))
model.add(Dense(133, activation='softmax'))
```
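Those bottleneck features are obtained by running the image tensors through the pretrained model without its ImageNet classification head. A minimal sketch for Xception (the variable names are our assumptions; the other models work analogously with their own `preprocess_input`):

```python
from keras.applications.xception import Xception, preprocess_input

# the pretrained convolutional base without the ImageNet classification head
base_model = Xception(weights='imagenet', include_top=False)
train_features = base_model.predict(preprocess_input(train_tensors))
```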
The results were:
- VGG16: test accuracy of 69.02%
- VGG19: test accuracy of 69.02% (really!)
- ResNet50: test accuracy of 75.84%
- InceptionV3: test accuracy of 77.75%
- Xception: test accuracy of 80.26%
Below we compare the history of training and validation accuracy over 20 epochs for all five models.
On the left there is a run without a dropout layer. On the right there is a run with a 50% dropout layer added.
For all models, overfitting starts early on with the second epoch.
Introducing the dropout layer reduces overfitting, but the validation accuracy does not increase.
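The dropout run simply inserts a 50% dropout layer into the head above; where exactly it sits is our assumption (a sketch, reusing the imports from above):

```python
from keras.layers import Dropout

model = Sequential()
model.add(GlobalAveragePooling2D(input_shape=train_features.shape[1:]))
model.add(Dense(133, activation='relu'))
# randomly zero out half of the activations during training to reduce overfitting
model.add(Dropout(0.5))
model.add(Dense(133, activation='softmax'))
```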
Using pretrained CNNs resulted in much better test accuracy than training a CNN from scratch.
VGG16 and VGG19 were much better than a CNN from scratch.
InceptionV3 and Xception topped them with the best test accuracies.
Xception got the best test accuracy; that's why we used it for the implementation.
We did not invest much effort in tuning any of the evaluated models. Adding dropout to reduce overfitting did not work out.
To get further improvements we could:
- do data augmentation on the image sets (see the sketch below)
- do a grid search over model parameters
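Data augmentation could, for example, use Keras' built-in generator (a sketch with assumed parameter values):

```python
from keras.preprocessing.image import ImageDataGenerator

# randomly shift, flip, and zoom the training images to create new variants
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True,
                             zoom_range=0.2)

model.fit_generator(datagen.flow(train_tensors, train_targets, batch_size=20),
                    steps_per_epoch=len(train_tensors) // 20, epochs=20)
```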
With pretrained CNNs it's quite easy, even for a newbie like me, to get acceptable performance on a toy task like classifying dog breeds.
I was surprised that adding dropout did not improve performance: the overfitting was reduced, but no accuracy gain followed.
Next time, when in a hurry for quick results, I would start with Xception immediately and invest more time in tuning.
Data augmentation is the next step I will take to increase performance.
The following libraries were used:
- flask for the implementation of the web app
- keras and tensorflow for running the prediction models
- cv2 for the Haar cascade classifier
- and: numpy, json, werkzeug
```bash
cd dogbreedapp
mkdir uploads
python dogbreedapp.py
```
Then open in browser: http://0.0.0.0:3001