Transfer Learning with Keras
25 Dec 2018 by dzlab
Transfer learning is an important concept in machine learning in general and in deep learning in particular. It reuses the knowledge gathered by a model already trained on one task and transfers it to a new task. As a result, the new model can be trained in less time and may need less data than a model trained from scratch.
This article shows how easy it is to apply transfer learning to image classification with Keras. Starting from a classifier trained on the ImageNet dataset, we will adapt its architecture to the problem of recognizing World Chess champions and train the new model on a few images. With this approach the model trains very fast (in a matter of seconds) on very few images (sometimes a dozen can be enough), yet it still reaches good accuracy. In fact, even though there are no images of chess champions in ImageNet, it turns out that the features learned on ImageNet are already good enough at recognizing things in the world.
Download the pictures using the URLs you got from the last step and store them in an ImageNet-compatible folder structure (with train, validation and test subsets), i.e.
root
|_ dataset
    |_ train
        |_ label1
        |_ label2
        |_ ...
    |_ test
        |_ label1
        |_ label2
        |_ ...
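A small sketch of how such a layout could be created before copying the downloaded pictures into it. It uses only the Python standard library; the function name `make_dataset_dirs`, the default subset names and the example labels are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path


def make_dataset_dirs(root, labels, subsets=("train", "test")):
    """Create root/dataset/<subset>/<label> folders and return their paths."""
    created = []
    for subset in subsets:
        for label in labels:
            folder = Path(root) / "dataset" / subset / label
            # parents=True builds the intermediate folders as needed;
            # exist_ok=True makes the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

For example, `make_dataset_dirs(".", ["fischer", "karpov"])` would create four folders. Keras utilities such as `ImageDataGenerator.flow_from_directory` infer the class labels from these subfolder names.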
We will take a ResNet-50 model pre-trained on ImageNet and then train it to predict our labels (i.e. World Chess champions). In Keras, this is simply:
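A minimal sketch of loading the pre-trained classifier, assuming the Keras bundled with TensorFlow (`tensorflow.keras`). The wrapper name `load_pretrained_resnet50` is hypothetical; the `weights` parameter is exposed only so the architecture can also be built without downloading anything.

```python
from tensorflow.keras.applications import ResNet50


def load_pretrained_resnet50(weights="imagenet"):
    """Return the full ResNet-50 classifier, including its 1000-way top layer.

    weights="imagenet" downloads the pre-trained weights on first use;
    weights=None builds the same architecture with random weights.
    """
    return ResNet50(weights=weights)
```

Calling `model1 = load_pretrained_resnet50()` gives a model whose summary can then be inspected with `model1.summary()`.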
For the moment we cannot use this model for our task: if you look at its summary with `model1.summary()`, the last layer has 1000 outputs. This is because the model was trained to recognize the categories available in ImageNet (i.e. 1000).
We need to adapt the model to our task by doing the following:
- Remove the last layer of the original model.
- Add a head on top of this base model with an output size equal to the number of categories.
- Freeze the layers of this base model, i.e. set `layer.trainable = False` on each of them.
- Train only the head using the previously downloaded pictures of champions.
In Keras, the previous steps translate into:
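The first two steps can be sketched as follows, assuming `tensorflow.keras`. The function name `build_transfer_model`, the global-average-pooling layer and the 256-unit hidden layer are illustrative choices for the head, not necessarily the ones used in the original notebook.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50


def build_transfer_model(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Return (base_model, full_model) for a num_classes-way classifier."""
    # include_top=False drops the original 1000-way classification layer.
    base_model = ResNet50(weights=weights, include_top=False, input_shape=input_shape)
    # New head (assumed design): pool the feature maps, then classify.
    x = layers.GlobalAveragePooling2D()(base_model.output)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inputs=base_model.input, outputs=outputs)
    return base_model, model
```

Returning the base model alongside the full model keeps a handle on the pre-trained layers, which is convenient when freezing them afterwards.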
Then we freeze the layers of the original model and train only the newly added layers as follows:
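A hedged sketch of the freezing and training step, assuming `tensorflow.keras`, integer class labels and in-memory image arrays. The helper name `freeze_base_and_train`, the Adam optimizer and the sparse categorical cross-entropy loss are assumptions made for illustration.

```python
import tensorflow as tf


def freeze_base_and_train(base_model, model, images, labels, epochs=5):
    """Freeze the pre-trained layers, then fit only the remaining ones."""
    # Frozen layers keep their pre-trained weights during training.
    for layer in base_model.layers:
        layer.trainable = False
    # Compile after changing `trainable` so the change is taken into account.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(images, labels, epochs=epochs, verbose=0)
```

Here `model` is expected to share its first layers with `base_model`, so freezing the base leaves only the head's weights trainable. With pictures stored in folders, a directory-based data generator could be passed to `model.fit` instead of arrays.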
After training the model, we can use a confusion matrix to analyze which classes were predicted well and which ones confused the trained model. E.g. in the following matrix Kramnik is well recognized by the model, but it fails to properly distinguish `fischer`/`karpov`/`kasparov`. When looking at the dataset, many `fischer` images contain `karpov`, as they played against each other in the Match of the Century. Similarly for
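For reference, a confusion matrix can be computed from true and predicted class indices with a few lines of NumPy. This is a generic sketch, not the code that produced the matrix discussed above; the function name and the row/column convention (rows are true classes, columns are predicted classes) are assumptions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix
```

The predicted indices would typically come from `np.argmax(model.predict(images), axis=1)`. Large off-diagonal counts point at pairs of classes the model mixes up.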
To visually explain what the trained model looks at in an input picture, we can use Grad-CAM as follows:
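A compact Grad-CAM sketch, assuming TensorFlow 2 with eager execution and a functional Keras model in which the chosen convolutional layer is directly reachable by name. The function name `grad_cam` and its parameters are illustrative; the original notebook may implement this differently.

```python
import numpy as np
import tensorflow as tf


def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return a heat map in [0, 1] with the spatial size of the conv layer output."""
    conv_layer = model.get_layer(conv_layer_name)
    # Auxiliary model that exposes both the feature maps and the predictions.
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    # Gradient of the class score with respect to each feature map value.
    grads = tape.gradient(score, conv_out)
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)[0]
    cam = tf.nn.relu(cam)  # keep only regions that support the class
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # scale to [0, 1]
    return cam.numpy()
```

The returned map has the resolution of the convolutional layer, so it is usually resized to the input size and overlaid on the picture. For a ResNet-50 base the last convolutional block output would be a natural choice for `conv_layer_name`.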
As a second experiment with transfer learning for image classification, we apply the same approach to the Oxford-IIIT Pet Dataset, which has 37 categories of dogs and cats with roughly 200 images for each class.
After only five epochs, we already get pretty good results with our classifier:
Epoch 5/5 94/94 [==============================] - 64s 685ms/step - loss: 0.1128 - acc: 0.9835 - val_loss: 0.8669 - val_acc: 0.7520
And looking at the heat map of the activations, we can see that the classifier did a pretty good job of locating the important region of the image.
One more thing we can do to get a better idea of the dataset is to calculate the cosine similarity between a few of the images. I took two images from each category and calculated the similarity in TensorFlow as follows:
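A minimal sketch of a pairwise cosine-similarity computation in TensorFlow 2. It assumes each image has already been turned into a flat feature vector (flattened pixels or an embedding from the network), one row per image; the function name is hypothetical.

```python
import tensorflow as tf


def cosine_similarity_matrix(features):
    """Return an (n, n) matrix of cosine similarities between the n rows."""
    features = tf.convert_to_tensor(features, dtype=tf.float32)
    # After L2 normalization, the dot product of two rows is their cosine.
    normalized = tf.math.l2_normalize(features, axis=1)
    return tf.matmul(normalized, normalized, transpose_b=True)
```

Entry `[i, j]` of the result is the cosine of the angle between vectors `i` and `j`, so the diagonal is 1 for any non-zero vector.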
Displaying the resulting matrix using Seaborn Heatmap gives the following picture:
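A short plotting sketch, assuming Seaborn and Matplotlib are installed. The function name, figure size, color map and title are illustrative choices rather than the settings used for the original picture.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_similarity_heatmap(similarity, labels):
    """Draw a labeled heat map of a square similarity matrix and return its Axes."""
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(
        np.asarray(similarity),
        xticklabels=labels,
        yticklabels=labels,
        cmap="viridis",
        ax=ax,
    )
    ax.set_title("Cosine similarity between images")
    return ax
```

Calling `plt.show()` afterwards displays the figure. Images of the same category would be expected to form brighter blocks along the diagonal when the labels are ordered by category.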
We can take this approach further by automating the adaptation of the network architecture, so that a user only has to pass the dataset and the system infers the architecture.
| Dataset | Google Colab | GitHub |
|---|---|---|
| World Chess Champions | Run notebook in Google Colab | View notebook on GitHub |
| Oxford-IIIT Pet Dataset | Run notebook in Google Colab | View notebook on GitHub |