# fine_tuning_NasNet.py
import os
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
train_path=r'c:\Image Recognition tenserflow\train2'
model_name = 'train2_NASNetLarge_continue2.h5'
save_model_name = 'train2_NASNetLarge_fine.h5'
image_size = 331 # All images will be resized to 331x331
batch_size = 16
# Rescale all images by 1./255, apply image augmentation, and hold out 20% for validation
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2,
    rotation_range=5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    zoom_range=0.2)
# Flow training images in batches of `batch_size` using the train_datagen generator
train_generator = train_datagen.flow_from_directory(
    train_path,  # Source directory for the training images
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training')
# Flow validation images in batches of `batch_size` from the same train_datagen generator.
# Note: the augmentation settings above also apply to this validation subset.
validation_generator = train_datagen.flow_from_directory(
    train_path,  # Source directory for the validation images (same folder, 'validation' split)
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation')
model = tf.keras.models.load_model(model_name)
"""## Create the base model from the pre-trained convnets
We will create the base model from the **MobileNet V2** model developed at Google, and pre-trained on the ImageNet dataset, a large dataset of 1.4M images and 1000 classes of web images. This is a powerful model. Let's see what the features that it has learned can do for our cat vs. dog problem.
First, we need to pick which intermediate layer of MobileNet V2 we will use for feature extraction. A common practice is to use the output of the very last layer before the flatten operation, the so-called "bottleneck layer". The reasoning here is that the following fully-connected layers will be too specialized to the task the network was trained on, and thus the features learned by these layers won't be very useful for a new task. The bottleneck features, however, retain much generality.
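Let's instantiate a MobileNet V2 model pre-loaded with weights trained on ImageNet. By specifying the **include_top=False** argument, we load a network that doesn't include the classification layers at the top, which is ideal for feature extraction.
In this script that step is already done: the base-model code below is commented out, and the previously trained model is loaded from `model_name` instead.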
"""
#IMG_SHAPE = (image_size, image_size, 3)
# Create the base model from the pre-trained model MobileNet V2
#base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
# include_top=False,
# weights='imagenet')
"""## Fine tuning
In our feature extraction experiment, we were only training a few layers on top of a MobileNet V2 base model. The weights of the pre-trained network were **not** updated during training. One way to increase performance even further is to "fine-tune" the weights of the top layers of the pre-trained model alongside the training of the top-level classifier. The training process will force the weights to be tuned from generic feature maps to features associated specifically with our dataset.
Note: this should only be attempted after you have trained the top-level classifier with the pre-trained model set to non-trainable. If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will just forget everything it has learned.
Additionally, the reasoning behind fine-tuning the top layers of the pre-trained model rather than all of its layers is the following: in a convnet, the higher up a layer is, the more specialized it is. The first few layers in a convnet learn very simple and generic features, which generalize to almost all types of images. But as you go higher up, the features are increasingly specific to the dataset that the model was trained on. The goal of fine-tuning is to adapt these specialized features to work with the new dataset.
### Un-freeze the top layers of the model
All we need to do is unfreeze the model and set the bottom layers to be un-trainable. Then recompile the model (necessary for these changes to take effect) and resume training.
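A minimal sketch of the name-based freezing rule used below, written against stand-in layer objects so it runs without TensorFlow. The helper name `freeze_below` and the layer names are hypothetical; the only assumption is that each layer exposes `name` and `trainable`, as Keras layers do.

```python
from types import SimpleNamespace


def freeze_below(layers, first_trainable_name):
    # Freeze every layer that comes before `first_trainable_name`;
    # that layer and everything after it stay trainable.
    # Returns the number of trainable layers.
    unfrozen = False
    for layer in layers:
        if layer.name == first_trainable_name:
            unfrozen = True
        layer.trainable = unfrozen
    if not unfrozen:
        raise ValueError('no layer named %r' % first_trainable_name)
    return sum(1 for layer in layers if layer.trainable)


# Stand-in layers (hypothetical names), ordered bottom to top.
layers = [SimpleNamespace(name=n, trainable=True)
          for n in ('stem_conv', 'cell_1', 'cell_2', 'dense_head')]
n_trainable = freeze_below(layers, 'cell_2')
print(n_trainable, [l.trainable for l in layers])
```

Raising when the name is not found is a design choice of this sketch: it guards against silently freezing the whole model if the boundary layer is misspelled or named differently in another saved model.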
"""
model.trainable = True

# Freeze every layer below 'activation_253'; that layer and all layers
# above it are fine-tuned.
set_trainable = False
for layer in model.layers:
    if layer.name == 'activation_253':
        set_trainable = True
    layer.trainable = set_trainable
    print("layer {} is {}".format(layer.name, '+++trainable' if layer.trainable else '---frozen'))
#model.layers[0].trainable = True
## Let's take a look to see how many layers are in the base model
##print("Number of layers in the base model: ", len(base_model.layers))
## Fine tune from this layer onwards
#fine_tune_at = 100
## Freeze all the layers before the `fine_tune_at` layer
#for layer in model.layers[0].layers[:fine_tune_at]:
# layer.trainable = False
"""### Compile the model
Compile the model using a much lower learning rate.
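To give a feel for what the lower rate buys, here is a sketch of a single textbook RMSprop update on one scalar weight. It is not Keras's exact implementation (epsilon placement and defaults vary between versions); `rho=0.9`, the example gradient, and the comparison rate `1e-3` are assumptions chosen for illustration.

```python
import math


def rmsprop_first_step(grad, learning_rate, rho=0.9, eps=1e-7):
    # Textbook RMSprop: keep a running mean of squared gradients,
    # then divide the gradient by its root before applying the step.
    mean_sq = (1.0 - rho) * grad * grad      # running mean starts at 0
    return learning_rate * grad / (math.sqrt(mean_sq) + eps)


grad = 0.5                                   # arbitrary example gradient
step_fine = rmsprop_first_step(grad, 2e-5)   # rate used for fine-tuning
step_large = rmsprop_first_step(grad, 1e-3)  # assumed larger rate
print(step_fine, step_large, step_large / step_fine)
```

Because the step scales linearly with the learning rate, the fine-tuning rate moves each weight 50 times less per update than the assumed larger rate, which helps keep the pre-trained features intact.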
"""
# `learning_rate` is the current argument name; older Keras releases call it `lr`.
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=2e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
print("Number of trainable variables:", len(model.trainable_variables))
"""### Continue training the model
If you trained to convergence earlier, this will get you a few percent more accuracy.
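The step counts used below come from integer division, which does not count a final partial batch. A small sketch of that arithmetic; the image count is made up for illustration and `steps_for` is a hypothetical helper.

```python
import math


def steps_for(n_samples, batch_size, keep_partial=False):
    # Floor division does not count the last, smaller batch;
    # ceil counts it too.
    if keep_partial:
        return math.ceil(n_samples / batch_size)
    return n_samples // batch_size


n_samples, batch_size = 1000, 16   # illustrative numbers only
floor_steps = steps_for(n_samples, batch_size)
ceil_steps = steps_for(n_samples, batch_size, keep_partial=True)
leftover = n_samples - floor_steps * batch_size
print(floor_steps, ceil_steps, leftover)
```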
"""
epochs = 10
steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size
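# Note: `fit_generator` is deprecated in TensorFlow 2.1+, where `model.fit`
# accepts generators directly.
history_fine = model.fit_generator(train_generator,
                                   steps_per_epoch=steps_per_epoch,
                                   epochs=epochs,
                                   workers=4,
                                   validation_data=validation_generator,
                                   validation_steps=validation_steps)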
# save model and architecture to single file
model.save(save_model_name)
print("Saved model to disk")
"""### Learning curves
Let's take a look at the learning curves of the training and validation accuracy / loss when fine-tuning the last few layers of the MobileNet V2 base model, as well as the classifier on top of it. Note that the validation loss is much higher than the training loss, which means there may be some overfitting.
**Note**: the training dataset is fairly small, and is similar to the original datasets that MobileNet V2 was trained on, so fine-tuning may result in overfitting.
If you train to convergence (`epochs=50`) the resulting graph should look like this:
![After fine tuning the model nearly reaches 98% accuracy](./images/fine_tuning.png)
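
One way to make the overfitting check above concrete is to compare the two loss curves epoch by epoch. A minimal sketch on a plain dict shaped like `History.history`; the numbers are invented, and `pick_metric` and `loss_gap` are hypothetical helpers. The accuracy key is looked up under both `'acc'` and `'accuracy'` because the name differs between Keras versions.

```python
def pick_metric(history, *names):
    # Return the first metric list found under any of the given keys.
    for name in names:
        if name in history:
            return history[name]
    raise KeyError('none of %s in history' % (names,))


def loss_gap(history):
    # Validation loss minus training loss, one value per epoch.
    return [v - t for t, v in zip(history['loss'], history['val_loss'])]


# Invented values, for illustration only.
history = {
    'accuracy': [0.91, 0.94, 0.96],
    'val_accuracy': [0.90, 0.91, 0.91],
    'loss': [0.30, 0.20, 0.10],
    'val_loss': [0.32, 0.30, 0.31],
}
acc = pick_metric(history, 'acc', 'accuracy')
gaps = loss_gap(history)
widening = all(b > a for a, b in zip(gaps, gaps[1:]))
print(acc, gaps, widening)
```

A gap that keeps widening while the training loss still falls is the pattern the note above describes.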
"""
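# The accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones.
hist = history_fine.history
acc = hist['acc'] if 'acc' in hist else hist['accuracy']
val_acc = hist['val_acc'] if 'val_acc' in hist else hist['val_accuracy']
loss = hist['loss']
val_loss = hist['val_loss']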
# Every epoch in history_fine is a fine-tuning epoch, so no start marker is drawn.
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.9, 1])
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 0.2])
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
"""# Key takeaways
In summary, here is what we covered in this tutorial on how to do transfer learning using a pre-trained model to improve accuracy:
* Using a pre-trained model for **feature extraction** - when working with a small dataset, it is common to leverage the features learned by a model trained on a larger dataset in the same domain. This is done by instantiating the pre-trained model and adding a fully connected classifier on top. The pre-trained model is "frozen" and only the weights of the classifier are updated during training.
  In this case, the convolutional base extracts all the features associated with each image, and we train a classifier that determines, given this set of features, which class the image belongs to.
* **Fine-tuning** a pre-trained model - to further improve performance, one might want to repurpose the top-level layers of the pre-trained model for the new dataset via fine-tuning.
  In this case, we tune the weights so that the model learns high-level features specific to our dataset. This only makes sense when the training dataset is large and very similar to the original dataset that the pre-trained model was trained on.
"""