Error at "Save tf lite models for best and last checkpoint" #39

Closed
Spilleren opened this issue Sep 18, 2020 · 22 comments


@Spilleren

I can't seem to get past this cell, and I have had no luck fixing it myself.

```python
best_index = np.argmax(np.array(history.history['val_angle_metric']) \
                       + np.array(history.history['val_direction_metric']))
best_checkpoint = str("cp-%04d.ckpt" % (best_index+1))
best_model = utils.load_model(os.path.join(checkpoint_path,best_checkpoint),loss_fn,metric_list)
best_tflite = utils.generate_tflite(checkpoint_path, best_checkpoint)
utils.save_tflite (best_tflite, checkpoint_path, "best")
print("Best Checkpoint (val_angle: %s, val_direction: %s): %s" \
      %(history.history['val_angle_metric'][best_index],\
        history.history['val_direction_metric'][best_index],\
        best_checkpoint))
```

It returns the error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
      2 + np.array(history.history['val_direction_metric']))
      3 best_checkpoint = str("cp-%04d.ckpt" % (best_index+1))
----> 4 best_model = utils.load_model(os.path.join(checkpoint_path,best_checkpoint),loss_fn,metric_list)
      5 best_tflite = utils.generate_tflite(checkpoint_path, best_checkpoint)
      6 utils.save_tflite (best_tflite, checkpoint_path, "best")

~/git/OpenBot/policy/utils.py in load_model(model_path, loss_fn, metric_list)
     72 
     73 def load_model(model_path,loss_fn,metric_list):
---> 74     model = tf.keras.models.load_model(model_path,
     75     custom_objects=None,
     76     compile=False

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
    188     if isinstance(filepath, six.string_types):
    189       loader_impl.parse_saved_model(filepath)
--> 190       return saved_model_load.load(filepath, compile)
    191 
    192   raise IOError(

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile)
    114   # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
    115   # TODO(kathywu): Add code to load from objects that contain all endpoints
--> 116   model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
    117 
    118   # pylint: disable=protected-access

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, loader_cls)
    600     object_graph_proto = meta_graph_def.object_graph_def
    601     with ops.init_scope():
--> 602       loader = loader_cls(object_graph_proto,
    603                           saved_model_proto,
    604                           export_dir)

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in __init__(self, *args, **kwargs)
    186     self._models_to_reconstruct = []
    187 
--> 188     super(KerasObjectLoader, self).__init__(*args, **kwargs)
    189 
    190     # Now that the node object has been fully loaded, and the checkpoint has

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir)
    121       self._concrete_functions[name] = _WrapperFunction(concrete_function)
    122 
--> 123     self._load_all()
    124     self._restore_checkpoint()
    125 

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_all(self)
    207     # loaded from config may create variables / other objects during
    208     # initialization. These are recorded in _nodes_recreated_from_config.
--> 209     self._layer_nodes = self._load_layers()
    210 
    211     # Load all other nodes and functions.

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_layers(self)
    310 
    311     for node_id, proto in metric_list:
--> 312       layers[node_id] = self._load_layer(proto.user_object, node_id)
    313     return layers
    314 

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_layer(self, proto, node_id)
    335     obj, setter = self._revive_from_config(proto.identifier, metadata, node_id)
    336     if obj is None:
--> 337       obj, setter = revive_custom_object(proto.identifier, metadata)
    338 
    339     # Add an attribute that stores the extra functions/objects saved in the

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in revive_custom_object(identifier, metadata)
    776     return revived_cls._init_from_metadata(metadata)  # pylint: disable=protected-access
    777   else:
--> 778     raise ValueError('Unable to restore custom object of type {} currently. '
    779                      'Please make sure that the layer implements get_config'
    780                      'and from_config when saving. In addition, please use '

ValueError: Unable to restore custom object of type _tf_keras_metric currently. Please make sure that the layer implements get_configand from_config when saving. In addition, please use the custom_objects arg when calling load_model().
```

Everything else runs smoothly.

@thias15
Collaborator

thias15 commented Sep 18, 2020

Can you show the output of `model.fit`? Did you change any of the metrics?

Please put code or error output between "```" at the beginning and end, so it is formatted for easier reading. You can also use the <> button in the comment field.

@Spilleren
Author

Haven't changed anything.

My bad!

```
Epoch 1/10
664/666 [============================>.] - ETA: 0s - loss: 0.0354 - mean_absolute_error: 0.3235 - direction_metric: 0.7452 - angle_metric: 0.2609
WARNING:tensorflow:From /home/bso/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py:1813: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0001.ckpt/assets
666/666 [==============================] - 36s 54ms/step - loss: 0.0354 - mean_absolute_error: 0.3231 - direction_metric: 0.7456 - angle_metric: 0.2614 - val_loss: 0.0378 - val_mean_absolute_error: 0.2368 - val_direction_metric: 0.6941 - val_angle_metric: 0.2679
Epoch 2/10
665/666 [============================>.] - ETA: 0s - loss: 0.0141 - mean_absolute_error: 0.2061 - direction_metric: 0.7992 - angle_metric: 0.3699
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0002.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0141 - mean_absolute_error: 0.2061 - direction_metric: 0.7993 - angle_metric: 0.3698 - val_loss: 0.0323 - val_mean_absolute_error: 0.2250 - val_direction_metric: 0.7267 - val_angle_metric: 0.3096
Epoch 3/10
665/666 [============================>.] - ETA: 0s - loss: 0.0120 - mean_absolute_error: 0.1963 - direction_metric: 0.8053 - angle_metric: 0.3964
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0003.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0120 - mean_absolute_error: 0.1963 - direction_metric: 0.8052 - angle_metric: 0.3964 - val_loss: 0.0313 - val_mean_absolute_error: 0.2178 - val_direction_metric: 0.7225 - val_angle_metric: 0.3053
Epoch 4/10
664/666 [============================>.] - ETA: 0s - loss: 0.0100 - mean_absolute_error: 0.1848 - direction_metric: 0.8147 - angle_metric: 0.4127
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0004.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0100 - mean_absolute_error: 0.1848 - direction_metric: 0.8142 - angle_metric: 0.4124 - val_loss: 0.0347 - val_mean_absolute_error: 0.2267 - val_direction_metric: 0.7118 - val_angle_metric: 0.3225
Epoch 5/10
665/666 [============================>.] - ETA: 0s - loss: 0.0093 - mean_absolute_error: 0.1814 - direction_metric: 0.8133 - angle_metric: 0.4215
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0005.ckpt/assets
666/666 [==============================] - 14s 20ms/step - loss: 0.0093 - mean_absolute_error: 0.1814 - direction_metric: 0.8133 - angle_metric: 0.4215 - val_loss: 0.0327 - val_mean_absolute_error: 0.2061 - val_direction_metric: 0.7230 - val_angle_metric: 0.3262
Epoch 6/10
665/666 [============================>.] - ETA: 0s - loss: 0.0085 - mean_absolute_error: 0.1778 - direction_metric: 0.8185 - angle_metric: 0.4393
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0006.ckpt/assets
666/666 [==============================] - 14s 21ms/step - loss: 0.0085 - mean_absolute_error: 0.1777 - direction_metric: 0.8187 - angle_metric: 0.4393 - val_loss: 0.0330 - val_mean_absolute_error: 0.2147 - val_direction_metric: 0.7070 - val_angle_metric: 0.3209
Epoch 7/10
665/666 [============================>.] - ETA: 0s - loss: 0.0083 - mean_absolute_error: 0.1772 - direction_metric: 0.8145 - angle_metric: 0.4416
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0007.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0083 - mean_absolute_error: 0.1771 - direction_metric: 0.8147 - angle_metric: 0.4415 - val_loss: 0.0314 - val_mean_absolute_error: 0.2252 - val_direction_metric: 0.7139 - val_angle_metric: 0.3278
Epoch 8/10
665/666 [============================>.] - ETA: 0s - loss: 0.0074 - mean_absolute_error: 0.1719 - direction_metric: 0.8180 - angle_metric: 0.4595
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0008.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0074 - mean_absolute_error: 0.1720 - direction_metric: 0.8181 - angle_metric: 0.4594 - val_loss: 0.0338 - val_mean_absolute_error: 0.2041 - val_direction_metric: 0.7160 - val_angle_metric: 0.3064
Epoch 9/10
665/666 [============================>.] - ETA: 0s - loss: 0.0070 - mean_absolute_error: 0.1680 - direction_metric: 0.8154 - angle_metric: 0.4677
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0009.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0070 - mean_absolute_error: 0.1680 - direction_metric: 0.8153 - angle_metric: 0.4677 - val_loss: 0.0316 - val_mean_absolute_error: 0.1991 - val_direction_metric: 0.7112 - val_angle_metric: 0.3112
Epoch 10/10
665/666 [============================>.] - ETA: 0s - loss: 0.0066 - mean_absolute_error: 0.1646 - direction_metric: 0.8239 - angle_metric: 0.4740
INFO:tensorflow:Assets written to: models/my_openbot_pilot_net_lr0.0001_bz16_bn/checkpoints/cp-0010.ckpt/assets
666/666 [==============================] - 13s 20ms/step - loss: 0.0066 - mean_absolute_error: 0.1646 - direction_metric: 0.8238 - angle_metric: 0.4740 - val_loss: 0.0315 - val_mean_absolute_error: 0.2098 - val_direction_metric: 0.7321 - val_angle_metric: 0.3198
```
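The checkpoint-selection step from the failing cell can be checked on its own, without TensorFlow. The sketch below copies the per-epoch validation values from the log above into a plain dict that stands in for `history.history`. It then applies the same rule as the notebook: take the epoch with the highest sum of `val_angle_metric` and `val_direction_metric`. The selection itself works and picks the last checkpoint. The error comes from the following `utils.load_model` call.

```python
import numpy as np

# Stand-in for history.history: validation metrics per epoch, copied from the log above.
history = {
    'val_angle_metric': [0.2679, 0.3096, 0.3053, 0.3225, 0.3262,
                         0.3209, 0.3278, 0.3064, 0.3112, 0.3198],
    'val_direction_metric': [0.6941, 0.7267, 0.7225, 0.7118, 0.7230,
                             0.7070, 0.7139, 0.7160, 0.7112, 0.7321],
}

# Same rule as the notebook cell: maximize angle + direction metric.
best_index = int(np.argmax(np.array(history['val_angle_metric'])
                           + np.array(history['val_direction_metric'])))

# Checkpoint files are numbered from 1 (cp-0001.ckpt), hence best_index + 1.
best_checkpoint = "cp-%04d.ckpt" % (best_index + 1)
print(best_index, best_checkpoint)
```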

@thias15
Collaborator

thias15 commented Sep 18, 2020

What are your TensorFlow and conda versions?

@Spilleren
Author

conda is 4.8.3 and tensorflow is 2.2.0

@thias15
Collaborator

thias15 commented Sep 18, 2020

Try removing the custom metrics in the first cell of the training section by changing

```python
metric_list = ['MeanAbsoluteError', metrics.direction_metric, metrics.angle_metric]
```

to

```python
metric_list = ['MeanAbsoluteError']
```

In the meantime I'll see if I can reproduce your error somehow.

@Spilleren
Author

Hmmm... I tried what you suggested, and now I run into a new error (even if I set the custom metrics back).
Plotting the metrics raises a KeyError.

```python
plt.plot(history.history['MeanAbsoluteError'], label='mean_absolute_error')
plt.plot(history.history['val_MeanAbsoluteError'], label = 'val_mean_absolute_error')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error')
plt.legend(loc='lower right')
plt.savefig(os.path.join(log_path,'error.png'))
```

```
KeyError                                  Traceback (most recent call last)
<ipython-input-37-1e2ff90cb985> in <module>
----> 1 plt.plot(history.history['MeanAbsoluteError'], label='mean_absolute_error')
      2 plt.plot(history.history['val_MeanAbsoluteError'], label = 'val_mean_absolute_error')
      3 plt.xlabel('Epoch')
      4 plt.ylabel('Mean Absolute Error')
      5 plt.legend(loc='lower right')

KeyError: 'MeanAbsoluteError'
```
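The training log posted earlier shows TensorFlow 2.2 recording this metric as `mean_absolute_error` / `val_mean_absolute_error`, even though `metric_list` contains the string `'MeanAbsoluteError'`. The plotting cell looks it up under the latter name. The sketch below tries several candidate keys and lists the available ones when none match. `history_series` is a hypothetical helper, not part of the OpenBot code, and the dict stands in for `history.history` using the first two epochs from the log.

```python
def history_series(history_dict, *candidates):
    """Return the first series in history_dict whose key is one of candidates."""
    for key in candidates:
        if key in history_dict:
            return history_dict[key]
    # List the keys that do exist so naming mismatches are easy to spot.
    raise KeyError("none of %s found; available keys: %s"
                   % (list(candidates), sorted(history_dict)))


# Stand-in for history.history (first two epochs from the training log).
history_dict = {'mean_absolute_error': [0.3231, 0.2061],
                'val_mean_absolute_error': [0.2368, 0.2250]}

mae = history_series(history_dict, 'MeanAbsoluteError', 'mean_absolute_error')
val_mae = history_series(history_dict, 'val_MeanAbsoluteError', 'val_mean_absolute_error')
print(mae, val_mae)
```

In the notebook, the first line of the plotting cell would then become something like `plt.plot(history_series(history.history, 'MeanAbsoluteError', 'mean_absolute_error'), label='mean_absolute_error')`, which should work with either naming.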

@thias15
Collaborator

thias15 commented Sep 18, 2020

You need to restart the kernel and run the notebook again. The plots with the custom metrics won't work.

@Spilleren
Author

Okay, when running with just

```python
metric_list = ['MeanAbsoluteError']
```

I receive the error

```
KeyError                                  Traceback (most recent call last)
<ipython-input-37-62c19ef5d999> in <module>
----> 1 best_index = np.argmax(np.array(history.history['val_angle_metric']) \
      2 + np.array(history.history['val_direction_metric']))
      3 best_checkpoint = str("cp-%04d.ckpt" % (best_index+1))
      4 best_model = utils.load_model(os.path.join(checkpoint_path,best_checkpoint),loss_fn,metric_list)
      5 best_tflite = utils.generate_tflite(checkpoint_path, best_checkpoint)

KeyError: 'val_angle_metric'
```

It happens in the `best_index` part.

@thias15
Collaborator

thias15 commented Sep 18, 2020

Yes, because you no longer have the angle metric. You need to replace it with `val_mean_absolute_error`.
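Here is a sketch of that replacement, assuming the selection should now rely only on the validation mean absolute error. The angle and direction metrics are maximized, but an error should be minimized, so `np.argmin` takes the place of `np.argmax`. The values are copied from the training log above. In the notebook, `val_mae` would be `history.history['val_mean_absolute_error']`.

```python
import numpy as np

# val_mean_absolute_error per epoch, copied from the training log above.
val_mae = [0.2368, 0.2250, 0.2178, 0.2267, 0.2061,
           0.2147, 0.2252, 0.2041, 0.1991, 0.2098]

# Lower error is better, so pick the minimum instead of the maximum.
best_index = int(np.argmin(np.array(val_mae)))

# Checkpoint files are numbered from 1, hence best_index + 1.
best_checkpoint = "cp-%04d.ckpt" % (best_index + 1)
print("Best Checkpoint (val_mean_absolute_error: %s): %s"
      % (val_mae[best_index], best_checkpoint))
```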

@thias15
Collaborator

thias15 commented Sep 18, 2020

Can you try adding `import metrics` to the `utils.py` file?

@Spilleren
Author

Spilleren commented Sep 21, 2020

Thanks for all the help! First time working with machine learning :)

I added `import metrics` to the `utils.py` file.
The error no longer appears at `best_index` but has moved down to `best_model`:

```
<ipython-input-41-164fa3b8c5aa> in <module>
----> 1 best_model = utils.load_model(os.path.join(checkpoint_path,best_checkpoint),loss_fn,metric_list)
      2 test_loss, test_acc, test_dir, test_ang = best_model.evaluate(test_ds, steps=image_count_test/TEST_BATCH_SIZE, verbose=2)

~/git/OpenBot/policy/utils.py in load_model(model_path, loss_fn, metric_list)
     73 
     74 def load_model(model_path,loss_fn,metric_list):
---> 75     model = tf.keras.models.load_model(model_path,
     76     custom_objects=None,
     77     compile=False

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
    188     if isinstance(filepath, six.string_types):
    189       loader_impl.parse_saved_model(filepath)
--> 190       return saved_model_load.load(filepath, compile)
    191 
    192   raise IOError(

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile)
    114   # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
    115   # TODO(kathywu): Add code to load from objects that contain all endpoints
--> 116   model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
    117 
    118   # pylint: disable=protected-access

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, loader_cls)
    600     object_graph_proto = meta_graph_def.object_graph_def
    601     with ops.init_scope():
--> 602       loader = loader_cls(object_graph_proto,
    603                           saved_model_proto,
    604                           export_dir)

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in __init__(self, *args, **kwargs)
    186     self._models_to_reconstruct = []
    187 
--> 188     super(KerasObjectLoader, self).__init__(*args, **kwargs)
    189 
    190     # Now that the node object has been fully loaded, and the checkpoint has

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir)
    121       self._concrete_functions[name] = _WrapperFunction(concrete_function)
    122 
--> 123     self._load_all()
    124     self._restore_checkpoint()
    125 

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_all(self)
    207     # loaded from config may create variables / other objects during
    208     # initialization. These are recorded in `_nodes_recreated_from_config`.
--> 209     self._layer_nodes = self._load_layers()
    210 
    211     # Load all other nodes and functions.

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_layers(self)
    310 
    311     for node_id, proto in metric_list:
--> 312       layers[node_id] = self._load_layer(proto.user_object, node_id)
    313     return layers
    314 

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_layer(self, proto, node_id)
    335     obj, setter = self._revive_from_config(proto.identifier, metadata, node_id)
    336     if obj is None:
--> 337       obj, setter = revive_custom_object(proto.identifier, metadata)
    338 
    339     # Add an attribute that stores the extra functions/objects saved in the

~/anaconda3/envs/openbot/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in revive_custom_object(identifier, metadata)
    776     return revived_cls._init_from_metadata(metadata)  # pylint: disable=protected-access
    777   else:
--> 778     raise ValueError('Unable to restore custom object of type {} currently. '
    779                      'Please make sure that the layer implements `get_config`'
    780                      'and `from_config` when saving. In addition, please use '

ValueError: Unable to restore custom object of type _tf_keras_metric currently. Please make sure that the layer implements `get_config`and `from_config` when saving. In addition, please use the `custom_objects` arg when calling `load_model()`.
```

@thias15
Collaborator

thias15 commented Sep 21, 2020

Can you pass the two custom metrics with the `custom_objects` arg?

If this does not work, you can also try the following workaround.
Above the cell that throws the error, define this function.

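```python
import tensorflow as tf

def load_model(model_path):
    # Load the saved model without its compile state (optimizer, loss, metrics).
    model = tf.keras.models.load_model(model_path,
                                       custom_objects=None,
                                       compile=False
                                       )
    # Recompile using loss_fn and metric_list, which are defined earlier in the notebook.
    model.compile(loss=loss_fn,
                  metrics=metric_list)
    return model
```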

In the cell that throws the error, replace

```python
best_model = utils.load_model(os.path.join(checkpoint_path,best_checkpoint),loss_fn,metric_list)
```

with

```python
best_model = load_model(os.path.join(checkpoint_path,best_checkpoint))
```

@Spilleren
Author

Okay, now things are cooking up nicely.

I redefined the function as you said, but added `custom_object` as a parameter:

```python
def load_model(model_path, custom_object):
    model = tf.keras.models.load_model(model_path,
                                       custom_objects=custom_object,
                                       compile=False
                                       )
    model.compile(loss=loss_fn,
                  metrics=metric_list)
    return model
```

Then I could pass the custom metrics as a dict:

```python
best_model = load_model(os.path.join(checkpoint_path,best_checkpoint),
                        {'direction_metric':metrics.direction_metric,
                         'angle_metric':metrics.angle_metric},)
```

I guess I could have changed `load_model` in `utils.py` the same way and gotten the same result.

Thank you for the help! I will get started on teaching my robot some policies.

@thias15
Collaborator

thias15 commented Sep 22, 2020

Did the function inside the notebook work without passing custom objects? That would indicate a scope issue. And yes, passing the dictionary of custom objects while leaving the code in `utils.py` should also work.

I'm trying to figure out why you were facing the issue while others, including myself, don't. If we understand why, we can fix it for others who may come across this issue. The main reason I prefer not to pass `custom_objects` is that people may change the metric list, and then it's one more place in the code that needs to be changed. But maybe it makes sense to just pass all custom metrics, just in case. If people define new ones, they will have to add them manually.
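One way to avoid maintaining a second list is to derive the `custom_objects` dict from `metric_list` itself. This sketch assumes the custom metrics in `metrics.py` are plain Python functions whose names match the keys used in the previous comment (`direction_metric`, `angle_metric`). `custom_objects_from` is a hypothetical helper, not part of OpenBot. The two stub functions below only stand in for `metrics.direction_metric` and `metrics.angle_metric` so the example runs without TensorFlow.

```python
import inspect

def custom_objects_from(metric_list):
    """Map each plain-function metric in metric_list to its __name__.

    String entries such as 'MeanAbsoluteError' name built-in Keras metrics
    and are skipped; only Python functions (the custom metrics) are kept.
    """
    return {m.__name__: m for m in metric_list if inspect.isfunction(m)}


# Hypothetical stand-ins for metrics.direction_metric / metrics.angle_metric;
# only their names matter for building the dict.
def direction_metric(y_true, y_pred):
    return 0.0

def angle_metric(y_true, y_pred):
    return 0.0


metric_list = ['MeanAbsoluteError', direction_metric, angle_metric]
custom_objects = custom_objects_from(metric_list)
print(sorted(custom_objects))

# In the notebook (requires TensorFlow), this would be passed on, e.g.:
# model = tf.keras.models.load_model(model_path, custom_objects=custom_objects, compile=False)
```

With this approach, a newly defined metric function only has to be added to `metric_list`. A metric passed as a class instance rather than a function would not be picked up by this filter.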

@Spilleren
Author

Nope, it didn't work without passing them. I'm still getting the same error as before.

Ah yes, that makes sense. So far I've run the code on my laptop and my workstation, and both give the same error. Both run Ubuntu 18.04.

@thias15
Collaborator

thias15 commented Sep 23, 2020

Strange, I have been able to run it without problems on Ubuntu 18.04, also on Mac and Windows. Are you using a conda environment? If yes, what is the version of conda, tensorflow and jupyter notebook?

@Spilleren
Author

Okay, so I messed up my Ubuntu partition yesterday and had to do a fresh install. I thought I would test this before installing anything else. The error still occurred with a fresh install and a fresh git clone.

I am running conda version 4.8.3
Tensorflow 2.2
jupyter notebook 6.0.3

@thias15
Collaborator

thias15 commented Sep 24, 2020

I see. I'm using tf 2.0.0. Maybe this is a 'new feature'.

@thias15
Collaborator

thias15 commented Sep 24, 2020

Could you do me a favour and try to install this environment:
```bash
conda create -n openbot_test python=3.7 tensorflow=2.0.0 notebook=6.1.1 matplotlib=3.3.1 pillow=7.2.0
```

@Spilleren
Author

Yup, it seems to work now. So it looks like a "new feature"...

However, it didn't use the GPU for the calculations by default, as it did with 2.2.
Is there a way to force it to use the GPU?

@thias15
Collaborator

thias15 commented Sep 25, 2020

If you want to use the GPU, you need to install tensorflow-gpu.

@thias15
Collaborator

thias15 commented Sep 25, 2020

> Yup, it seems to work now. So it looks like a "new feature"...

Great, I will add a note then for people that want to use newer versions of tensorflow.
It's not high priority, but if I get a chance, I may update the code and port it to tf2.2.
However, on Mac for example tf2.0 still gets installed by default.

@thias15 thias15 closed this as completed Sep 26, 2020