Inference fails when using finetuned models #7

Open
hofbi opened this issue Apr 22, 2020 · 0 comments

hofbi commented Apr 22, 2020

We use your model as a baseline and fine-tuned it on our own data, following the guidelines for training and fine-tuning. We then tested our model following the guidelines for testing, and it produces valid results.

However, when we use our fine-tuned model to predict attention maps on unseen data, inference fails with this error: NotFoundError (see above for traceback): Key encoder/Variable_1 not found in checkpoint.

Is this caused by incorrect usage on our side, or is it a bug in the code?

For completeness, here is the entire stack trace of the failure:

Convert frames to tf records...
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

  0%|          | 0/33 [00:00<?, ?it/s]
100%|##########| 33/33 [00:17<00:00,  1.89it/s]

  0%|          | 0/33 [00:00<?, ?it/s]
100%|##########| 33/33 [00:15<00:00,  2.07it/s]
No. of /tmp/data/inference videos: 66

Generate ROI predictions...
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
INFO:tensorflow:Using config: {'_log_step_count_steps': 10, '_is_chief': True, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_task_type': 'worker', '_tf_random_seed': None, '_num_worker_replicas': 1, '_model_dir': '/tmp/weights/finetuned/', '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f78493e7080>, '_session_config': None, '_master': '', '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_keep_checkpoint_max': 5, '_save_summary_steps': inf}
2020-04-22 14:47:05.293285: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-04-22 14:47:05.384798: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-22 14:47:05.385236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.62GiB
2020-04-22 14:47:05.385257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-04-22 14:47:05.962379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from /tmp/weights/finetuned/best_ckpt/model.ckpt-3133
2020-04-22 14:47:06.014079: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_1 not found in checkpoint
2020-04-22 14:47:06.014691: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable not found in checkpoint
2020-04-22 14:47:06.015068: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_2 not found in checkpoint
2020-04-22 14:47:06.015023: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_4 not found in checkpoint
2020-04-22 14:47:06.015783: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_5 not found in checkpoint
2020-04-22 14:47:06.015942: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_6 not found in checkpoint
2020-04-22 14:47:06.016060: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_7 not found in checkpoint
2020-04-22 14:47:06.016010: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_3 not found in checkpoint
2020-04-22 14:47:06.016549: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_9 not found in checkpoint
2020-04-22 14:47:06.017427: W tensorflow/core/framework/op_kernel.cc:1198] Not found: Key encoder/Variable_8 not found in checkpoint
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key encoder/Variable_1 not found in checkpoint
	 [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
	 [[Node: save/RestoreV2_13/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2_13", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "infer.py", line 209, in <module>
    main(argv=sys.argv)
  File "infer.py", line 185, in main
    for res in predict_generator:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 430, in predict
    hooks=input_hooks + hooks) as mon_sess:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 787, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 511, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 972, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 977, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 668, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 440, in create_session
    init_fn=self._scaffold.init_fn)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 189, in _restore_checkpoint
    saver.restore(sess, checkpoint_filename_with_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1686, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key encoder/Variable_1 not found in checkpoint
	 [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
	 [[Node: save/RestoreV2_13/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2_13", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'save/RestoreV2_5', defined at:
  File "infer.py", line 209, in <module>
    main(argv=sys.argv)
  File "infer.py", line 185, in main
    for res in predict_generator:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 430, in predict
    hooks=input_hooks + hooks) as mon_sess:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 787, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 511, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 972, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 977, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 668, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 431, in create_session
    self._scaffold.finalize()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 210, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 821, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1239, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 471, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 268, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1031, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key encoder/Variable_1 not found in checkpoint
	 [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
	 [[Node: save/RestoreV2_13/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2_13", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
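The warnings above show that all ten `encoder/Variable` keys the inference graph requests (`encoder/Variable` through `encoder/Variable_9`) are missing from `model.ckpt-3133`. One way to narrow down the cause is to compare the variable names the inference graph expects with the keys the checkpoint actually contains. The following is a minimal, framework-independent sketch. It assumes both name lists have already been collected, for example from the names returned by `tf.train.list_variables` on the checkpoint and `[v.op.name for v in tf.global_variables()]` in the inference graph, in TensorFlow versions that provide these calls. The helper `diff_checkpoint_keys` and the names in the usage example are hypothetical and are not taken from the actual checkpoint.

```python
from typing import Dict, Iterable, List


def diff_checkpoint_keys(graph_names: Iterable[str],
                         ckpt_names: Iterable[str]) -> Dict[str, List[str]]:
    """Report graph variables that are missing from a checkpoint.

    Returns a dict mapping each missing graph variable name to a sorted list of
    checkpoint keys that end with that name after a '/' boundary. Such keys
    usually indicate that the variables were saved under an extra outer scope.
    Names present in both lists are omitted from the result.
    """
    ckpt = set(ckpt_names)
    report = {}  # type: Dict[str, List[str]]
    for name in graph_names:
        if name in ckpt:
            continue  # this variable would be restored without problems
        # e.g. "model/encoder/Variable_1" is a candidate for "encoder/Variable_1"
        suffix = "/" + name
        report[name] = sorted(k for k in ckpt if k.endswith(suffix))
    return report
```

For example, `diff_checkpoint_keys(["encoder/Variable_1"], ["model/encoder/Variable_1", "global_step"])` returns `{"encoder/Variable_1": ["model/encoder/Variable_1"]}`, which would point to an extra scope prefix introduced during fine-tuning. An empty candidate list instead suggests that the checkpoint stores these weights under unrelated names, or not at all.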