Support restarting Nomad without restarting nspawn containers #17

mateuszlewko · 2020-10-27T23:21:35Z

It seems that restarting the Nomad service (for example, when upgrading Nomad or reloading its configuration) restarts jobs run by the nspawn driver. Docker jobs stay alive and are not restarted when Nomad restarts.

I observed the following errors in logs:

2020-10-26T19:28:26.592+0100 [ERROR] client.alloc_runner.task_runner: error recovering task; cleaning up: alloc_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5 task= error="rpc error: code = Unknown desc = failed to decode driver config: EOF" task_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5//0af76d99
2020-10-26T19:28:26.592+0100 [WARN] client.alloc_runner.task_runner: error destroying unrecoverable task: alloc_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5 task= error="rpc error: code = Unknown desc = task not found for given id" task_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5//0af76d99

The failed jobs are then reallocated and run fine; however, it's undesirable that they are restarted at all.
Would it be hard to support that?

JanMa · 2020-10-28T07:21:47Z

Hello @mateuszlewko,
normally it should be fine to restart Nomad, and tasks started via this driver should keep running.
I suspect you are running into an issue which I fixed in 9a578ff. Can you please update the driver to the latest release 0.4.0 and check if it is still not working?

If not, this is definitely a bug and I'll fix it!
Kind regards,
Jan

JanMa · 2020-10-28T12:08:55Z

I double-checked it and it seems the issue is also present in the latest version. I will make sure to fix this 👍

Previously, when `RecoverTask` was called, we first tried to recover the `TaskConfig` for the given task. This was blindly copied from the nomad-driver-skeleton project, and it turns out we make no use of it at all. Since this also caused Issue #17, we simply get rid of it. Recovering tasks when a Nomad client is restarted now works again.

JanMa · 2020-10-29T12:02:51Z

@mateuszlewko I just published a new release, 0.4.1, which contains a fix for this issue.

mateuszlewko · 2020-10-31T19:43:36Z

Confirmed that it's working now. Thank you!

JanMa mentioned this issue Oct 29, 2020

Fix task recovery issues and stop all test containers #18

Merged

JanMa closed this as completed in #18 Oct 29, 2020
