Graceful = catch SIGTERM -> no errors in the logs -> return 0
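As a rough illustration of that contract, here is a minimal sketch of SIGTERM handling in plain Go. This is an assumption for illustration only; the real NKG entrypoint wires this up through controller-runtime's manager and its signal handler.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Catch SIGTERM (and SIGINT) and cancel the root context instead of
	// letting the process be killed mid-flight.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	if err := run(ctx); err != nil {
		// Anything printed here breaks the "no errors in the logs" part.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// Clean exit: nothing logged at error level, exit code 0.
}

// run stands in for the controller's event loop: it should drain in-flight
// work once ctx is canceled and return nil on a clean shutdown.
func run(ctx context.Context) error {
	<-ctx.Done()
	return nil
}
```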
Below is an example of a known problem:
How to reproduce:
1. Deploy the cafe example.
2. Generate some updates in the k8s APIs - for example, scale the coffee pods to 10 or back to 0.
3. At the same time, delete the NKG pod with kubectl delete.
If you’re lucky (seriously, it depends on the timing of the events), you might get this:
{"level":"info","ts":1666645520.3277802,"msg":"The resource was not upserted because the context was canceled","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass","GatewayClass":{"name":"nginx"},"namespace":"","name":"nginx","reconcileID":"8b59d3c4-f023-4e28-bf4e-7fcef51d62e2"}
{"level":"info","ts":1666645520.3278348,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3278677,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.327872,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.327875,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3278775,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.32788,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}
{"level":"info","ts":1666645520.328146,"logger":"eventLoop","msg":"Finished handling the batch"}
{"level":"info","ts":1666645520.3281548,"msg":"All workers finished","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.328158,"msg":"All workers finished","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3281612,"msg":"All workers finished","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3281643,"msg":"All workers finished","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.328167,"msg":"All workers finished","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"info","ts":1666645520.3281696,"msg":"All workers finished","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.3281732,"msg":"Stopping and waiting for caches"}
{"level":"info","ts":1666645520.328385,"msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":1666645520.328418,"msg":"Wait completed, proceeding to shutdown the manager"}
rpc error: code = NotFound desc = an error occurred when try to find container "c71dccbaa6c5ee3987461541bbafdaebf6a95b0e9c080029765e46eb92798c79": not found%
Note the error:
{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}
It happened because the status update API call was canceled: the shutdown canceled the context before the call could complete.
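A minimal sketch of that failure mode, assuming a controller-runtime client (the function below is illustrative, not the actual NKG updater):

```go
package status

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gatewayv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"
)

// updateStatus is a stand-in for the status updater. If ctx is the manager's
// context, it is canceled as soon as SIGTERM arrives, so an in-flight status
// write fails with an error wrapping context.Canceled, which is what shows up
// in the log above.
func updateStatus(ctx context.Context, c client.Client, gw *gatewayv1beta1.Gateway) error {
	return c.Status().Update(ctx, gw)
}
```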
Expected:
The shutdown should actually be graceful: no errors should be printed in the logs.
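One possible way to keep the shutdown quiet (a sketch under assumptions, not the project's actual fix): give the final status write its own short deadline instead of the already-canceled manager context, and treat cancellation as an expected part of shutdown rather than an error.

```go
package status

import (
	"context"
	"errors"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gatewayv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"
)

// updateStatusQuietly is illustrative only. It detaches the final write from
// the manager's context so it can still complete during shutdown, and it
// downgrades cancellation/timeout to a non-error.
func updateStatusQuietly(c client.Client, gw *gatewayv1beta1.Gateway) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := c.Status().Update(ctx, gw); err != nil {
		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
			// Expected while the process is going down; do not log an error.
			return nil
		}
		return err
	}
	return nil
}
```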