
Ensure NKG shutdowns gracefully #563

Closed
ja20222 opened this issue Apr 5, 2023 · 1 comment
Labels
area/control-plane (General control plane issues), bug (Something isn't working)
Milestone
v1.0.0

Comments


ja20222 commented Apr 5, 2023

Graceful = catch SIGTERM -> no errors in the logs -> return 0
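
For context, "catch SIGTERM → return 0" is what controller-runtime already provides when the manager is started with its signal handler. A minimal sketch of that entrypoint shape (illustrative only, not the actual NKG code):

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	logger := ctrl.Log.WithName("nkg")

	// Create the manager; options are omitted for brevity.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		logger.Error(err, "failed to create manager")
		os.Exit(1)
	}

	// SetupSignalHandler returns a context that is canceled on SIGTERM/SIGINT.
	// Start blocks until that context is canceled and all workers have drained.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		logger.Error(err, "manager exited with an error")
		os.Exit(1)
	}

	// Reaching this point means the shutdown completed cleanly: exit code 0.
}
```

The problem described below is not with the signal handling itself, but with in-flight work that observes the canceled context and logs it as an error.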

Below is an example of a known problem:

How to reproduce:

1. Deploy the cafe example.
2. Generate some updates in the Kubernetes API - for example, scale the coffee pods to 10 replicas or back down to 0.
3. At the same time, delete the NKG pod with kubectl delete.

If you’re lucky (seriously, it depends on the timing of the events), you might get this:

{"level":"info","ts":1666645520.3277802,"msg":"The resource was not upserted because the context was canceled","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass","GatewayClass":{"name":"nginx"},"namespace":"","name":"nginx","reconcileID":"8b59d3c4-f023-4e28-bf4e-7fcef51d62e2"}
{"level":"info","ts":1666645520.3278348,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3278677,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.327872,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.327875,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3278775,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.32788,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}
{"level":"info","ts":1666645520.328146,"logger":"eventLoop","msg":"Finished handling the batch"}
{"level":"info","ts":1666645520.3281548,"msg":"All workers finished","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.328158,"msg":"All workers finished","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3281612,"msg":"All workers finished","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3281643,"msg":"All workers finished","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.328167,"msg":"All workers finished","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"info","ts":1666645520.3281696,"msg":"All workers finished","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.3281732,"msg":"Stopping and waiting for caches"}
{"level":"info","ts":1666645520.328385,"msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":1666645520.328418,"msg":"Wait completed, proceeding to shutdown the manager"}
rpc error: code = NotFound desc = an error occurred when try to find container "c71dccbaa6c5ee3987461541bbafdaebf6a95b0e9c080029765e46eb92798c79": not found%

Note the error:

{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgh.neting.cc/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}

It happened because the update status API call was canceled: the shutdown canceled the context while the status update was still in flight.

Expected:

A graceful shutdown should not print any errors in the logs.
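
One possible way to satisfy that (a sketch only; writeStatus is a hypothetical helper, not the actual NKG updater code) is to check whether the failure was caused by context cancellation and demote it to an info-level message:

```go
package status

import (
	"context"
	"errors"

	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// writeStatus wraps a status write and treats a canceled context
// (i.e. a shutdown in progress) as a non-error condition.
func writeStatus(ctx context.Context, c client.Client, obj client.Object, logger logr.Logger) {
	if err := c.Status().Update(ctx, obj); err != nil {
		if errors.Is(err, context.Canceled) || ctx.Err() != nil {
			// The write was interrupted by cancellation; log at info level
			// so a graceful shutdown stays error-free.
			logger.Info("Status update skipped because the context was canceled")
			return
		}
		logger.Error(err, "Failed to update status")
	}
}
```

An alternative is to stop dispatching new status updates once the shutdown signal is received, so the updater never runs against an already-canceled context.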

@ja20222 ja20222 added the bug Something isn't working label Apr 5, 2023
@pleshakov pleshakov changed the title from "Graceful shutdown might cause errors appear in the log" to "Ensure NGK shutdowns gracefully" Apr 7, 2023
@pleshakov pleshakov added this to the v1.0.0 milestone Apr 7, 2023
@pleshakov pleshakov added the area/control-plane General control plane issues label Apr 7, 2023
@mpstefan mpstefan changed the title from "Ensure NGK shutdowns gracefully" to "Ensure NKG shutdowns gracefully" Jul 13, 2023
@mpstefan
Collaborator

Merged into #691

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in NGINX Gateway Fabric Aug 30, 2023
@mpstefan mpstefan closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 30, 2023