NGF Pod fails to become ready due to nginx reload failure: "failed to send the HUP signal to NGINX main: operation not permitted" #1695

Open
kate-osborn opened this issue Mar 14, 2024 · 1 comment
Labels
backlog Currently unprioritized work. May change with user feedback or as the product progresses. bug Something isn't working help wanted Extra attention is needed

Describe the bug
In some environments, the NGINX Gateway Fabric fails to report as ready. The nginx-gateway logs report an error reloading NGINX:

{"level":"error","ts":"2024-03-12T02:21:19Z","logger":"eventLoop.eventHandler","msg":"Failed to update NGINX configuration","batchID":1,"error":"failed to reload NGINX: failed to send the HUP signal to NGINX main: operation not permitted"

This is due to the control plane not having the proper permissions to reload NGINX.

Workaround

To resolve this issue you will need to set allowPrivilegeEscalation to true.

If using Helm, you can set the nginxGateway.securityContext.allowPrivilegeEscalation value.
If using the manifests directly, you can update this field under the nginx-gateway container’s securityContext.
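
As a rough sketch of the two options above (the value path and container name are taken from this issue; the surrounding structure is abbreviated and may differ between releases), a Helm values override might look like:

```yaml
# Helm values override, e.g. in a values file passed with -f
nginxGateway:
  securityContext:
    allowPrivilegeEscalation: true
```

And, when editing the manifests directly, the relevant part of the Deployment would be something like:

```yaml
# Deployment excerpt; all other fields omitted
spec:
  template:
    spec:
      containers:
      - name: nginx-gateway
        securityContext:
          allowPrivilegeEscalation: true
```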

Open Questions

  • So far we have been unable to reproduce this issue on kind or any managed Kubernetes platform. How can we reproduce it?
  • What is the root cause of this permissions issue? Is there a cluster setting that can be tweaked?

Related issues:

@kate-osborn kate-osborn added bug Something isn't working help wanted Extra attention is needed labels Mar 14, 2024
@mpstefan mpstefan added the backlog Currently unprioritized work. May change with user feedback or as the product progresses. label Mar 25, 2024

bjee19 commented Jun 13, 2024

A similar error can be reproduced by deploying on OpenShift, deploying any example, deleting the resources, and waiting a little while:

{"level":"error","ts":"2024-06-13T18:49:14Z","logger":"eventLoop.eventHandler","msg":"Failed to update NGINX configuration","batchID":16,"error":"failed to reload NGINX: reload unsuccessful: no new NGINX worker processes started for config version 5. Please check the NGINX container logs for possible configuration issues: context deadline exceeded","stacktrace":"github.com/nginxinc/nginx-gateway-fabric/internal/mode/static.(*eventHandlerImpl).HandleEventBatch\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/mode/static/handler.go:223\ngithub.com/nginxinc/nginx-gateway-fabric/internal/framework/events.(*EventLoop).Start.func1.1\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/framework/events/loop.go:74"}

This is also fixed by setting allowPrivilegeEscalation to true.
