NGF Pod fails to become ready due to nginx reload failure: "failed to send the HUP signal to NGINX main: operation not permitted" #1695

kate-osborn · 2024-03-14T18:34:00Z

Describe the bug
In some environments, the NGINX Gateway Fabric fails to report as ready. The nginx-gateway logs report an error reloading NGINX:

{"level":"error","ts":"2024-03-12T02:21:19Z","logger":"eventLoop.eventHandler","msg":"Failed to update NGINX configuration","batchID":1,"error":"failed to reload NGINX: failed to send the HUP signal to NGINX main: operation not permitted"

This is due to the control plane not having the proper permissions to reload NGINX.

Workaround

To resolve this issue you will need to set allowPrivilegeEscalation to true.

If using Helm, you can set the nginxGateway.securityContext.allowPrivilegeEscalation value.
If using the manifests directly, you can update this field under the nginx-gateway container’s securityContext.
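
As a rough sketch of the two options above: the Helm path maps to a nested values entry, and the manifest path maps to the container's securityContext. The key names come from this issue; the surrounding layout (a Deployment pod template, with every other field omitted) is an assumption and may differ between releases.

```yaml
# Helm values override (e.g. in a custom values file)
nginxGateway:
  securityContext:
    allowPrivilegeEscalation: true
```

```yaml
# Manifest fragment, assuming NGF runs as a Deployment; only the relevant fields are shown
spec:
  template:
    spec:
      containers:
        - name: nginx-gateway
          securityContext:
            allowPrivilegeEscalation: true
```

Existing securityContext fields on the container should be kept as they are; only allowPrivilegeEscalation changes.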

Open Questions

So far we have been unable to reproduce this issue on kind or any managed Kubernetes platform. How can we reproduce?
What is the root cause of this permissions issue? Is there a cluster setting that can be tweaked?

Related issues:


bjee19 · 2024-06-13T18:52:58Z

A similar error can be produced by deploying on OpenShift, deploying any example, deleting the resources, and waiting a little while:

{"level":"error","ts":"2024-06-13T18:49:14Z","logger":"eventLoop.eventHandler","msg":"Failed to update NGINX configuration","batchID":16,"error":"failed to reload NGINX: reload unsuccessful: no new NGINX worker processes started for config version 5. Please check the NGINX container logs for possible configuration issues: context deadline exceeded","stacktrace":"github.com/nginxinc/nginx-gateway-fabric/internal/mode/static.(*eventHandlerImpl).HandleEventBatch\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/mode/static/handler.go:223\ngithub.com/nginxinc/nginx-gateway-fabric/internal/framework/events.(*EventLoop).Start.func1.1\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/framework/events/loop.go:74"}

This is also fixed by setting allowPrivilegeEscalation to true.

kate-osborn added bug Something isn't working help wanted Extra attention is needed labels Mar 14, 2024

github-project-automation bot added this to NGINX Gateway Fabric Mar 14, 2024

github-project-automation bot moved this to 🆕 New in NGINX Gateway Fabric Mar 14, 2024

mpstefan added the backlog Currently unprioritized work. May change with user feedback or as the product progresses. label Mar 25, 2024

pleshakov mentioned this issue Jun 10, 2024

Release 1.3.0 #2113

Merged
