Add a validating webhook for ingress sanity check #3802
Conversation
@tjamet first, thank you for working on this!
@tjamet as you said, we need tests for this :)
```go
ings := n.store.ListIngresses()
ings = append(ings, &ingress.Ingress{*ing, annotations.NewAnnotationExtractor(n.store).Extract(ing)})

// sort Ingresses using the CreationTimestamp field
```
This is not required because ListIngresses already sorts the ingresses.
@aledbf thanks for the fast feedback! Indeed, I will then work on tests and documentation.
Yes, I saw that, but I am not sure about the webhook behavior with multiple deployments.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Also fixes #3588
@tjamet thanks for working on this! It'd be great if you could describe your approach, discuss how much overhead this adds to the core event loop, and explain why I would not want to have this always on.
@aledbf so far I am seeing two parameters that can change the set of ingresses considered for configuration generation in a given controller: the ingress class and the namespace the controller watches.

Following your comment, I added the filter based on the namespace name as well, to behave consistently with the deployment. This should handle all combinations of those parameters.

The approach, @ElvinEfendi, is pretty simple in the end: it consists of exposing, outside of the core event loop, an HTTP handler compliant with the Kubernetes validating webhook interface, to ensure that any ingress about to be inserted in the cluster would generate a syntactically valid configuration together with the rest of the known ingresses, using the pre-existing features. This handler should be exposed through a service to a ValidatingWebhookConfiguration object. As this handler runs outside of the core event loop, I have run a benchmark that shows the following results (reported by

Because the request is served through a service using the ingress controller as a backend, webhook requests are load-balanced across all ingress controllers, which reduces the load on a single pod in case multiple ingresses are inserted at the same time.

Another option to get better performance could be to drop the test against all pre-existing ingresses, which would probably still be good enough to detect most configuration errors (mainly syntax errors).

Regarding enabling the feature by default, as a system operator I would tend to recommend enabling it systematically.
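For context, here is a minimal, hypothetical sketch of the handler shape described above (assumed names such as `checker` and `validateHandler`; not the code from this PR): it decodes the AdmissionReview sent by the API server, passes the candidate ingress to a validation function assumed to simulate configuration generation together with the known ingresses, and allows or denies the admission accordingly.

```go
package webhook

import (
	"encoding/json"
	"fmt"
	"net/http"

	admissionv1beta1 "k8s.io/api/admission/v1beta1"
	extensions "k8s.io/api/extensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// checker is assumed to render the nginx configuration with the candidate
// ingress added to the known ingresses and to report any syntax error.
type checker interface {
	CheckIngress(ing *extensions.Ingress) error
}

// validateHandler serves the Kubernetes validating-webhook interface outside
// of the core event loop, as described in the comment above.
func validateHandler(c checker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var review admissionv1beta1.AdmissionReview
		if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
			http.Error(w, "could not decode AdmissionReview", http.StatusBadRequest)
			return
		}

		ing := &extensions.Ingress{}
		if err := json.Unmarshal(review.Request.Object.Raw, ing); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}

		resp := &admissionv1beta1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
		if err := c.CheckIngress(ing); err != nil {
			// Deny the admission so the invalid ingress never reaches the cluster.
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: fmt.Sprintf("invalid ingress configuration: %v", err)}
		}

		review.Response = resp
		json.NewEncoder(w).Encode(review)
	}
}
```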
👍
This must be optional (at least for now)
We can provide a script like https://gist.github.com/aledbf/11b3b4f882c632d021532edda8a5dc52 (I need to clean that up) to use the feature already available in k8s to generate such a certificate.
Sure!
This could also be integrated afterwards into the Helm chart, which I could contribute to as well.

Regarding the impact on resources, here are graphs, with the deployment in the middle.
Just a silly question: would this validating webhook reject ingress resources that are valid for an ingress controller of another kind? For example, the regular expression syntax for Traefik is weird and would cause the nginx ingress controller to crash. If I deploy the nginx ingress controller with this validating webhook, can I still use Traefik within the same cluster? (To be honest, the metrics endpoint for Traefik is way more useful than nginx's.)
The validating webhook filters on the ingress class. In case both ingress controllers are run with the same class, both would generate a configuration for the ingress, and hence the validating webhook will check the validity of the generated configuration.
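A small sketch of that class filter, with an assumed helper name (`shouldValidate`) and simplified default-class handling: ingresses annotated for another controller, such as Traefik, are skipped by the webhook and therefore never rejected by it.

```go
package webhook

// ingressClassAnnotation is the standard annotation used to select which
// controller should handle an ingress.
const ingressClassAnnotation = "kubernetes.io/ingress.class"

// shouldValidate returns true only when the ingress targets the class this
// nginx controller is configured to serve. The missing-annotation case is
// simplified here; the real controller applies its default-class rules.
func shouldValidate(annotations map[string]string, controllerClass string) bool {
	class, ok := annotations[ingressClassAnnotation]
	if !ok {
		// No class annotation: assume the default controller picks it up.
		return true
	}
	return class == controllerClass
}
```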
Hi,
@tjamet please rebase and squash the commits
Rebased and commits squashed.
@tjamet please rebase. After that, I plan to build a dev image to test and merge this PR.
In case some ingress has a syntax error in its snippet configuration, the freshly generated configuration will not be reloaded, to prevent tearing down existing rules. However, once inserted, this ingress prevents any other valid configuration from being applied, as it remains among the ingresses of the cluster.

To solve this problem, implement an optional validation webhook that simulates the addition of the ingress to be created together with the rest of the ingresses. In case the generated configuration is not validated by nginx, deny the insertion of the ingress.

When certificates are mounted using Kubernetes secrets and those secrets change, the keys are automatically updated in the container volume, and the controller reloads them using the file watcher.

Related changes:
- Update vendors
- Extract useful functions to check the configuration with an additional ingress
- Update documentation for the validating webhook
- Add validating webhook examples
- Add a metric for syntax check successes and errors
- Add more certificate generation examples
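A hedged sketch of the check the commit message describes, with assumed identifiers (`checkIngress`, `render`; not the PR's actual code): the candidate ingress is appended to the already known ingresses, the full nginx configuration is rendered, and `nginx -t` validates the result before the admission is allowed.

```go
package webhook

import (
	"fmt"
	"os"
	"os/exec"

	extensions "k8s.io/api/extensions/v1beta1"
)

// checkIngress simulates adding candidate to the already known ingresses and
// verifies that nginx accepts the resulting configuration. render is assumed
// to produce the full nginx.conf for a given set of ingresses.
func checkIngress(render func([]*extensions.Ingress) ([]byte, error), existing []*extensions.Ingress, candidate *extensions.Ingress) error {
	all := append(append([]*extensions.Ingress{}, existing...), candidate)
	cfg, err := render(all)
	if err != nil {
		return err
	}

	// Write the candidate configuration to a throwaway file so the running
	// server configuration is never touched.
	tmp, err := os.CreateTemp("", "nginx-check-")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(cfg); err != nil {
		return err
	}
	tmp.Close()

	// nginx -t only parses and tests the configuration; it does not reload it.
	if out, err := exec.Command("nginx", "-t", "-c", tmp.Name()).CombinedOutput(); err != nil {
		return fmt.Errorf("invalid nginx configuration: %s", out)
	}
	return nil
}
```

The success/error metric mentioned in the commit message would naturally be incremented on each of the two return paths.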
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ElvinEfendi, tjamet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks @tjamet!
This is really cool. Thanks @tjamet.
What this PR does / why we need it:
Running the nginx ingress controller in production for a while, we are pretty happy with it, but we occasionally have an issue where an ingress with a corrupted configuration is injected and prevents any further update of the nginx ingress configuration.
We then need to find the ingress causing the problem and fix it.
This PR shortens the feedback loop by optionally allowing the ingress to be rejected before it gets inserted, hence preserving the sanity of the configuration.
Once the patch is applied and the controller configured, this is the user experience in case of an invalid ingress:
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged):
fixes #2833
fixes #3218
fixes #3435
fixes #3459
Special notes for your reviewer:
I am aware that this PR needs testing and documentation.
If the option of adding this feature is accepted, I will contribute documentation and tests as well.