Improve configuration change detection #2656

aledbf · 2018-06-17T16:13:07Z

What this PR does / why we need it:

Introduce additional information (a checksum) into the configuration so the controller can determine whether it changed. Until now, a change in the configuration configmap only triggered a reload after a change in some Ingress.

Which issue this PR fixes:

fixes #2567

k8s-ci-robot · 2018-06-17T16:13:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [aledbf]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov-io · 2018-06-17T17:20:07Z

Codecov Report

Merging #2656 into master will increase coverage by 0.11%.
The diff coverage is 40.74%.

@@            Coverage Diff             @@
##           master    #2656      +/-   ##
==========================================
+ Coverage   40.82%   40.93%   +0.11%     
==========================================
  Files          72       72              
  Lines        5078     5088      +10     
==========================================
+ Hits         2073     2083      +10     
+ Misses       2723     2721       -2     
- Partials      282      284       +2

Impacted Files	Coverage Δ
internal/ingress/controller/config/config.go	`98.23% <ø> (ø)`	⬆️
internal/ingress/types_equals.go	`12.33% <0%> (-0.09%)`	⬇️
internal/ingress/controller/nginx.go	`11.51% <0%> (-0.03%)`	⬇️
internal/ingress/controller/controller.go	`2.24% <0%> (+0.02%)`	⬆️
internal/ingress/controller/template/configmap.go	`77.39% <66.66%> (-0.6%)`	⬇️
internal/task/queue.go	`77.92% <70%> (-1.49%)`	⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6978a8...0abbd2b. Read the comment docs.

ElvinEfendi · 2018-06-18T07:26:05Z

Until now a change in the configuration configmap only triggered a reload after a change in some Ingress.

What about forcereload we currently have? Is it not supposed to cover this case?

ElvinEfendi · 2018-06-18T07:35:09Z

internal/ingress/status/status.go

@@ -185,7 +185,7 @@ func NewStatusSyncer(config Config) Sync {
 			go st.syncQueue.Run(time.Second, stop)
 			wait.PollUntil(updateInterval, func() (bool, error) {
 				// send a dummy object to the queue to force a sync
-				st.syncQueue.Enqueue("sync status")


Why do we have to reload/regenerate the Nginx config when the controller leader changes?

This is unnecessary now. Removed.

ElvinEfendi · 2018-06-18T07:35:46Z

internal/ingress/status/status.go

@@ -185,7 +185,7 @@ func NewStatusSyncer(config Config) Sync {
 			go st.syncQueue.Run(time.Second, stop)
 			wait.PollUntil(updateInterval, func() (bool, error) {
 				// send a dummy object to the queue to force a sync
-				st.syncQueue.Enqueue("sync status")


What's happening currently here? Runtime error because of an incorrect type?

I see that it is an interface, so eventually it will get passed to cache.DeletionHandlingMetaNamespaceKeyFunc to generate a key. And https://github.com/kubernetes/client-go/blob/8aceb98010c1c18b6b54a35b52fd5b46905e3d7f/tools/cache/store.go#L77 will make sure the key is "sync status"

ElvinEfendi · 2018-06-18T07:45:56Z

internal/ingress/controller/nginx.go

@@ -153,7 +152,7 @@ Error loading new template: %v

 		n.t = template
 		glog.Info("New NGINX configuration template loaded.")
-		n.SetForceReload(true)
+		n.syncQueue.Enqueue(task.GetDummyObject("template-change"), false)


For better readability, should we create helper functions for these two operations? I.e. something like

enqueueTask(obj)
enqueueSkippableTask(obj)

Good point. Done.

ElvinEfendi

I like the changes in the PR 👍

But I could not figure out what's the edge case that existing forceReload functionality does not cover but this does. Can you elaborate?

aledbf · 2018-06-18T13:13:14Z

What about forcereload we currently have? Is it not supposed to cover this case?

The force reload approach works only if we can enqueue objects. Without the new parameter in enqueue, a change in the configmap could be skipped. Also, to reach the force reload validation the model must be different. This is not true if there is no change in Ingress rules, that's the reason for the new checksum field in the configuration type.
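As a rough illustration of the checksum idea above: the sketch below derives a digest from a configuration value and compares digests to decide whether anything changed. It is not the PR's implementation. The PR adds the hashstructure dependency, while this sketch swaps in `crypto/sha256` over the JSON encoding to stay within the standard library. The `Configuration` fields and the `Checksum` and `Changed` helpers are hypothetical names chosen for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Configuration is a hypothetical stand-in for the configmap-derived
// settings; the real type has many more fields.
type Configuration struct {
	ErrorLogLevel        string   `json:"error-log-level"`
	WhitelistSourceRange []string `json:"whitelist-source-range"`
}

// Checksum returns a hex digest of the configuration. json.Marshal emits
// struct fields in declaration order, so equal values yield equal digests.
func Checksum(cfg Configuration) (string, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// Changed reports whether two configurations differ, judged only by
// their checksums.
func Changed(prev, next Configuration) (bool, error) {
	a, err := Checksum(prev)
	if err != nil {
		return false, err
	}
	b, err := Checksum(next)
	if err != nil {
		return false, err
	}
	return a != b, nil
}

func main() {
	before := Configuration{ErrorLogLevel: "notice"}
	after := Configuration{
		ErrorLogLevel:        "notice",
		WhitelistSourceRange: []string{"10.0.0.0/8"},
	}
	changed, err := Changed(before, after)
	if err != nil {
		fmt.Println("checksum failed:", err)
		return
	}
	fmt.Println("reload required:", changed)
}
```

Storing the digest as a field of the compared model means a configmap-only change makes the old and new models unequal even when no Ingress rule changed.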

ElvinEfendi · 2018-06-19T07:33:06Z

The force reload approach works only if we can enqueue objects

When store.ConfigurationEvent happens at https://github.com/kubernetes/ingress-nginx/pull/2656/files?utf8=%E2%9C%93&diff=unified#diff-cde3fffe2425ad7efaa8add1d05ae2c0R310 we setForceReload and then enqueue that event object at https://github.com/kubernetes/ingress-nginx/pull/2656/files?utf8=%E2%9C%93&diff=unified#diff-cde3fffe2425ad7efaa8add1d05ae2c0L317. So in case of configuration update, this requirement is met.

Without the new parameter in enqueue, a change in the configmap could be skipped.

In case of configmap, how can this happen? In the code I see if t.lastSync > item.Timestamp {. Why would that not be enough specifically for configmap events, but it is enough for Ingress/Endpoints events?

Also, to reach the force reload validation the model must be different

Could you elaborate? 🤔 from what I read, force reload forces the sync method to always execute n.OnUpdate(pcfg) regardless of the model difference.

ElvinEfendi · 2018-06-19T07:36:11Z

internal/task/queue.go

-		Timestamp: ts,
+		Key:         key,
+		Timestamp:   ts,
+		IsSkippable: skippable,


Can we piggyback on Timestamp field to implement "skippable" logic? For example if we want to enqueue an event that's not skippable we can set its Timestamp to current time + an hour. This way t.lastSync > item.Timestamp will not hold, and worker won't skip the event.

ElvinEfendi · 2018-06-19T07:37:27Z

internal/ingress/controller/nginx.go

@@ -311,10 +308,11 @@ func (n *NGINXController) Start() {
 			if evt, ok := event.(store.Event); ok {
 				glog.V(3).Infof("Event %v received - object %v", evt.Type, evt.Obj)
 				if evt.Type == store.ConfigurationEvent {


What's special with this event type? Why can we not simply enqueue a skippable task?

Because the event could be an endpoint, service, ingress, secret or configmap, but only a change in a configmap should escape the enqueue skippable logic.

but only a change in a configmap should escape the enqueue skippable logic

This is the part I don't understand completely. My current understanding why we have to force reload in this case is because in syncIngress function the model we use currently does not include this configmap data, and therefore when the change is only about configmap it does not regenerate the Nginx configuration.

But in this PR you are adding a new field to that model (ConfigurationChecksum) which to my understanding means we don't need this special case anymore when event type is ConfigurationEvent.

aledbf · 2018-06-19T12:05:54Z

Could you elaborate? 🤔 from what I read, force reload forces the sync method to always execute n.OnUpdate(pcfg) regardless of the model difference.

To reach the syncIngress function an item must be processed by the syncQueue. In this queue we use a time window to handle changes using a batch approach to avoid multiple reloads. This means in some scenarios, like simultaneous updates, we could skip a reload. This means we would never reach the isForceReload check.
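The batching behaviour described above can be sketched as follows. This is a simplified, single-goroutine model rather than the controller's real queue: the `Element` fields (`Key`, `Timestamp`, `IsSkippable`) and the `lastSync > item.Timestamp` comparison come from this thread, while `Queue`, `Drain`, the injected `clock` and `sync` callbacks are assumptions made for the example.

```go
package main

import "fmt"

// Element mirrors the fields shown in the queue.go diff.
type Element struct {
	Key         string
	Timestamp   int64
	IsSkippable bool
}

// Queue is a hypothetical, simplified model of the sync queue.
type Queue struct {
	items    []Element
	lastSync int64
	clock    func() int64     // injected so the example is deterministic
	sync     func(key string) // stands in for the real sync function
}

func (q *Queue) enqueue(key string, skippable bool) {
	q.items = append(q.items, Element{Key: key, Timestamp: q.clock(), IsSkippable: skippable})
}

// EnqueueTask adds an item that is always processed.
func (q *Queue) EnqueueTask(key string) { q.enqueue(key, false) }

// EnqueueSkippableTask adds an item that may be dropped when a sync has
// already happened after it was queued.
func (q *Queue) EnqueueSkippableTask(key string) { q.enqueue(key, true) }

// Drain processes the pending items in order. A skippable item queued
// before the last sync is dropped, which is how simultaneous updates get
// batched into a single sync.
func (q *Queue) Drain() {
	for _, it := range q.items {
		if it.IsSkippable && q.lastSync > it.Timestamp {
			continue
		}
		q.sync(it.Key)
		q.lastSync = q.clock()
	}
	q.items = nil
}

func main() {
	var tick int64
	var synced []string
	q := &Queue{
		clock: func() int64 { tick++; return tick },
		sync:  func(key string) { synced = append(synced, key) },
	}
	q.EnqueueSkippableTask("ingress")
	q.EnqueueSkippableTask("endpoints")
	q.EnqueueTask("configmap-change")
	q.Drain()
	fmt.Println(synced)
}
```

In this model the second skippable item is dropped because a sync ran after it was queued, while the non-skippable configmap item still reaches the sync function.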

aledbf · 2018-06-19T12:06:48Z

@ElvinEfendi you can test this easily following this example #2567 (comment)

ElvinEfendi · 2018-06-20T11:06:51Z

@ElvinEfendi you can test this easily following this example #2567 (comment)

@aledbf as you can see in that example, the controller logs

I0612 14:43:25.298234       6 event.go:218] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"nginx", Name:"nginx-configuration", UID:"75927380-6e41-11e8-86c0-025000000001", APIVersion:"v1", ResourceVersion:"82157", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap nginx/nginx-configuration
I0612 14:43:25.304823       6 controller.go:168] backend reload required
I0612 14:43:25.478106       6 controller.go:178] ingress backend successfully reloaded...

which comes from syncIngress. Does this not tell you that isForceReload check worked as expected?

aledbf · 2018-06-20T11:12:04Z

which comes from syncIngress. Does this not tell you that isForceReload check worked as expected?

What are you using for the test, this PR or 0.15.0?
(with 0.15.0 even with setforcereload, the reload doesn't happen)

ElvinEfendi · 2018-06-20T11:15:04Z

@aledbf I have not run the test myself, that's just the logs from @antoineco's test in the comment, does it matter though?

with 0.15.0 even with setforcereload, the reload doesn't happen

Can this be a race issue? I see that https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/store/store.go#L178 is not synchronized and it is being written from one goroutine (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/store/store.go#L700) and read from another goroutine (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/nginx.go#L427)

And you confirmed that this PR fixes the issue?

aledbf · 2018-06-20T11:17:40Z

Can this be a race issue?

No, but I will add a mutex to avoid this from happening.

aledbf · 2018-06-20T11:18:38Z

I have not run the test myself, that's just the logs from @antoineco's test in the comment.

If you check the generated nginx.conf you will see that the reload didn't change the configuration file.

ElvinEfendi · 2018-06-20T12:24:53Z

I tried to test this branch but I'm getting

E0620 12:20:44.485143       5 controller.go:174] Unexpected failure reloading the backend:
template: nginx.tmpl:10:33: executing "nginx.tmpl" at <$all.Cfg.Checksum>: can't evaluate field Checksum in type config.Configuration

Also, issue #2567 seems to be specifically about a whitelist-source-range change. I tried to customize error-log-level and it worked as expected. This again tells me the issue lies somewhere other than in what this PR is changing.

ElvinEfendi · 2018-06-21T08:30:09Z

internal/ingress/controller/nginx.go

@@ -311,10 +308,11 @@ func (n *NGINXController) Start() {
 			if evt, ok := event.(store.Event); ok {
 				glog.V(3).Infof("Event %v received - object %v", evt.Type, evt.Obj)
 				if evt.Type == store.ConfigurationEvent {
-					n.SetForceReload(true)
+					n.syncQueue.EnqueueTask(task.GetDummyObject("configmap-change"))


Can you explain why this special case is necessary after you added checksum to the model? Please also refer to https://github.com/kubernetes/ingress-nginx/pull/2656/files#r196736703.

Can you explain why this special case is necessary after you added checksum to the model?

If we don't add this we could skip one update (by not adding an element to the queue).

ElvinEfendi · 2018-06-21T08:33:19Z

internal/ingress/controller/store/store.go

+						key := k8s.MetaNamespaceKey(ingKey)
+						ing, err := store.GetIngress(key)
+						if err != nil {
+							glog.Errorf("could not find Ingress %v in local store", key)


Is "not found" the only error GetIngress returns? Would it be useful to include err message itself in the log as well?

ElvinEfendi · 2018-06-21T09:04:10Z

internal/ingress/controller/store/store.go

@@ -479,6 +479,18 @@ func New(checkOCSP bool,
 					if key == configmap {
 						store.setConfig(cm)
 					}
+
+					ings := store.listers.IngressAnnotation.List()


While this is going to fix the issue, it seems hacky to me. I think the root of the issue is the way IP whitelisting is implemented at

ingress-nginx/internal/ingress/annotations/ipwhitelist/main.go

Line 81 in fe9a5ae

defBackend := a.r.GetDefaultBackend()

This is adding an extra cost that we can avoid by fixing root cause of the issue:

Reading https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/annotations/ipwhitelist/main.go#L80 what it's doing is if whitelist-source-range annotation is not set then we use whitelist-source-range value from the configmap otherwise we use the one from the annotation. IMO that logic should not be part of the annotation parsing - annotation parsing should not be concerned about configmap. I suggest we move that logic into the template, in other words in the template we first check location.Whitelist.CIDR if it is not empty we then use that to generate geo ... config otherwise we use $cfg.WhitelistSourceRange to do so.
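The fallback rule described above (use the annotation when it is set, otherwise the configmap value) can be sketched in isolation. This is an illustrative helper, not the code in ipwhitelist/main.go: `effectiveWhitelist` and its parameters are hypothetical, and no CIDR validation is performed.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveWhitelist returns the CIDRs from the annotation value when it
// contains any, and otherwise falls back to the configmap defaults.
func effectiveWhitelist(annotation string, defaults []string) []string {
	var cidrs []string
	for _, part := range strings.Split(annotation, ",") {
		if p := strings.TrimSpace(part); p != "" {
			cidrs = append(cidrs, p)
		}
	}
	if len(cidrs) == 0 {
		return defaults
	}
	return cidrs
}

func main() {
	defaults := []string{"10.0.0.0/8"}
	// Annotation set: it wins over the configmap value.
	fmt.Println(effectiveWhitelist("192.168.0.0/16, 172.16.0.0/12", defaults))
	// Annotation missing: the configmap value is used.
	fmt.Println(effectiveWhitelist("", defaults))
}
```

Because the fallback reads the configmap value at parse time, the parsed result for an Ingress without the annotation goes stale when only the configmap changes, which is the situation this review thread is discussing.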

I share that opinion, the improvement to the config detection is independent from the issue related to whitelist-source-range, this should be made clearer via a separate PR.

@antoineco my comment was mainly about the fact that this fix is a hack, it is not fixing the root of the issue and unnecessarily introducing extra work on configmap updates.

While this is going to fix the issue, it seems hacky to me. I think the root of the issue is the way IP whitelisting is implemented at

This approach is used in all the annotation parsing step. Changing that in this PR is not going to happen.

IMO that logic should not be part of the annotation parsing - annotation parsing should not be concerned about configmap

You are right here, this should not be part of the annotation parsing, but right now we have no alternative. This is the most important reason why I want to move to CRDs and stop using annotations: not only because the current approach is complex, but because it doesn't scale and it's impossible to add the semantics we need in the configuration options (like a list of IP addresses and not a comma-separated string that can contain anything).

I suggest we move that logic into the template

No, that would make that impossible to understand, even now the templating step contains too much logic.

ElvinEfendi · 2018-06-21T09:05:21Z

test/e2e/settings/configmap_change_reload.go

+
+		Expect(checksum).NotTo(BeEquivalentTo(newChecksum))
+	})
+})


loving this e2e test 😍

k8s-ci-robot added the cncf-cla: yes label Jun 17, 2018

k8s-ci-robot added the size/XL and approved labels Jun 17, 2018

aledbf assigned antoineco Jun 17, 2018

aledbf force-pushed the queue branch 3 times, most recently from ff26ed1 to 5ec16e3 Compare June 17, 2018 16:57

ElvinEfendi reviewed Jun 18, 2018

aledbf force-pushed the queue branch 2 times, most recently from cec6143 to da6ca01 Compare June 18, 2018 13:31

aledbf assigned ElvinEfendi Jun 18, 2018

ElvinEfendi reviewed Jun 19, 2018

aledbf force-pushed the queue branch from 8254641 to b8503dc Compare June 19, 2018 11:54

aledbf force-pushed the queue branch 2 times, most recently from 7bc5b3a to 0ec99f8 Compare June 20, 2018 17:05

ElvinEfendi reviewed Jun 21, 2018

aledbf force-pushed the queue branch from 0ec99f8 to 24a5cc6 Compare June 21, 2018 12:48

ElvinEfendi mentioned this pull request Jun 21, 2018

After a configmap change parse ingress annotations (again) #2672

Merged

aledbf added 3 commits June 21, 2018 09:49

Use information about the configuration configmap to determine changes

5ecd293

Add hashstructure dependency

6621960

Rename queue functions

d183c34

aledbf force-pushed the queue branch from 24a5cc6 to 84e8419 Compare June 21, 2018 13:54

Add test for configmap checksum

0abbd2b

aledbf force-pushed the queue branch from 84e8419 to 0abbd2b Compare June 21, 2018 13:59

aledbf merged commit aec40c1 into kubernetes:master Jun 21, 2018

aledbf deleted the queue branch June 21, 2018 22:48
