fix: controller watches global configMap in the namespace where it is running and not in managedNamespace - fixes #11463 #11799
… running and not in managedNamespace
Signed-off-by: toblich <toblich@users.noreply.github.com>
Thanks for this fix (and for all the details here and in the issue)! As commented in the issue, this may also fix some related issues, if this was their root cause.
This is a one-line fix, so I'm OK approving it as is with manual verification.
I also verified the codepaths myself to confirm that this is logically correct:
- `config` controller creation uses `namespace` and not `managedNamespace`
- `UpdateConfig` calls `Get` on the `config` controller, which uses `namespace`
- Both of those explain why it still works on initialization, but fails to `watch` correctly

So `WatchFunc` here is the odd one out. And of course, as stated in the PR and issue, the `workflow-controller-configmap` should be in the same namespace as the Workflow Controller.
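As a minimal illustration of that invariant (a stdlib-only Go sketch, not the actual Argo code): a hypothetical `configController` keeps a single `namespace` field and derives both the initial `Get` and the watch from it, so the two cannot drift apart the way `WatchFunc` did. The type, its fields, and its methods are assumptions made for illustration.

```go
package main

import "fmt"

// configController is a hypothetical stand-in for the controller's config
// reader. It is NOT the real Argo type.
type configController struct {
	namespace        string // namespace the controller itself runs in
	managedNamespace string // namespace whose Workflows it manages
}

// getNamespace is the namespace the initial Get reads the ConfigMap from.
func (c *configController) getNamespace() string { return c.namespace }

// watchNamespaceBuggy mirrors the reported bug: the watch used managedNamespace.
func (c *configController) watchNamespaceBuggy() string { return c.managedNamespace }

// watchNamespaceFixed watches the same namespace that Get reads from.
func (c *configController) watchNamespaceFixed() string { return c.namespace }

func main() {
	c := &configController{namespace: "argo", managedNamespace: "workflows"}
	fmt.Println("get:", c.getNamespace())
	fmt.Println("watch (buggy):", c.watchNamespaceBuggy())
	fmt.Println("watch (fixed):", c.watchNamespaceFixed())
}
```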
Hmm, I do see a potential problem that I suspected in the issue...
On line 422 below in this same function, `wfc.notifySemaphoreConfigUpdate` is called for every ConfigMap in the namespace, and semaphore configs are indeed in the `managedNamespace`.
This may require two separate watchers if there isn't a separate watcher on semaphores already.
So there is a separate watcher. That informer should probably be extended to semaphores then, if I'm understanding correctly. It would definitely be good to get someone more familiar with the ConfigMap codepaths to weigh in. @sarabala1979 originally wrote some of this code in #4421, which is why I requested his review.
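A rough sketch of the two-watcher idea, assuming (as described above) that the controller's own ConfigMap lives in the controller's namespace while semaphore ConfigMaps live in `managedNamespace`. The `router`, `cmEvent`, `onConfig`, and `onSemaphore` names are hypothetical; the real controller uses informers, and this stdlib-only Go model only shows the routing decision.

```go
package main

import "fmt"

// cmEvent is a hypothetical, simplified ConfigMap change event.
type cmEvent struct {
	Namespace string
	Name      string
}

// router dispatches events to a handler based on namespace. All names here
// are illustrative, not the controller's real functions.
type router struct {
	controllerNamespace string
	managedNamespace    string
	configMapName       string
	onConfig            func(cmEvent)
	onSemaphore         func(cmEvent)
}

func (r *router) handle(e cmEvent) {
	// The controller's own ConfigMap lives next to the controller.
	if e.Namespace == r.controllerNamespace && e.Name == r.configMapName {
		r.onConfig(e)
		return
	}
	// Semaphore ConfigMaps live in the managed namespace.
	if e.Namespace == r.managedNamespace {
		r.onSemaphore(e)
	}
}

func main() {
	r := &router{
		controllerNamespace: "argo",
		managedNamespace:    "workflows",
		configMapName:       "workflow-controller-configmap",
		onConfig:            func(e cmEvent) { fmt.Println("reload config from", e.Namespace) },
		onSemaphore:         func(e cmEvent) { fmt.Println("notify semaphores for", e.Name) },
	}
	r.handle(cmEvent{Namespace: "argo", Name: "workflow-controller-configmap"})
	r.handle(cmEvent{Namespace: "workflows", Name: "my-semaphores"})
}
```

In this sketch, when both namespaces are the same, the controller ConfigMap only reaches `onConfig`; whether it should also trigger the semaphore path is left open.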
@juliev0 Not quite, that PR doesn't actually resolve this issue at all. It seems the author there did not interpret the bug correctly. See my comment there: #11855 (comment)
Yes, you're totally right. Not sure how I missed that :|
Actually, I'm holding off on approval until we can get some feedback on Anton's questions.
After reviewing #11855 in depth, I am pretty sure that … As far as I understand, that function is responsible for requeueing if a semaphore's ConfigMap limit increases (say you had a limit of 5 and now allow 10).
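To illustrate the requeue condition described above, here is a hypothetical helper that compares the old and new `Data` of a semaphore ConfigMap and reports which keys' limits went up (for example from 5 to 10). The function name, and the choice to treat a missing or non-integer old value as 0, are assumptions for this sketch, not the controller's actual behavior.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// increasedLimits compares old and new semaphore ConfigMap data and returns,
// sorted, the keys whose integer limit increased. Hypothetical helper.
func increasedLimits(oldData, newData map[string]string) []string {
	var keys []string
	for k, newVal := range newData {
		newLimit, err := strconv.Atoi(newVal)
		if err != nil {
			continue // ignore non-integer values in this sketch
		}
		oldLimit, err := strconv.Atoi(oldData[k])
		if err != nil {
			oldLimit = 0 // a missing or invalid old value counts as 0
		}
		if newLimit > oldLimit {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	before := map[string]string{"workflow": "5"}
	after := map[string]string{"workflow": "10"}
	fmt.Println(increasedLimits(before, after))
}
```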
Are you thinking that instead of using a …
Ostensibly, it would be a good idea to reuse an existing Informer such as the … I was also concerned in #11855 (comment) that the … I can tackle that if our two contributors here aren't able to pick it up.
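The idea of reusing an existing informer can be sketched as one watch whose updates fan out to several handlers, similar in spirit to registering extra handlers on a client-go shared informer via `AddEventHandler`. The `sharedSource` type below is a hypothetical stdlib-only stand-in, not a client-go API.

```go
package main

import "fmt"

// updateHandler receives the namespace and name of a changed ConfigMap.
type updateHandler func(namespace, name string)

// sharedSource is a hypothetical, minimal stand-in for a shared informer:
// one underlying watch whose updates fan out to every registered handler.
type sharedSource struct {
	handlers []updateHandler
}

// AddHandler registers another consumer without opening a new watch.
func (s *sharedSource) AddHandler(h updateHandler) {
	s.handlers = append(s.handlers, h)
}

// deliver simulates one update arriving from the single underlying watch.
func (s *sharedSource) deliver(namespace, name string) {
	for _, h := range s.handlers {
		h(namespace, name)
	}
}

func main() {
	src := &sharedSource{}
	src.AddHandler(func(ns, name string) { fmt.Println("semaphore handler saw", ns+"/"+name) })
	src.AddHandler(func(ns, name string) { fmt.Println("second handler saw", ns+"/"+name) })
	src.deliver("workflows", "my-semaphores")
}
```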
Nice deep dive into all of this. Yes, you seem to have become an expert on this.
Fixes #11463
Motivation
There was a bug by which the workflow controller did not pick up changes made to its ConfigMap until its pod was deleted. This only manifested when the controller was running in a particular namespace (`--namespaced`) but managing workflows in a different namespace (`--managed-namespace=<some_other_ns>`).
Modifications
Fixed the namespace that the controller watches for changes to its own ConfigMap, so that it watches the same namespace it initially reads from. See the analysis in #11463 for more details.
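The effect of the fix can be modeled with a tiny stdlib-only Go sketch: a ConfigMap edit happens in the controller's namespace, and a watch scoped to `managedNamespace` never sees it, while a watch scoped to the controller's `namespace` does. The namespace values `argo` and `workflows` and the `seenBy` helper are illustrative assumptions.

```go
package main

import "fmt"

// event is a hypothetical, simplified ConfigMap change.
type event struct{ Namespace, Name string }

// seenBy returns the events a watch scoped to ns and one ConfigMap name would receive.
func seenBy(events []event, ns, name string) []event {
	var out []event
	for _, e := range events {
		if e.Namespace == ns && e.Name == name {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	const (
		controllerNS = "argo"
		managedNS    = "workflows"
		cmName       = "workflow-controller-configmap"
	)
	// An edit to the ConfigMap, which lives in the controller's namespace.
	events := []event{{controllerNS, cmName}}
	fmt.Println("buggy watch (managedNamespace) sees:", len(seenBy(events, managedNS, cmName)))
	fmt.Println("fixed watch (namespace) sees:", len(seenBy(events, controllerNS, cmName)))
}
```

Running it prints `buggy watch (managedNamespace) sees: 0` followed by `fixed watch (namespace) sees: 1`.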
Verification
I did manual verification by running the controller with `--namespaced` and `--managed-namespace=<other_ns>` (i.e. in one namespace while managing another).
I also ran all tests. Since this bug involves ConfigMaps and namespaces in a real Kubernetes cluster, and I don't see a clear way of simulating that in `workflow/controller/controller_test.go`, I haven't added any new tests. I'm open to doing so if there's an example of how to mock the namespaces and ConfigMaps to actually test the watcher, which, as far as I can tell, relies on the Kubernetes control plane actually being there and is thus complicated to test without a real cluster.
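One possible way to unit-test watcher logic without a real cluster is to have it depend on a narrow watch interface and inject a fake in tests. The sketch below is a stdlib-only Go model with hypothetical names (`Watcher`, `fakeWatcher`, `countConfigUpdates`); it is not how the controller is currently structured. For the real client, client-go's `k8s.io/client-go/kubernetes/fake` package (`fake.NewSimpleClientset`) may also be worth checking, since as far as I know its object tracker supports watches.

```go
package main

import "fmt"

// Event is a hypothetical ConfigMap change notification.
type Event struct{ Namespace, Name string }

// Watcher is a narrow interface the controller logic could depend on, so
// tests can swap in a fake instead of a real cluster. Hypothetical.
type Watcher interface {
	Watch(namespace string) <-chan Event
}

// fakeWatcher replays canned events for whichever namespace is requested.
type fakeWatcher struct {
	events []Event
}

func (f *fakeWatcher) Watch(namespace string) <-chan Event {
	// Buffered so all matching events fit without blocking, then closed.
	ch := make(chan Event, len(f.events))
	for _, e := range f.events {
		if e.Namespace == namespace {
			ch <- e
		}
	}
	close(ch)
	return ch
}

// countConfigUpdates is the "logic under test": it watches ns and counts
// updates to the named ConfigMap.
func countConfigUpdates(w Watcher, ns, name string) int {
	n := 0
	for e := range w.Watch(ns) {
		if e.Name == name {
			n++
		}
	}
	return n
}

func main() {
	fw := &fakeWatcher{events: []Event{{"argo", "workflow-controller-configmap"}}}
	fmt.Println(countConfigUpdates(fw, "argo", "workflow-controller-configmap"))      // 1
	fmt.Println(countConfigUpdates(fw, "workflows", "workflow-controller-configmap")) // 0
}
```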