-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition when using config store backed by config maps #1960
Comments
@dprotaso @mattmoor @runzexia do you have any thoughts on what the best approach is to take here? Are you open to the knative/pkg controller code doing something to wait and make sure the configstores are initialized correctly? (This seems like the best solution to me cuz this bug would impact all users of this lib) (some other ideas listed above) |
Friendly ping, I also noticed this in https://github.com/imjasonh/tektoncd-concurrency |
Based on the discussion in the linked issue, I'd change this check to not be based on the informer cache, but based on a positive hand-off signal from the callback: pkg/configmap/informed_watcher.go Lines 175 to 176 in c5296cd
I suspect we could do this just by instrumenting add here: pkg/configmap/informed_watcher.go Line 160 in c5296cd
|
@mattmoor sounds like it makes sense to resolve this in knative/pkg 👍 I'd be happy to make PR but I need a bit more detail about what you're envisioning (sounds like we'd be creating an interface where the callbacks can signal, maybe via a channel, that they're ready?) - lemme know if it makes more sense to schedule some time to discuss or if it makes sense for me to bring up in a knative WG |
IDK that we need to change the interface, since we are responsible for calling each callback. I was just thinking about some internal bookkeeping to check that all of the callbacks had been called...? 🤷 I'm happy to chat, or we can throw it on an API WG agenda. |
When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in knative#1929 which was working around the race condition identified in knative#1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes knative#1960
When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in knative#1929 which was working around the race condition identified in knative#1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes knative#1960
When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in knative#1929 which was working around the race condition identified in knative#1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes knative#1960
When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in knative#1929 which was working around the race condition identified in knative#1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes knative#1960
When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in knative#1929 which was working around the race condition identified in knative#1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes knative#1960
* Fix race: Make informed watcher start wait for Add event 🏎️ When using the informed watcher to watch a config map, previously add events were being processed in a goroutine with no syncrhonization making it so that code may try to access the values backed by the configmaps before they are initialized. This commit makes it so that the Start method of the informer will wait for the add event to occur at least once for all config maps it is watching. This commit also undoes the workaround added in #1929 which was working around the race condition identified in #1907 (and in tektoncd/pipeline#3720). This means that if the synchronization was removed, the impacted test would start flaking again. If we wanted it to reliably fail in that case, we could introduce a sleep in the callback but that doesn't seem worth it. I also tested this change by manually patching the changes into my clone of tektoncd/pipeline and following the repro steps at tektoncd/pipeline#2815 (comment) Before the change I can reproduce the issue, and after the change, I can't! :D Fixes #1960 * Make synced callback and named wait group private 🕵️ These utility objects don't really make sense to expose as part of the informed watcher package and are only used by the informed watcher. Writing tests for unexported code makes me a bit :( but hopefully these will get moved to some other package one day. And there's really no reason to expose these to users of knative/pkg at the moment.
Expected Behavior
When I try to use a config store backed by a config map immediately after the process using it starts up, when the config map exists before the process starts, I'd like to be able to guarantee that I am reading the configured value.
I think we could do this if one of the following was true:
/kind bug
Actual Behavior
When I use a generated reconciler, which has an unexported configstore, if I try to back that config store with configmaps, there is a race condition where if I try to use that config store immediately after start up, the informed watcher's goroutines which read from the backing config maps and update the config store may not have fired yet, which means I might read something from the config store other than what I've configured.
Steps to Reproduce the Problem
Additional Info
The text was updated successfully, but these errors were encountered: