Reset watch retry count on successful connection to API Server #267
Conversation
@qingling128 FYI.
Thanks a lot for the reviews, folks. Can we have this merged and released as a new version?
Hey @jcantrill, is there anything stopping us from merging and releasing a new version? Is there anything I could help with?
@jcantrill, our production fluentd instances keep restarting all the time. Can we merge this and release a new version ASAP?
Replicates changes from this PR on upstream: fabric8io/fluent-plugin-kubernetes_metadata_filter#267
…1183) This change bundles the `kubernetes_metadata` Fluentd plugin from https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/ into our repo, as the upstream is not active and our [pull request there](fabric8io/fluent-plugin-kubernetes_metadata_filter#267) is not being merged despite being accepted. This code was copied from upstream at commit fabric8io/fluent-plugin-kubernetes_metadata_filter@84f66a8 on `master` branch from Oct 8th, 2020. On that commit, changes from the pull request fabric8io/fluent-plugin-kubernetes_metadata_filter#267 have been applied.
@jcantrill @grosser @qingling128 - Is there any ETA on when this fix will be merged and a new release containing the fix will be available? We have numerous customers affected by this and waiting on this fix, which is why @astencel-sumo helped address it.
…1183) (#1193) This change bundles the `kubernetes_metadata` Fluentd plugin from https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/ into our repo, as the upstream is not active and our [pull request there](fabric8io/fluent-plugin-kubernetes_metadata_filter#267) is not being merged despite being accepted. This code was copied from upstream at commit fabric8io/fluent-plugin-kubernetes_metadata_filter@84f66a8 on `master` branch from Oct 8th, 2020. On that commit, changes from the pull request fabric8io/fluent-plugin-kubernetes_metadata_filter#267 have been applied.
Any progress on this?
@jcantrill @grosser @qingling128 please merge it and release ASAP, it's a really annoying issue.
I can't merge/am not a maintainer 🤷
Same here. I think @jcantrill has merge access.
Sorry for the delay here. I will release something shortly.
Fixes #249.
This pull request adds resetting of pod and namespace watch retry count after successfully re-establishing the connection to the Kubernetes API server. This is to prevent Fluentd restarts in the following scenario (describing the pod watch, but the namespace watch works the same way): the watch connection to the API server keeps breaking and being successfully re-established, with no watch updates coming from the API server in the meantime (which would reset the watch retry count), so the retry count keeps growing until it exceeds `:watch_retry_max_times`.

Nothing incorrect is actually happening in the above scenario, so we don't want to raise a `Fluent::UnrecoverableError` in such a case (which causes the whole Fluentd instance to restart). To prevent raising the `Fluent::UnrecoverableError`, I propose to reset the watch retry count not only on receiving an update from the watch (which might not happen, given for example no changes in the namespaces of the k8s cluster for a long time), but also on successful re-connection to the API server.
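For illustration, here is a minimal sketch of the proposed retry handling, assuming a simple reconnect loop; the method and helper names (`watch_pods_with_retries`, `set_up_pod_watch`, `process_pod_event`) are hypothetical and do not match the plugin's actual source:

```ruby
# Hypothetical sketch only; helper names do not match the plugin's code.
# Fluent::UnrecoverableError is the fluentd error class mentioned above
# (fluentd is assumed to be loaded).
def watch_pods_with_retries(max_retries)
  retry_count = 0
  loop do
    begin
      watcher = set_up_pod_watch     # (re)connect the watch to the API server
      retry_count = 0                # proposed change: reset on successful (re-)connection
      watcher.each do |event|
        retry_count = 0              # existing behaviour: reset on every received update
        process_pod_event(event)
      end
    rescue StandardError => e
      retry_count += 1
      # give up (and restart Fluentd) only after max_retries consecutive failed attempts
      raise Fluent::UnrecoverableError, e.message if retry_count > max_retries
      sleep(2**retry_count)          # back off before reconnecting
    end
  end
end
```

Without the reset right after `set_up_pod_watch`, a quiet cluster whose API server connection drops repeatedly would eventually exhaust `max_retries` even though every reconnection succeeds, which is exactly the restart scenario described above.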