Infinite loop while watching for Kubernetes events #6503
I've pushed a branch ignoring unmarshal errors: https://github.com/exekias/beats/tree/ignore-k8s-errors. I've also built an image from it.
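For context, here is a minimal sketch of the idea behind that branch, not the branch's actual code: watch events that cannot be decoded into the expected Pod type are logged and skipped instead of aborting the watch. It is written against a recent client-go purely for illustration; Beats uses a different Kubernetes client.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// watchPods consumes pod events and tolerates objects it cannot interpret,
// instead of treating them as fatal and breaking the watch loop.
func watchPods(ctx context.Context, client kubernetes.Interface) error {
	w, err := client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			// The server sent something other than a Pod (for example a
			// Status/error payload): log it and keep watching.
			log.Printf("skipping non-Pod watch event of type %v", event.Type)
			continue
		}
		log.Printf("pod %s: %s", pod.Name, event.Type)
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	if err := watchPods(context.Background(), kubernetes.NewForConfigOrDie(cfg)); err != nil {
		log.Fatal(err)
	}
}
```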
@exekias Looks good, no problems so far.
Good to hear, @simioa. I'm wondering, did you see the ERROR happen? The code doesn't really avoid that message, but it should recover from it in all cases:
Not that ERROR message, but this:
I just looked into a Filebeat 6.2.2 log and saw that right after the EOF the same ERROR appeared. This is not the case anymore; the only thing logged after the EOF error related to the watching API is:
Ok, please keep an eye on it and report if it happens 😇 In the meantime, it seems we now have a way to reproduce it, so I will be doing some testing myself. Thank you for your effort! It's really helping with this issue.
A fix has been merged and should go out in both upcoming releases.
…6530) Cherry-pick of PR #6504 to 6.2 branch. Original message:

This PR fixes #6503

How to reproduce: run Filebeat pointing to minikube, then:

```
minikube ssh
sudo su
ps aux | grep localkube
kill -9 process_id
```

This forces a failure on the API server. When the API server comes back up, it will not be able to serve the last resource version we had requested, and the watch fails with:

```
type:"ERROR" object:<raw:"k8s\000\n\014\n\002v1\022\006Status\022C\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003\032\000\"\000" > typeMeta:<apiVersion:"v1" kind:"Status" > raw:"\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003" contentEncoding:"" contentType:"" <nil>
```

In such scenarios the only mitigation is to move the resource version to the latest; scenarios like this would be addressed by `client-go`. The reason the code fails with an error is that we pass a `Pod` resource to `watcher.Next()`, but in this scenario the resource being parsed is an `Error` resource, so the protobuf unmarshalling fails. This is a limitation of the client we use, as the resource type needs to be passed explicitly. This fix is not the best in the world, as it might miss a few state changes.
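As a rough illustration of the mitigation described above ("move the resource version to the latest"), here is a hedged sketch written against a recent client-go rather than the client Beats actually uses: when the watch reports 410 Gone / "too old resource version", re-list to obtain the current resourceVersion and restart the watch from there instead of retrying with the stale one.

```go
package main

import (
	"context"
	"log"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// watchPodsFrom watches pods starting at the given resourceVersion and returns
// when the watch ends, surfacing server-side error events as Go errors.
func watchPodsFrom(ctx context.Context, client kubernetes.Interface, rv string) error {
	w, err := client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{ResourceVersion: rv})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		if event.Type == watch.Error {
			// A Status object; a 410 here means our resourceVersion is too old.
			return apierrors.FromObject(event.Object)
		}
		// ... handle Added/Modified/Deleted events here ...
	}
	return nil
}

func run(ctx context.Context, client kubernetes.Interface) {
	rv := ""
	for {
		err := watchPodsFrom(ctx, client, rv)
		if apierrors.IsResourceExpired(err) || apierrors.IsGone(err) {
			// Stale resourceVersion: re-list to jump to the latest one.
			list, lerr := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
			if lerr != nil {
				log.Printf("relist failed: %v", lerr)
				continue
			}
			rv = list.ResourceVersion
			continue
		}
		if err != nil {
			log.Printf("watch ended: %v", err)
		}
		time.Sleep(time.Second) // simple backoff before restarting the watch
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	run(context.Background(), kubernetes.NewForConfigOrDie(cfg))
}
```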
What's the ETA for the new 6.2.3 container with the fix merged?
It should be out soon. In any case, for convenience, I've pushed a snapshot build with the fix:
An update on this: new images for both versions are out.
@exekias this error also appears in
Still appears in 6.3.2:
@hobti01 Do you have memory issues with filebeat?
@yu-yang2 We do not have memory issues. We have noticed that sometimes log shipping stops for an application container when we update filebeat, and it is not restarted. This is the only ERROR in our logs, but it seems to be just a warning about the Kubernetes watch, which is automatically restarted.
Same here, filebeat version
Same error in filebeat v6.4.2
I also got this error in v6.4.3.
I also have the same errors with v6.4.3; my filebeat memory usage grows until it's killed by Kubernetes, at about 1 GB of usage per pod.
Same error for v6.4.3:
Can we have this re-opened, since the issue seems to still be occurring?
@opsnull Do you also see filebeat consuming large amounts of memory?
I have opened a new ticket here.
Have this issue in filebeat-oss:6.8.0
Have the same issue in 6.8.1
I'm also seeing this on 6.8.1, with repeated output of the same error. Some pods on the machine do not have missed logs, while others do. We're seeing this issue in development and production clusters.
Have the same issue in metricbeat 7.3.1. kubernetes module config:
log:
We are also seeing this error again in both our AWS and on-prem environments, on filebeat 6.8.1:
Many sets of logs are not being shipped either.
This error is reproducible with filebeat 6.8.9:
We're missing some containers' logs when this error occurs.
@exekias This is still seen on later versions.
When using either `add_kubernetes_metadata` or the `kubernetes` autodiscover provider, some users have reported this issue (which causes an infinite loop). More details can be found here: #6353
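As a hypothetical, self-contained sketch of the failure mode (the `failingWatcher` type and messages below are made up for illustration and are not the actual Beats code): if decoding the server's error event fails on every call and the code simply retries without adjusting anything, the watch loop never makes progress and keeps logging the same ERROR.

```go
package main

import (
	"errors"
	"fmt"
)

// failingWatcher is a made-up stand-in for a Kubernetes watch client whose
// event decoding always fails, as reported in this issue.
type failingWatcher struct{}

func (failingWatcher) Next() (interface{}, error) {
	return nil, errors.New("unexpected EOF")
}

func main() {
	w := failingWatcher{}
	// Buggy pattern: on error, retry immediately with the same state. Bounded
	// to five iterations here; the real loop would spin forever.
	for i := 0; i < 5; i++ {
		if _, err := w.Next(); err != nil {
			fmt.Printf("kubernetes: watching API error %v\n", err)
			continue
		}
	}
}
```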