
Infinite loop while watching for Kubernetes events #6503

Closed
exekias opened this issue Mar 7, 2018 · 29 comments · Fixed by #6504

Comments

@exekias
Contributor

exekias commented Mar 7, 2018

Both when using add_kubernetes_metadata or kubernetes autodiscover provider, some users have reported this issue (which causes an infinite loop):

2018-03-05T08:16:36.021Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error EOF
2018-03-05T08:16:36.021Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.030Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.030Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.031Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.031Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.033Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

More details can be found here: #6353

@exekias
Contributor Author

exekias commented Mar 7, 2018

I've pushed a branch ignoring unmarshal errors: https://github.com/exekias/beats/tree/ignore-k8s-errors

Also built an image from it: exekias/filebeat:ignore-k8s-errors. @simioa, it would be awesome if you could run another test; your help here is much appreciated.
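
For anyone wondering what "ignoring unmarshal errors" amounts to, here is a minimal, self-contained sketch of the idea. The names and helpers are hypothetical, not the actual beats watcher code: the loop logs and skips any event it cannot decode instead of returning the error and stalling the watch.

```
package main

import (
	"errors"
	"fmt"
	"io"
)

// watchLoop keeps consuming events and skips anything it cannot decode,
// instead of returning the error and tearing the whole watch down.
func watchLoop(next func() ([]byte, error), decode func([]byte) (string, error)) {
	for {
		raw, err := next()
		if err != nil {
			if errors.Is(err, io.EOF) {
				fmt.Println("kubernetes: Watching API error EOF, re-watching")
				return // the caller re-establishes the watch
			}
			fmt.Println("ignoring watch error:", err)
			continue
		}
		name, err := decode(raw)
		if err != nil {
			// e.g. "proto: wrong wireType = 6 for field ServiceAccountName"
			fmt.Println("ignoring unmarshal error:", err)
			continue
		}
		fmt.Println("handling event for", name)
	}
}

func main() {
	// Simulated stream: one event fails to decode, the others succeed.
	events := [][]byte{[]byte("pod-a"), []byte("garbage"), []byte("pod-b")}
	i := 0
	next := func() ([]byte, error) {
		if i >= len(events) {
			return nil, io.EOF
		}
		ev := events[i]
		i++
		return ev, nil
	}
	decode := func(raw []byte) (string, error) {
		if string(raw) == "garbage" {
			return "", errors.New("proto: wrong wireType = 6 for field ServiceAccountName")
		}
		return string(raw), nil
	}
	watchLoop(next, decode)
}
```

Running it prints the unmarshal error once for the bad event and keeps processing the remaining ones, which is the behavior being tested here.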

@simioa
Contributor

simioa commented Mar 9, 2018

@exekias Looks good, no problems so far.

@exekias
Contributor Author

exekias commented Mar 9, 2018

Good to hear, @simioa. I'm wondering: did you see the ERROR happen?

The code doesn't really avoid that message but should recover from it in all cases:
kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

@simioa
Contributor

simioa commented Mar 9, 2018

Not that ERROR message, but this:

2018-03-08T15:33:50.114Z        ERROR   kubernetes/watcher.go:198       kubernetes: Watching API error EOF

I just looked into a Filebeat 6.2.2 log and saw that the infinite loop started right after the kubernetes: Watching API error EOF error appeared.

This is not the case anymore; the only thing logged after the EOF error related to the watching API is:

2018-03-08T15:33:50.114Z        INFO    kubernetes/watcher.go:181       kubernetes: Watching API for resource events

@exekias
Contributor Author

exekias commented Mar 9, 2018

Ok, please keep an eye on it and report if it happens 😇

In the meantime, it seems we now have a way to reproduce it, so I will be doing some testing myself.

Thank you for your effort! It's really helping with this issue.

@exekias
Contributor Author

exekias commented Mar 12, 2018

A fix has been merged and should go out in both 6.3.0 and 6.2.3.

ruflin pushed a commit that referenced this issue Mar 12, 2018
…6530)

Cherry-pick of PR #6504 to 6.2 branch. Original message: 

This PR fixes #6503

How to reproduce: Run filebeat pointing to minikube. 

```
minikube ssh
sudo su

ps aux | grep localkube
kill -9 process_id
```

This will force a failure on the API server; when the API server comes back up, it will not be able to serve the last resource version we had requested and responds with this failure:
```
type:"ERROR" object:<raw:"k8s\000\n\014\n\002v1\022\006Status\022C\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003\032\000\"\000" >  typeMeta:<apiVersion:"v1" kind:"Status" > raw:"\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003" contentEncoding:"" contentType:""  <nil>
```

In such scenarios the only mitigation is to move the resource version to the latest; scenarios like this would be handled by `client-go`. The code fails with an error because we pass a `Pod` resource to `watcher.Next()`, but in this scenario the object being parsed is an `Error` resource, so the protobuf unmarshalling fails. This is a limitation of the client we use, as the resource type has to be passed explicitly.

This fix is not the best in the world, as it might miss a few state changes.
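
As a rough illustration of that recovery strategy (hypothetical names, not the beats implementation): on a watch or decode error, re-list to get the most recent resource version and resume from there, accepting that any state changes in between are lost.

```
// Sketch only: reset the resource version after a failed watch instead of
// retrying the same broken watch forever.
package sketch

import "log"

type watcher struct {
	resourceVersion string
	listLatest      func() (string, error) // returns the most recent resourceVersion
	watchOnce       func(rv string) error  // blocks until EOF or a decode error
}

func (w *watcher) run() {
	for {
		err := w.watchOnce(w.resourceVersion)
		if err == nil {
			continue // watch ended cleanly, just re-establish it
		}
		log.Printf("watch error: %v, moving to most recent resource version", err)
		rv, listErr := w.listLatest()
		if listErr != nil {
			log.Printf("list failed: %v, keeping the old resource version", listErr)
			continue
		}
		// Any events between the failed watch and this re-list are skipped,
		// which is the "might miss a few state changes" trade-off noted above.
		w.resourceVersion = rv
	}
}
```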
@primeroz

What's the ETA for the new 6.2.3 container with the fix merged?

@exekias
Contributor Author

exekias commented Mar 13, 2018

It should be out soon. In any case, for convenience, I've pushed a snapshot build with the fix: exekias/filebeat:6.2.3-snapshot

@exekias
Contributor Author

exekias commented Mar 20, 2018

An update on this: new images for 6.2.3 are already available: https://www.docker.elastic.co/

@yangyuw

yangyuw commented Aug 19, 2018

@exekias this error also appears in 6.4.0 with Kubernetes 1.6.4. Is there any workaround for this?

2018-08-19T01:53:44.655Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:53:44.655Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events
2018-08-19T01:53:53.051Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:53:53.051Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events
2018-08-19T01:53:53.293Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:53:53.294Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events
2018-08-19T01:53:54.065Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:53:54.065Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events
2018-08-19T01:54:05.405Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:54:05.405Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events
2018-08-19T01:54:07.414Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-08-19T01:54:07.414Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events

@hobti01

hobti01 commented Aug 26, 2018

Still appears in 6.3.2:

2018-08-25T14:50:05.364Z        INFO    [beat]  instance/beat.go:725    Build info      {"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:18:19.000Z", "version": "6.3.2"}}}
...
2018-08-25T18:40:57.180Z        ERROR   kubernetes/watcher.go:195       kubernetes: Watching API error EOF
2018-08-25T18:40:57.180Z        INFO    kubernetes/watcher.go:179       kubernetes: Watching API for resource events

@yangyuw

yangyuw commented Aug 26, 2018

@hobti01 Do you have memory issues with the filebeat process? As the error appears more and more frequently, memory usage may grow to 5GB and beyond. In my cluster, this error appears about 1 hour after filebeat restarts.
I think the root cause may be this line; w.r returns an empty string, which is abnormal.

if _, err := io.ReadFull(w.r, length); err != nil {
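
For context, a tiny standalone example of how a closed watch stream turns into that EOF log line, assuming (as the snippet above suggests) that the watcher reads a length prefix from the response body; the reader here just simulates a connection that was closed with nothing buffered.

```
package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	// Simulate the apiserver closing the watch connection: nothing left to read.
	stream := strings.NewReader("")
	length := make([]byte, 4) // length prefix of the next frame
	if _, err := io.ReadFull(stream, length); err != nil {
		// io.ReadFull returns io.EOF if zero bytes were read,
		// io.ErrUnexpectedEOF if the stream was cut mid-prefix.
		fmt.Println("kubernetes: Watching API error", err) // prints "... EOF"
	}
}
```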

@hobti01

hobti01 commented Aug 27, 2018

@yu-yang2 we do not have memory issues. We have noticed that sometimes log shipping stops for an application container when we update filebeat, and it does not restart.

This is the only ERROR in our logs, but it seems to be just a warning about the Kubernetes watch, which is automatically restarted.

@agoloAbbas

Same here with filebeat version 6.3.2:

2018-09-05T04:47:25.540Z	ERROR	kubernetes/watcher.go:195	kubernetes: Watching API error EOF
2018-09-05T04:47:25.541Z	INFO	kubernetes/watcher.go:179	kubernetes: Watching API for resource events

@mslga

mslga commented Nov 9, 2018

Same error in filebeat v6.4.2
Installed from https://github.com/helm/charts/tree/master/stable/filebeat

2018-11-09T15:03:07.979+0600	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF

@thspinto

thspinto commented Nov 13, 2018

I also got this error in v6.4.3.
The pod with this error stops collecting Kubernetes metadata. When I restart the pod, the metadata collection resumes, but the error appears again after a few hours.

2018-11-13T12:23:44.563Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2018-11-13T12:23:44.563Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events

@bwunderlich824

I also have the same errors with v6.4.3; my filebeat memory usage grows until it's killed by Kubernetes at about 1GB of usage per pod.

@opsnull

opsnull commented Nov 14, 2018

Same error for v6.4.3:
2018-11-13T21:11:43.472+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error EOF
2018-11-13T21:11:43.472+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events
2018-11-13T21:11:43.474+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error proto: illegal wireType 6
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:258 kubernetes: Ignoring event, moving to most recent resource version
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events

@bwunderlich824

Can we have this re-opened, since the issue seems to still be occurring?

@bwunderlich824

@opsnull Do you also see filebeat consuming large amounts of memory?

@bwunderlich824

I have opened a new ticket here
#9078

@WuChenLinux

This error also occurs in 6.6.1.

@Jasstkn

Jasstkn commented Jun 18, 2019

Have this issue in filebeat-oss:6.8.0

2019-06-18T13:04:47.479Z	ERROR	kubernetes/watcher.go:258	kubernetes: Watching API error EOF
2019-06-18T13:04:47.479Z	INFO	kubernetes/watcher.go:242	kubernetes: Watching API for resource events
2019-06-18T13:05:05.263Z	INFO	[monitoring]	log/log.go:144	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":470,"time":{"ms":2}},"total":{"ticks":2840,"time":{"ms":18},"value":2840},"user":{"ticks":2370,"time":{"ms":16}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":17},"info":{"ephemeral_id":"b4b54edd-c52c-4b5a-a29b-512f3f10c76e","uptime":{"ms":3360067}},"memstats":{"gc_next":52123232,"memory_alloc":34918008,"memory_total":303615072}},"filebeat":{"harvester":{"open_files":8,"running":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":900,"active":-573,"batches":1,"total":327},"read":{"bytes":42},"write":{"bytes":33899}},"pipeline":{"clients":2,"events":{"active":4118,"retry":327}}},"registrar":{"states":{"current":35}},"system":{"load":{"1":0.09,"15":0.1,"5":0.12,"norm":{"1":0.045,"15":0.05,"5":0.06}}}}}}

@illia-sh

Have the same issue in 6.8.1

@justinwalz
Contributor

I'm also seeing this on 6.8.1

Repeated output of 2019-09-04T01:17:29.852Z ERROR kubernetes/watcher.go:258 kubernetes: Watching API error EOF coupled with dropped logs.

Some pods on the machine are not missing logs, while others are. We're seeing this issue in development and production clusters.

@ZhaoBaymax

Have the same issue in metricbeat 7.3.1, but this issue happens in Alibaba Cloud Container Service; in AWS it has not happened.

Kubernetes module config:

kubernetes.yml: |-
  - module: kubernetes
    metricsets:
      - event
    period: 10s
    host: ${NODE_NAME}
    # If using Red Hat OpenShift remove the previous hosts entry and
    # uncomment these settings:
    #hosts: ["https://${HOSTNAME}:10250"]
    #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    #ssl.certificate_authorities:
    #  - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    #ssl.verification_mode: "none"
    add_metadata: true
    in_cluster: true
    fields_under_root: true
    fields:
      environment: ${ENVIRONMENT}
      REGION: ${REGION}

log:

2019-09-05T06:29:12.239Z	DEBUG	[cfgfile]	cfgfile/reload.go:197	Scan for new config files
2019-09-05T06:29:12.239Z	DEBUG	[cfgfile]	cfgfile/cfgfile.go:175	Load config from file: /usr/share/metricbeat/modules.d/kubernetes.yml
2019-09-05T06:29:12.239Z	DEBUG	[cfgfile]	cfgfile/reload.go:216	Number of module configs found: 1
2019-09-05T06:29:12.239Z	DEBUG	[reload]	cfgfile/list.go:62	Starting reload procedure, current runners: 1
2019-09-05T06:29:12.239Z	DEBUG	[reload]	cfgfile/list.go:80	Start list: 0, Stop list: 0
2019-09-05T06:29:20.456Z	DEBUG	[kubernetes]	kubernetes/watcher.go:261	EOF while watching API
2019-09-05T06:29:20.456Z	INFO	kubernetes/watcher.go:242	kubernetes: Watching API for resource events
2019-09-05T06:29:20.479Z	DEBUG	[kubernetes]	kubernetes/watcher.go:261	EOF while watching API
2019-09-05T06:29:20.479Z	INFO	kubernetes/watcher.go:242	kubernetes: Watching API for resource events
2019-09-05T06:29:20.579Z	DEBUG	[kubernetes]	kubernetes/watcher.go:261	EOF while watching API

@brokencode64

We are also seeing this error again in both our AWS and on-prem environments on filebeat 6.8.1:

2019-09-26T08:58:13.420Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.050Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.054Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Many sets of logs are not being shipped either.

@rbailly-talend

This error is reproducible with filebeat 6.8.9:

{"level":"error","timestamp":"2020-06-19T11:37:51.250Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}
{"level":"error","timestamp":"2020-06-19T11:37:51.251Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName"}
{"level":"error","timestamp":"2020-06-19T12:24:33.883Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}

We're missing some container logs when this error occurs.

@thernstig

@exekias This is still seen on later versions.

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
…atch (elastic#6530)
