Avoid infinite loop in Kubernetes watcher #6353
Conversation
In case of unmarshalling errors the Watcher was ending up in an infinite loop; this code ignores those errors and only restarts the watcher in case of EOF.
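For context, here is a minimal sketch of the behaviour described above. It assumes a hypothetical eventSource interface and watch function for illustration; the actual Beats watcher API differs.

```go
package sketch

import (
	"io"
	"log"
)

// eventSource is a stand-in for the Kubernetes watch stream used in
// libbeat/common/kubernetes/watcher.go; the real type is different.
type eventSource interface {
	Next() (interface{}, error) // next decoded event, or an error
	Close()
}

// watch drains events from src. Unmarshalling (and other non-EOF)
// errors are logged and skipped so the loop cannot spin forever on a
// single bad event; only EOF closes the stream and signals the caller
// to restart the watcher.
func watch(src eventSource, handle func(interface{})) error {
	for {
		obj, err := src.Next()
		if err != nil {
			if err == io.EOF || err == io.ErrUnexpectedEOF {
				src.Close()
				return err
			}
			log.Printf("ignoring watch error: %v", err)
			continue
		}
		handle(obj)
	}
}
```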
3b44e64 to 32debe1 (compare)
libbeat/common/kubernetes/watcher.go
Outdated
// In case of EOF, stop watching and restart the process
if err == io.EOF || err == io.ErrUnexpectedEOF {
	watcher.Close()
	time.Sleep(time.Second)
Can you please make this a binary back-off?
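For example, something along these lines: a rough sketch of a binary (exponential) back-off around the watcher restart, with made-up names and limits rather than the actual Beats implementation.

```go
package sketch

import (
	"io"
	"time"
)

// watchWithBackoff keeps restarting the watch loop after EOF, doubling
// the wait between attempts up to maxWait instead of always sleeping
// one second. start is expected to block until the watch ends and to
// return the terminating error.
func watchWithBackoff(start func() error) {
	const maxWait = 30 * time.Second
	wait := time.Second
	for {
		err := start()
		if err != io.EOF && err != io.ErrUnexpectedEOF {
			return // unexpected failure: give up and let the caller decide
		}
		time.Sleep(wait)
		if wait *= 2; wait > maxWait {
			wait = maxWait
		}
	}
}
```

In practice the wait would probably also be reset after a period of successful watching.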
Failing tests are unrelated to this PR.
We updated to
Uhm, could you paste the log output? Some things should at least have changed.
Sure:
Thanks for looking into it! Edit: I see this addition in this pull request:
However, there doesn't appear to be any wait between failures.
This is weird; it looks like the change is in, but the watcher connection backoff doesn't seem to be in place. Do you know, by any chance, which event is originating this? I haven't reproduced it myself.
So far I haven't been able to find a specific suspicious event right before the loop occurs. Do you have any suggestions for what I could look for?
Hello, we are experiencing this very same issue on Filebeat 6.2.2 when the add_kubernetes_metadata processor is added on our OpenShift Kubernetes nodes. Grtz
@jimsheldon @willemdh Any chance both of you could enable debug logging and share the log output here (or link to a gist)? |
I ran into this before; that's why I thought client-go would help here with better compatibility.
I once opened an issue for this (ericchiang/k8s#68) while using Beats, but didn't get a response. When I tested #6117 with client-go, it didn't happen again.
@ruflin Well, it seems that after restarting the Filebeat agents on our 10 OpenShift nodes, the problem did not reappear. I'm monitoring closely to find a pattern in the hope of reproducing it.
@willemdh Keep us posted if it happens again. A point I only realised now is that you are running OpenShift. There were some subtle differences in the past, so this is probably also a good data point to add to the equation when we see this again.
We are facing the same issue with Filebeat 6.2.2 using the add_kubernetes_metadata processor.
@exekias No. So maybe the updated client library may help here?
That was my expectation. Would anyone suffering from this issue be up for giving a try to
@exekias I would give it a try.
I'm experiencing the same issue in a custom image built from master (5fa9be4).
Also, I'm not sure if it caused the crash, but the last logs of the previous instance of the restarted pods end with this error.
I might not have the entire context about the issue. I am guessing that this is because of an incompatible kube client, as the issue seems to be coming from proto unmarshalling. @wflyer, what version of Kubernetes are you running?
@vjsamuel I'm using Kubernetes version v1.9.2.
For the crash part that I mentioned (#6353 (comment)), it looks like this issue significantly increases the memory usage of the pod and causes the pod to be killed due to OOM.
Thank you for your patience, everyone. I've built a 6.3 snapshot:
@exekias
Seeing the same issue with the ES stack 6.2 on k8s 1.8.5, with multiple lines of the same error, causing the pods to run OOM and crash.
Seeing the same issue with Filebeat 6.2.2 on k8s 1.7.2. It also appears that when the issue occurs, the k8s API server is flooded with WATCH requests, which causes the CPU on the k8s API servers to increase to almost 100% when the load is normally < 10%. Would it be better to create a new issue rather than comment on a merged pull request?
Maybe I can build an image with the latest Beats master but using kubernetes/client-go instead of the library currently in use? I am pretty sure this is a compatibility issue in the k8s library.
@vjsamuel I've not been able to reproduce it. Generally, after restarting Filebeat, everything works as expected. It takes 3-5 days before this issue appears.
Never mind @willemdh, I was able to reproduce it. I have a PR with the fix.
Just an update on this: a fix for #6503 has been merged and should go out in both
In case of unmarshalling errors the Watcher was ending up in an infinite loop; this code ignores those errors and only restarts the watcher in case of EOF.
This was reproduced by @cdahlqvist:
Opening directly against 6.2, as master is not affected after its refactoring and client update.