Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid infinite loop in Kubernetes watcher #6353

Merged
merged 2 commits into from
Feb 13, 2018

Conversation

exekias
Copy link
Contributor

@exekias exekias commented Feb 12, 2018

In case of unmarshalling errors the Watcher was ending up in a infinite loop,
this code ignores those errors and only restarts the watcher in case of
EOF.

This was reproduced by @cdahlqvist:

2018-02-09T16:37:46.072Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-09T16:37:46.072Z	INFO	kubernetes/watcher.go:133	kubernetes: Watching API for pod events
2018-02-09T16:37:46.130Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-09T16:37:46.131Z	INFO	kubernetes/watcher.go:133	kubernetes: Watching API for pod events
2018-02-09T16:37:46.131Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-09T16:37:46.131Z	INFO	 kubernetes/watcher.go:133	kubernetes: Watching API for pod events

Opening directly to 6.2 as master is not affected, after its refactoring and client update

In case of unmarshalling errors the Watcher was ending up in a infinite loop,
this code ignores those errors and only restarts the watcher in case of
EOF.
@exekias exekias force-pushed the 6.2-fix-k8s-watcher-loop branch from 3b44e64 to 32debe1 Compare February 12, 2018 12:25
// In case of EOF, stop watching and restart the process
if err == io.EOF || err == io.ErrUnexpectedEOF {
watcher.Close()
time.Sleep(time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please make this binary back off?

@exekias exekias added the in progress Pull request is currently in progress. label Feb 13, 2018
@exekias
Copy link
Contributor Author

exekias commented Feb 13, 2018

failing tests are unrelated to this PR

@exekias exekias added review and removed in progress Pull request is currently in progress. labels Feb 13, 2018
@ruflin ruflin merged commit b22a55e into elastic:6.2 Feb 13, 2018
@jimsheldon
Copy link

We updated to docker.elastic.co/beats/filebeat:6.2.2 hoping it would fix this issue but it is still occurring, is there any chance that container does not have this fix?

@exekias
Copy link
Contributor Author

exekias commented Feb 21, 2018

Uhm, could you paste the log output? Some things should have, at least, changed

@jimsheldon
Copy link

jimsheldon commented Feb 21, 2018

Sure:

2018-02-21T00:22:57.818Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.819Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.819Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.819Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.820Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.820Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.820Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.821Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.821Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.821Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.822Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.822Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.822Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.823Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.823Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.823Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.824Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.824Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.824Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.825Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.825Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.825Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.826Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.826Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.826Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-21T00:22:57.827Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-21T00:22:57.827Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-21T00:22:57.827Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events

Thanks for looking into it!

Edit: I see this addition in this pull request:

const maxBackoff = 30 * time.Second

However, there doesn't appear to be any wait between failures.

@exekias
Copy link
Contributor Author

exekias commented Feb 21, 2018

This is weird, it looks like the change is in, but watcher connection backoff doesn't seem in place. Do you know by a chance what's the event originating this? I haven't reproduced it myself

@jimsheldon
Copy link

So far I haven't been able to find a specific suspicious event right before the loop occurs, do you have any suggestions for what I could look for?

@willemdh
Copy link

willemdh commented Feb 26, 2018

Hello,

We are experiencing this very same issue on Filebeat 6.2.2 when the 'add_kubernetes_metadata' processor is added on our Openshift Kubernetes nodes.

Grtz

@ruflin
Copy link
Contributor

ruflin commented Feb 27, 2018

@jimsheldon @willemdh Any chance both of you could enable debug logging and share the log output here (or link to a gist)?

@walktall
Copy link
Contributor

walktall commented Feb 27, 2018 via email

@walktall
Copy link
Contributor

I once make a issue for this ericchiang/k8s#68 before while using beat, but didn't get response for this, while when I test #6117 with client-go, it didn't happen again.

@willemdh
Copy link

@ruflin Well it seems after restarting the FIlebeat agents on our 10 Openshift nodes, the problem did not reappear.. I'm monitoring closely to find a pattern in the hope to reproduce.

@ruflin
Copy link
Contributor

ruflin commented Feb 28, 2018

@willemdh Keep us posted if it happens again. I think a point that I only realised now is that you are running OpenShift. There were some subtle differences in the past, so this is probably also a good data point to add to the equation when we see this again.

@ghost
Copy link

ghost commented Feb 28, 2018

We are facing the same Issues with Filebeat 6.2.2 using the add_kubernetes_metadata processor.
Unfortunately redeploying the Filebeat agents in our Kubernetes Cluster did not work.

2018-02-28T19:15:37.823Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.826Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.826Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.826Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.826Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.828Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.828Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.848Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.851Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.851Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.856Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.857Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.858Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.867Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.867Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.867Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.869Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.870Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.870Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.876Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.876Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.876Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.876Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.878Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.878Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.887Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.888Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.888Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.888Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.888Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.888Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.897Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.897Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.897Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.909Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.909Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.909Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.913Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.914Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.914Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.918Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.920Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.920Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.924Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.924Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.924Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events
2018-02-28T19:15:37.932Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-02-28T19:15:37.933Z	ERROR	kubernetes/watcher.go:154	kubernetes: Watching API error EOF
2018-02-28T19:15:37.933Z	INFO	kubernetes/watcher.go:140	kubernetes: Watching API for pod events

@exekias
Copy link
Contributor Author

exekias commented Feb 28, 2018

@walktall I'm wondering, did you reproduce this again after updating the client library and refactoring (#6159)?

@walktall
Copy link
Contributor

walktall commented Mar 1, 2018

@exekias no. So maybe the updated client library make help here?

@exekias
Copy link
Contributor Author

exekias commented Mar 1, 2018

That was my expectation, would anyone suffering from this issue be up to give a try to master if I build an image?

@simioa
Copy link
Contributor

simioa commented Mar 1, 2018

@exekias I would give it a try

@wflyer
Copy link
Contributor

wflyer commented Mar 2, 2018

I'm experiencing the same issue in custom build image from master(5fa9be4).

2018-03-01T19:43:54.203Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.210Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-01T19:43:54.210Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.212Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-01T19:43:54.212Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.213Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-01T19:43:54.214Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.215Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-01T19:43:54.215Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.216Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-01T19:43:54.216Z	INFO	kubernetes/watcher.go:172	kubernetes: Watching API for resource events
2018-03-01T19:43:54.218Z	ERROR	kubernetes/watcher.go:187	kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Also, I'm not sure if it caused crash, but the last logs of the previous instance of restarted pods end with this error.

@vjsamuel
Copy link
Contributor

vjsamuel commented Mar 2, 2018

I might not have entire context about the issue. I am guessing that this is because of an incompatible kube client as the issue seems to be coming from proto unmarshalling.

@wflyer what is the version of Kubernetes that you're running on?

@wflyer
Copy link
Contributor

wflyer commented Mar 2, 2018

@vjsamuel I'm using kubernetes version v1.9.2.

Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

@wflyer
Copy link
Contributor

wflyer commented Mar 4, 2018

For the crash part that I mentioned (#6353 (comment)) it looks like that this issue significantly increases the memory usage of the pod and makes the pod to be killed due to OOM.

@exekias
Copy link
Contributor Author

exekias commented Mar 4, 2018

Thank you for the patience everyone, I've built a 6.3 snapshot: exekias/filebeat:6.3-snapshot, @simioa (and anyone interested) please let me know if it fixes this issue for you

@simioa
Copy link
Contributor

simioa commented Mar 5, 2018

@exekias
Unfortunately the issue still persists.
I just attached the log extract of the first occurrence of the Issue.

2018-03-05T08:14:43.736Z        INFO    log/harvester.go:241    File is inactive: /var/lib/docker/containers/3c0e0eb018b0402efd611c270dba466d3860b35372fc649a5404f9f70358df56/3c0e0eb018b0402efd611c270dba46
6d3860b35372fc649a5404f9f70358df56-json.log. Closing because close_inactive of 5m0s reached.
2018-03-05T08:14:48.816Z        INFO    log/harvester.go:216    Harvester started for file: /var/lib/docker/containers/e8ca48b2305214f3777cd780a8dfef92de5f98154093b595733418113b3b681b/e8ca48b2305214f3777c
d780a8dfef92de5f98154093b595733418113b3b681b-json.log
2018-03-05T08:14:48.863Z        INFO    [monitoring]    log/log.go:124  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":6000,"time":6002},"total":{"tic
ks":115770,"time":115781,"value":115770},"user":{"ticks":109770,"time":109779}},"info":{"ephemeral_id":"fe67b01d-c90f-4432-b3dc-32ac3695ccd4","uptime":{"ms":1890015}},"memstats":{"gc_next":8506160,"memory
_alloc":4836896,"memory_total":20021805448}},"filebeat":{"events":{"active":1,"added":202,"done":201},"harvester":{"closed":1,"open_files":6,"running":6,"started":1}},"libbeat":{"config":{"module":{"runni
ng":1}},"output":{"events":{"acked":198,"batches":11,"total":198}},"outputs":{"kafka":{"bytes_read":2184,"bytes_write":41681}},"pipeline":{"clients":2,"events":{"active":1,"filtered":3,"published":199,"to
tal":202},"queue":{"acked":198}}},"registrar":{"states":{"cleanup":1,"current":31,"update":201},"writes":14},"system":{"load":{"1":0.27,"15":0.52,"5":0.49,"norm":{"1":0.135,"15":0.26,"5":0.245}}}}}}
2018-03-05T08:15:18.862Z        INFO    [monitoring]    log/log.go:124  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":6040,"time":6040},"total":{"tic
ks":116010,"time":116016,"value":116010},"user":{"ticks":109970,"time":109976}},"info":{"ephemeral_id":"fe67b01d-c90f-4432-b3dc-32ac3695ccd4","uptime":{"ms":1920015}},"memstats":{"gc_next":7347296,"memory
_alloc":3985256,"memory_total":20070943128}},"filebeat":{"events":{"active":7,"added":103,"done":96},"harvester":{"open_files":6,"running":6}},"libbeat":{"config":{"module":{"running":1}},"output":{"event
s":{"acked":96,"batches":18,"total":96}},"outputs":{"kafka":{"bytes_read":2912,"bytes_write":43933}},"pipeline":{"clients":2,"events":{"active":8,"published":103,"total":103},"queue":{"acked":96}}},"regis
trar":{"states":{"current":31,"update":96},"writes":18},"system":{"load":{"1":0.27,"15":0.51,"5":0.48,"norm":{"1":0.135,"15":0.255,"5":0.24}}}}}}
2018-03-05T08:15:48.866Z        INFO    [monitoring]    log/log.go:124  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":6070,"time":6073},"total":{"tic
ks":116170,"time":116182,"value":116170},"user":{"ticks":110100,"time":110109}},"info":{"ephemeral_id":"fe67b01d-c90f-4432-b3dc-32ac3695ccd4","uptime":{"ms":1950019}},"memstats":{"gc_next":10028784,"memor
y_alloc":8121544,"memory_total":20107042680,"rss":-13885440}},"filebeat":{"events":{"active":-5,"added":162,"done":167},"harvester":{"open_files":6,"running":6}},"libbeat":{"config":{"module":{"running":1
}},"output":{"events":{"acked":167,"batches":12,"total":167}},"outputs":{"kafka":{"bytes_read":2072,"bytes_write":37035}},"pipeline":{"clients":2,"events":{"active":3,"published":162,"total":162},"queue":
{"acked":167}}},"registrar":{"states":{"current":31,"update":167},"writes":12},"system":{"load":{"1":0.16,"15":0.49,"5":0.43,"norm":{"1":0.08,"15":0.245,"5":0.215}}}}}}
2018-03-05T08:16:18.863Z        INFO    [monitoring]    log/log.go:124  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":6100,"time":6108},"total":{"tic
ks":116370,"time":116382,"value":116370},"user":{"ticks":110270,"time":110274}},"info":{"ephemeral_id":"fe67b01d-c90f-4432-b3dc-32ac3695ccd4","uptime":{"ms":1980015}},"memstats":{"gc_next":11899120,"memor
y_alloc":6458176,"memory_total":20145928336}},"filebeat":{"events":{"added":82,"done":82},"harvester":{"open_files":6,"running":6}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked"
:82,"batches":14,"total":82}},"outputs":{"kafka":{"bytes_read":2296,"bytes_write":35903}},"pipeline":{"clients":2,"events":{"active":3,"published":82,"total":82},"queue":{"acked":82}}},"registrar":{"state
s":{"current":31,"update":82},"writes":14},"system":{"load":{"1":0.1,"15":0.47,"5":0.39,"norm":{"1":0.05,"15":0.235,"5":0.195}}}}}}
2018-03-05T08:16:36.021Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error EOF
2018-03-05T08:16:36.021Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.030Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.030Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.031Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.031Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.033Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.033Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.035Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

@yawboateng
Copy link

Seeing the same issue with es stack 6.2 on k8s 1.8.5

multiple lines of:
kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

and causing the pods to run OOM and crash

@idahoakl
Copy link

idahoakl commented Mar 7, 2018

Seeing the same issue with filebeat 6.2.2 on k8s 1.7.2.

It also appears that when the issue occurs the k8s api server is flooded with WATCH requests which causes the CPU on the k8s api servers to increase to almost 100% when the load is normally < 10%.

Would it be better to create a new issue rather than comment on a merged pull request?

@walktall
Copy link
Contributor

walktall commented Mar 7, 2018

Maybe I can build a image with latest beat master but using kubernetes/client-go instead of the library currently in used? I am pretty sure this is compatibility issue in k8s library.

@exekias
Copy link
Contributor Author

exekias commented Mar 7, 2018

I've created #6503, thanks @walktall, I agree client-go would fix this particular issue but I'll prefer to give k8s another try with a different fix, working on it

@vjsamuel
Copy link
Contributor

vjsamuel commented Mar 7, 2018

@wflyer @idahoakl @simioa is there a way to predictably reproduce this? I have been waiting for such an issue to come up on minikube but I'm not seeing it.

@willemdh
Copy link

willemdh commented Mar 7, 2018

@vjsamuel I've not been able to reproduce it. Generally after restarting Filebeat everything works as expected. It takes 3-5 days before this issue appears.

@vjsamuel
Copy link
Contributor

vjsamuel commented Mar 7, 2018

never mind @willemdh. I was able to reproduce. I have a PR with the fix.

@exekias
Copy link
Contributor Author

exekias commented Mar 12, 2018

Just an update on this, a fix for #6503 has been merged and should go out in both 6.3.0 and 6.2.3

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
In case of unmarshalling errors the Watcher was ending up in a infinite loop,
this code ignores those errors and only restarts the watcher in case of
EOF.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants