Cherry-pick #6504 to 6.2: Fix infinite failure on Kubernetes watch (#6530)

Cherry-pick of PR #6504 to 6.2 branch. Original message: 

This PR fixes #6503

How to reproduce: Run filebeat pointing to minikube. 

```
minikube ssh
sudo su

ps aux | grep localkube
kill -9 process_id
```

This will force a failure on the API server. When the API server comes back up, it will not be able to serve the last resource version that we had requested, and the watch fails with:
```
type:"ERROR" object:<raw:"k8s\000\n\014\n\002v1\022\006Status\022C\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003\032\000\"\000" >  typeMeta:<apiVersion:"v1" kind:"Status" > raw:"\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003" contentEncoding:"" contentType:""  <nil>
```

In such scenarios the only mitigation is to move the resource version to the latest. `client-go` would handle scenarios like this. The code fails because we pass a `Pod` resource to `watcher.Next()`; in this scenario the resource being parsed is an `Error` resource, so the protobuf unmarshalling fails. This is a limitation of the client we use, which requires the resource type to be passed explicitly.

This fix is not ideal, as it might miss a few state changes.
exekias authored and ruflin committed Mar 12, 2018
1 parent e950869 commit 38dc5c1
Showing 2 changed files with 7 additions and 9 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -33,6 +33,7 @@ https://github.com/elastic/beats/compare/v6.2.2...6.2[Check the HEAD diff]
 *Affecting all Beats*

 - Avoid panic errors when processing nil Pod events in add_kubernetes_metadata. {issue}6372[6372]
+- Fix infinite failure on Kubernetes watch {pull}6504[6504]

 *Auditbeat*

15 changes: 6 additions & 9 deletions libbeat/common/kubernetes/watcher.go
@@ -152,17 +152,14 @@ func (p *podWatcher) watch() {
             _, apiPod, err := watcher.Next()
             if err != nil {
                 logp.Err("kubernetes: Watching API error %v", err)
+                watcher.Close()

-                // In case of EOF, stop watching and restart the process
-                if err == io.EOF || err == io.ErrUnexpectedEOF {
-                    watcher.Close()
-                    backoff(failures)
-                    failures++
-                    break
+                if !(err == io.EOF || err == io.ErrUnexpectedEOF) {
+                    // This is an error event which can be recovered by moving to the latest resource version
+                    logp.Info("kubernetes: Ignoring event, moving to most recent resource version")
+                    p.lastResourceVersion = ""
                 }
-
-                // Otherwise, this is probably an unknown event (unmarshal error), ignore it
-                continue
+                break
             }

             // Update last resource version and reset failure counter
