Return errors from Docker client.Events #2689
Conversation
@BlakeMesdag Thanks for the contribution 👍 Could you add the missing
Done, forgot to push that up after I saw the failure.
if event.Action == "start" ||
	event.Action == "die" ||
	strings.HasPrefix(event.Action, "health_status") {
	startStopHandle(event)
}
(Disclaimer: I'm not familiar with the Docker provider.)
Is there a chance that the event channel gets closed (as opposed to receiving a terminating event)? If it does, then the former implementation naturally exited the loop while the new select approach doesn't do so anymore.
Just trying to make sure that we keep terminating when / if needed.
There is no chance of the event channel being closed. The Events method on the Docker client returns receive-only channels, and it only ever closes the errors channel.
The errors channel is always pushed into before each of the return paths in the Events method, and always with a single error. The only other way that method could exit is via a panic, which would propagate up to the Traefik provider operation, where it would be recovered and retried until success or until 15 minutes had elapsed (the default exponential backoff limit).
I linked it in the PR, but here's the Events method for reference: https://github.com/moby/moby/blob/master/client/events.go#L20-L71
Thanks for elaborating. 👍
Could this PR be the first to contribute some tests to the Docker provider?
@ldez rebase to 1.5? Seems like an important fix.
Not that I wouldn't like to add tests, but there's a lot of overhead before I could get to testing this new functionality. A few tests already exist for other portions of the provider, but many more would need to be added for this specific file before reaching this functionality.
LGTM. We can always add tests in subsequent PRs.
Thanks a lot for your contribution!
Force-pushed c79c893 to caf46a3
LGTM
LGTM
Force-pushed caf46a3 to fc10c96 (commit message: "…e event stream was closed")
What does this PR do?
Logs errors from the Docker client.Events channel and prevents hangs when the event stream will no longer be pushed into.
Motivation
Backends getting out of sync during intermittent connection issues to the Docker daemon, and not being able to force them to refresh.
Additional Notes
All errors in the Docker client.Events method cause events to stop being processed, leaving the messages channel open (https://github.com/moby/moby/blob/master/client/events.go#L20-L71). Since events are used as a notification mechanism, it's safe to reconnect if we lose one or more events.
This area of the codebase is untested. You can test/verify behaviours with a PoC I wrote here: