Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return errors from Docker client.Events #2689

Merged
merged 2 commits into from
Jan 15, 2018

Conversation

BlakeMesdag
Copy link
Contributor

@BlakeMesdag BlakeMesdag commented Jan 10, 2018

What does this PR do?

Logs Errors from the Docker client.Events channel and prevents hangs when the event stream will no longer be pushed into.

Motivation

Backends getting out of sync during intermittent connection issues to the Docker daemon, and not being able to force them to refresh.

More

  • Added/updated tests

Additional Notes

All errors in the Docker client.Events method cause events to stop being processed, leaving the messages channel open (https://github.com/moby/moby/blob/master/client/events.go#L20-L71). Since events are used as a notification mechanism it's save to reconnect if we lose one or more events.

This area of the codebase is untested. You can test/verify behaviours with a PoC I wrote here:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	var err error

	cli, err := client.NewEnvClient()
	if err != nil {
		panic(err)
	}

	f := filters.NewArgs()
	f.Add("type", "container")

	options := types.EventsOptions{
		Filters: f,
	}

	ctx := context.Background()
	// ctx, cancel := context.WithCancel(ctx)

	eventsc, errc := cli.Events(ctx, options)
	for {
		select {
		case event := <-eventsc:
			fmt.Println("Received event: ", event)
		case err := <-errc:
			if err == nil {
				continue
			}

			fmt.Println("Received error: ", err)
			return
		}
	}
}

@ldez
Copy link
Contributor

ldez commented Jan 10, 2018

@BlakeMesdag Thanks for the contribution 👍

Could you add the missing io import.

@BlakeMesdag
Copy link
Contributor Author

Done, forgot to push that up after I saw the failure.

@ldez ldez added the kind/bug/fix a bug fix label Jan 10, 2018
if event.Action == "start" ||
event.Action == "die" ||
strings.HasPrefix(event.Action, "health_status") {
startStopHandle(event)
Copy link
Contributor

@timoreimann timoreimann Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I'm not familiar with the Docker provider.)

Is there a chance that the event channel gets closed (as opposed to receiving a terminating event)? If it does, then the former implementation naturally exited the loop while the new select approach doesn't do so anymore.

Just trying to make sure that we keep terminating when / if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no chance of the event channel being closed. The channel that is returned by the Events method on the Docker client returns receive-only channels and only ever closes the errors channel.

The errors channel is always pushed into before each of the return paths in the Events method, and always a single error. The only other way that method could be escaped is with a panic which would propagate up to the Traefik provider operation where it would be recovered and retried until success, or until 15 minutes had elapsed (the default exponential backoff limit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I linked it in the PR, but heres the Events method for reference: https://github.com/moby/moby/blob/master/client/events.go#L20-L71

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for elaborating. 👍

@timoreimann
Copy link
Contributor

Could this PR be the first to contribute some tests to the Docker provider?
If it turns out to be rather complicated, I'm also okay to postpone that. Presumably, with no tests available the Docker provider isn't very testable, but I haven't looked at it closely.

@timoreimann
Copy link
Contributor

@ldez rebase to 1.5? seems like an important fix.

@ldez ldez added this to the 1.5 milestone Jan 12, 2018
@BlakeMesdag
Copy link
Contributor Author

Not that I wouldn't like to add tests, but there's a lot of overhead before I'd get to testing this new functionality.

There are a few there already for other portions of the provider, but there's a lot of tests to be added for this specific file before I'd be getting to this functionality.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can always add tests in subsequent PRs.

Thanks a lot for your contribution!

@ldez ldez force-pushed the fix-docker-events-reconnect branch from c79c893 to caf46a3 Compare January 15, 2018 10:48
@BlakeMesdag BlakeMesdag requested a review from a team as a code owner January 15, 2018 10:48
@ldez ldez changed the base branch from master to v1.5 January 15, 2018 10:48
@ldez ldez removed the bot/no-merge label Jan 15, 2018
Copy link
Member

@mmatur mmatur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@traefiker traefiker merged commit 56c0634 into traefik:v1.5 Jan 15, 2018
@BlakeMesdag BlakeMesdag deleted the fix-docker-events-reconnect branch January 15, 2018 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants