Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed weave interactions are logged as errors, even though it's normal operation #1599

Closed
jml opened this issue Jun 17, 2016 · 11 comments · Fixed by #1610
Closed

failed weave interactions are logged as errors, even though it's normal operation #1599

jml opened this issue Jun 17, 2016 · 11 comments · Fixed by #1610
Assignees
Milestone

Comments

@jml
Copy link
Contributor

jml commented Jun 17, 2016

e.g.

<probe> ERRO: 2016/06/17 12:56:35.574649 'weave ps' cmd failed: exit status 1

It's confusing to use "ERROR" for things that are part of normal operation. This should either be a warning, or just removed from the logs completely.

@rade rade changed the title Scope logs 'weave ps' command failure as error, even though it's normal operation Scope logs failed weave interactions as errors, even though it's normal operation Jun 17, 2016
@rade
Copy link
Member

rade commented Jun 17, 2016

ditto for

<app> ERRO: 2016/06/17 12:26:35.242523 Error updating weaveDNS, backing off 20s: exit status 1

This should either be a warning, or just removed from the logs completely.

Removing would be a bad idea because if you are running weave then these messages are noteworthy.

@jml
Copy link
Contributor Author

jml commented Jun 18, 2016

We talked a bit about this in person and agreed that:

  • scope shouldn't spam the logs if weave isn't there
  • UX is complicated by scope not having any knowledge of whether the user intends to use weave

@peterbourgon
Copy link
Contributor

Would be good to bump the priority on this one. Basically everyone trying to troubleshoot a Scope issue hits this and gets confused.

@tomwilkie
Copy link
Contributor

What changed to make this fail? It never used to fail. Would be good to understand what changed.

@tomwilkie tomwilkie added this to the 0.17.0 milestone Jun 20, 2016
@rade
Copy link
Member

rade commented Jun 20, 2016

It never used to fail.

Are you saying that scope never used to log an error when it cannot talk to weave / weaveDNS? That's not my recollection.

@rade
Copy link
Member

rade commented Jun 20, 2016

I cannot reproduce the weave ps error, with or without weave installed.

I do get the weaveDNS error, with or without weave installed.

I also see

<probe> ERRO: 2016/06/20 09:35:18.442144 Error collecting weave info, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused

with our without weave installed.

@tomwilkie
Copy link
Contributor

tomwilkie commented Jun 20, 2016

There are 4 integration points with weave:

  1. in the probe, we fetch the weave report and use it to build/tag the containers topology -https://github.com/weaveworks/scope/blob/master/probe/overlay/weave.go#L56
  2. in the probe, we run weave ps and tag containers with their weave ips (same place as 1.)
  3. in the probe, we talk to weave dns to resolve things - https://github.com/weaveworks/scope/blob/master/prog/probe.go#L195
  4. in the app, we talk to weave dns to insert a record for scope.weave.works - https://github.com/weaveworks/scope/blob/master/app/weave.go#L80

In 1, 2 & 4 we use a "backoff", which will log errors until it hits max backoff and then silence itself. In 3 we don't log errors. For 2, we are logging errors inside the weave ps 'client' - https://github.com/weaveworks/scope/blob/master/common/weave/client.go#L137. We should rejig the code so the error is propagated up to the backoff.

@tomwilkie
Copy link
Contributor

Are you saying that scope never used to log an error when it cannot talk to weave / weaveDNS? That's not my recollection.

No, I'm saying weave ps never used to fail irrespective of wether weave was running or not.

@rade
Copy link
Member

rade commented Jun 20, 2016

I'm saying weave ps never used to fail irrespective of wether weave was running or not.

It doesn't now either. Except for the mysterious weaveworks/weave#2388

I don't think adding a backoff to 3, and fixing that weave issue fully addresses the problem here. The errors in the logs are distracting even with a backoff.

@tomwilkie
Copy link
Contributor

Dupe of #1008?

@rade
Copy link
Member

rade commented Jun 20, 2016

it's a 1/3rd dupe :)

@rade rade changed the title Scope logs failed weave interactions as errors, even though it's normal operation failed weave interactions are logged as errors, even though it's normal operation Jun 23, 2016
@tomwilkie tomwilkie self-assigned this Jun 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants