Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Middleware Error Logging: Vision on StatusBadGateway/ServiceUnavailable #192

Open
joe-elliott opened this issue Jul 13, 2020 · 2 comments

Comments

@joe-elliott
Copy link
Contributor

joe-elliott commented Jul 13, 2020

The http logging middleware splits out different request results and logs them as either debug or warn. Generally errors are logged as warn and successes are logged as debug.

if 100 <= statusCode && statusCode < 500 || statusCode == http.StatusBadGateway || statusCode == http.StatusServiceUnavailable {
l.logWithRequest(r).Debugf("%s %s (%d) %s", r.Method, uri, statusCode, time.Since(begin))
if l.LogRequestHeaders && headers != nil {
l.logWithRequest(r).Debugf("ws: %v; %s", IsWSHandshakeRequest(r), string(headers))
}
} else {
l.logWithRequest(r).Warnf("%s %s (%d) %s Response: %q ws: %v; %s",
r.Method, uri, statusCode, time.Since(begin), buf.Bytes(), IsWSHandshakeRequest(r), headers)
}

We need to log the below error conditions that are currently being logged as debug. Unfortunately, due to volume, we can't turn on debug logging.

statusCode == http.StatusBadGateway || statusCode == http.StatusServiceUnavailable

My guess is that these two statuses are logged at a debug level due to volume if the backend is unavailable. We would like to log these failures at a higher level than debug, but also recognize that the volume would be too great to log if a backend is down.

The change we'd like to make:

  • Move http.StatusBadGateway and http.StatusServiceUnavailable to be logged at a Warn level with the other errors
  • Use a configurable rate limited logger to log errors instead of logging 100% of all errors at Warn

Thoughts?

If this (or something similar) is acceptable I'd be glad to PR it.

@bboreham

@bboreham
Copy link
Collaborator

Background is here: cortexproject/cortex#810, http://github.com/weaveworks/common/pull/84

I would be ok with sampling the messages (we have -event.sample-rate in cortex already); I guess a rate limit is also fine

You could also sample after the line hits the logfile?

@joe-elliott
Copy link
Contributor Author

Thanks for the background. As suspected those two error codes were just overwhelming the logs and so they got removed. It sounds like you're ok with the general idea so I will submit a PR and we can discuss details there.

You could also sample after the line hits the logfile?
Unsure what you mean by this. Like log everything but only push a certain subset of the logfile to the backend?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants