Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move 429 results from Error to Debug #46

Closed
wants to merge 1 commit into from
Closed

Conversation

bboreham
Copy link
Collaborator

429 indicates an over-limit condition which will have been logged by the component that detected it, so we don't need to log it again on the calling side.

Example:

{"log":"time=\"2017-07-15T21:26:35Z\" level=warning msg=\"gRPC /cortex.Ingester/Push (rpc error: code = Code(429) desc = per-metric series limit exceeded) 2.412325ms\" \n",[...]{"name":"ingester"}
{"log":"WARN: 2017/07/17 12:22:03.256129 POST /api/prom/push (429) 11.665082ms\n",[...],"container_name":"authfe"}

429 indicates an over-limit condition which will have been logged by the component that detected it.
@rade
Copy link
Member

rade commented Jul 18, 2017

I worry this will make problem investigation harder. Right now, when our monitoring shows a certain error rate for a component - say nHz, the logs for that component will have a corresponding number of error entries (n per second).

So in the example, the error currently shows up in the metrics of the ingester, distributor and authfe. And there will be corresponding error messages in the logs of all three. Whereas with the change, the metrics will be the same, but the error log messages will only show up in the ingester.

We usually investigate errors "front to back": we'd typically first look at the error rate in authfe, look at the logs to figure out what it is, and what, if any, component it originates from, then look at the error rate of that component, then look at its logs, etc, until we get to the bottom.

Only logging errors at the bottom component will break this method of investigation.

@@ -32,7 +32,7 @@ func (l Log) Wrap(next http.Handler) http.Handler {
}
i := &interceptor{ResponseWriter: w, statusCode: http.StatusOK}
next.ServeHTTP(i, r)
if 100 <= i.statusCode && i.statusCode < 400 {
if 100 <= i.statusCode && (i.statusCode < 400 || i.statusCode < 429) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@jml
Copy link
Contributor

jml commented Jul 18, 2017

@jml
Copy link
Contributor

jml commented Jul 18, 2017

@rade A valid concern. However, note that you can still employ a front-to-back approach by looking at the metrics, chasing down which service the 429s come from.

@bboreham
Copy link
Collaborator Author

This PR was obviated by #59 which moved all 400-level errors to debug.

@bboreham bboreham closed this Dec 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants