
Sentry shouldn't group unrelated errors together #2288

Open
jarifibrahim opened this issue Sep 19, 2018 · 5 comments
Labels
enhancement, 🏠 architecture, idea, kind/enhancement, platform

Comments

@jarifibrahim
Member

According to the https://docs.sentry.io/learn/rollups/#grouping-by-stacktrace page, all errors sent to Sentry are grouped by stack trace (if present). The doc says:

If a stacktrace or exception is involved in a report, then grouping will only consider this information.

Most of the errors reported from the backend contain only this minimal stack trace:

  File "/tmp/go/src/github.com/fabric8-services/fabric8-wit/sentry/sentry.go", line 101, in func1
  File "/tmp/go/src/github.com/fabric8-services/fabric8-wit/sentry/sentry.go", line 75, in loop

due to

c.c.CaptureError(err, additionalContext)
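
For context, the capture path presumably looks roughly like the sketch below; raven-go builds the stack trace at the point of the CaptureError call, so every event carries the same two frames from the background loop. The names here are illustrative, not the actual sentry.go code.

	// A rough sketch of the pattern described above, assuming a background
	// loop that drains a channel; the names are illustrative and not the
	// actual fabric8-wit sentry.go implementation.
	package sentry

	import raven "github.com/getsentry/raven-go"

	type reportRequest struct {
		err  error
		tags map[string]string
	}

	type client struct {
		c     *raven.Client
		queue chan reportRequest
	}

	// loop drains the queue in its own goroutine; raven-go builds the stack
	// trace at the point of the CaptureError call, so every event ends up
	// with the same two frames (this closure and loop) no matter where the
	// error originally occurred.
	func (c *client) loop() {
		for req := range c.queue {
			func() {
				c.c.CaptureError(req.err, req.tags)
			}()
		}
	}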

Since all events have a similar stack trace, they are all grouped together into one issue, which makes it very difficult to search for errors. Look at https://errortracking.prod-preview.openshift.io/openshift_io/fabric8-wit/issues/2083/events/.
It contains errors like:

  • errors.InternalError: could not delete space
  • errors.InternalError: space UUID 51b37ab5-81cd-424d-90d7-3327860f0d64 is not valid space name
  • errors.InternalError: namespace with id 'user' not found

This can be fixed by one of the following:

  1. Improving the sentry client to use an Interface so that the events can be distinguished - This one seems to be the most feasible option and it would require minimal effort.

  2. Refactoring all errors so that instead of just returning the errors from the controller we return the error along with the stack trace using errs.WithStack(...) - I don't think we want this because it doesn't make sense to return the entire stack trace to the user.

  3. Moving the sentry logging out of jsonapi.JSONErrorResponse() into each controller so that every error can be logged/handled separately - This might solve the problem, but it would mean we have to add sentryClient.CaptureError(...) to every error-handling block. This doesn't seem feasible to me.

  4. Removing the stacktrace from the error if it didn't have one originally - The sentry client somehow captures these two calls in the stacktrace:

      File "/tmp/go/src/github.com/fabric8-services/fabric8-wit/sentry/sentry.go", line 101, in func1
      File "/tmp/go/src/github.com/fabric8-services/fabric8-wit/sentry/sentry.go", line 75, in loop
    

    If we could get rid of these, then all the errors would be grouped by the error message, which would make much more sense.

I think option 1 would be the best approach because it would allow us to send custom data (like the controller name) to Sentry. A rough sketch of what that could look like follows.
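
This is only a sketch: it assumes raven-go's NewPacket, NewException and NewStacktrace helpers plus the Fingerprint field on Packet, and captureDistinct and the "controller" tag are hypothetical names, not existing code. Setting an explicit fingerprint would make Sentry group by controller and error message rather than by the shared stack trace of the capture loop.

	// A minimal sketch of option 1, assuming raven-go's NewPacket,
	// NewException and NewStacktrace helpers plus the Fingerprint field on
	// Packet; captureDistinct and the "controller" tag are hypothetical
	// names, not existing fabric8-wit code.
	package sentry

	import raven "github.com/getsentry/raven-go"

	// captureDistinct reports err with an explicit fingerprint so that Sentry
	// groups events by controller and error message instead of by the shared
	// stack trace of the capture loop.
	func captureDistinct(client *raven.Client, err error, controller string) string {
		st := raven.NewStacktrace(1, 3, nil) // skip this frame, 3 lines of context
		packet := raven.NewPacket(err.Error(), raven.NewException(err, st))
		packet.Fingerprint = []string{controller, err.Error()}
		eventID, _ := client.Capture(packet, map[string]string{"controller": controller})
		return eventID
	}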

What do you think @kwk @xcoulon @aslakknutsen ?

@jarifibrahim added the enhancement, 🏠 architecture, platform, kind/enhancement and idea labels on Sep 19, 2018
@kwk
Collaborator

kwk commented Sep 20, 2018

@jarifibrahim thank you for putting this list together. It is really good that you've looked into the issue. Can you elaborate on how option 1 would solve the problem please?

As a side note: I did hear from @aslakknutsen that some UI folks had the exact opposite problem that we have. They had too many sentry issues for the same problem in code. @aslakknutsen handed me this reference to what they did in order to address the problem: https://github.com/fabric8-ui/fabric8-ui/blob/fc37a478011d5190a42656ef0b01059e70336924/src/app/shared/exception.handler.ts#L47-L75

@jarifibrahim
Member Author

@jarifibrahim thank you for putting this list together. It is really good that you've looked into the issue. Can you elaborate on how option 1 would solve the problem please?

@kwk I am still investigating how we could do this. https://github.com/getsentry/raven-go doesn't have much documentation about interfaces or how to use them correctly, and it looks like it might take some time for the documentation to be updated (see getsentry/raven-go#205).

Meanwhile, I looked into why those errors on Sentry do not have a proper stack trace, and it turns out that errors from the standard errors package do not carry a stack trace by default. If we use errors.New from https://github.com/pkg/errors instead of errors.New from the standard errors package (https://golang.org/pkg/errors/), we will have a stack trace on our errors.
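
A tiny illustration of the difference (package main and the variable names are just for the example):

	// Illustrative only: contrasts the standard library errors package with
	// github.com/pkg/errors when an error is printed with %+v.
	package main

	import (
		stderrors "errors"
		"fmt"

		errs "github.com/pkg/errors"
	)

	func main() {
		plain := stderrors.New("boom") // no stack trace attached
		withStack := errs.New("boom")  // records the caller's stack trace

		fmt.Printf("%+v\n", plain)     // prints just "boom"
		fmt.Printf("%+v\n", withStack) // prints "boom" followed by the stack trace
	}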

@kwk
Collaborator

kwk commented Sep 24, 2018

Meanwhile, I looked into why those errors on Sentry do not have a proper stack trace, and it turns out that errors from the standard errors package do not carry a stack trace by default. If we use errors.New from https://github.com/pkg/errors instead of errors.New from the standard errors package (https://golang.org/pkg/errors/), we will have a stack trace on our errors.

@jarifibrahim that is well understood but the question is where we are not wrapping standard errors with errs.Wrap or where we're not using errs.Errorf/errs.New in our own code.
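
For example, wrapping at the boundary where a standard-library error enters our code is enough to attach a stack trace; the function below is only an illustration, not existing code:

	// Illustrative only: wrapping a standard-library error with errs.Wrap
	// attaches the stack trace of the wrapping call site.
	package example

	import (
		"encoding/json"

		errs "github.com/pkg/errors"
	)

	func parse(data []byte) (map[string]interface{}, error) {
		var out map[string]interface{}
		if err := json.Unmarshal(data, &out); err != nil {
			// err itself carries no stack trace; Wrap records one here.
			return nil, errs.Wrap(err, "failed to parse payload")
		}
		return out, nil
	}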

@jarifibrahim
Member Author

@jarifibrahim that is well understood but the question is where we are not wrapping standard errors with errs.Wrap or where we're not using errs.Errorf/errs.New in our own code.

@kwk The problem starts here. This uses the standard golang error, which has only a string inside it. No stack information.

// InternalError means that the operation failed for some internal, unexpected reason
type InternalError struct {
	Err error
}

@kwk
Collaborator

kwk commented Sep 24, 2018

@kwk The problem starts here. This uses the standard golang error, which has only a string inside it. No stack information.

@jarifibrahim that is not correct. The standard golang error is just an interface with no information attached. Therefore the Err in the above example can be an error with a stack trace. The issue, though, is that InternalError will always terminate github.com/pkg/errors' recursive search for the cause of the error, i.e. the first error in the chain of wrapped errors. We did this so we could use github.com/pkg/errors.Cause() to find out if the cause of an error was an internal error.

The implementation of github.com/pkg/errors.Cause() is very simple, and maybe we can write our own helper that returns true if there is an internal error anywhere in the chain.

If we implement the internal github.com/pkg/errors.causer interface on our own error types, then we have a stack trace for our own types as well: https://github.com/pkg/errors/blob/c059e472caf75dbe73903f6521a20abac245b17f/errors.go#L257

	type causer interface {
		Cause() error
	}
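
A sketch of what that could look like, assuming the InternalError type from above; Cause and HasInternalError are illustrative names, not existing code:

	// Sketch only: gives InternalError a Cause() method so that pkg/errors
	// can keep unwrapping past it, plus a helper that inspects the whole
	// cause chain instead of only the final cause.
	package errors

	// InternalError means that the operation failed for some internal, unexpected reason
	type InternalError struct {
		Err error
	}

	func (e InternalError) Error() string { return e.Err.Error() }

	// Cause implements the causer interface shown above, so errs.Cause()
	// no longer stops at InternalError.
	func (e InternalError) Cause() error { return e.Err }

	// HasInternalError walks the cause chain and reports whether any link
	// in it is an InternalError.
	func HasInternalError(err error) bool {
		for err != nil {
			if _, ok := err.(InternalError); ok {
				return true
			}
			causer, ok := err.(interface{ Cause() error })
			if !ok {
				return false
			}
			err = causer.Cause()
		}
		return false
	}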
