
Suppress errors while RQ is retrying #1074

Closed

BobReid opened this issue Apr 8, 2021 · 7 comments · Fixed by #1076
Comments

@BobReid
Contributor

BobReid commented Apr 8, 2021

I am using RQ as my job queue. In some of my use cases, retries and optimistic locking are used to smooth out contentious writes to the database and out-of-order execution.

This results in a lot of false positives in Sentry. These issues fix themselves after a retry, so I do not want them reported to Sentry until all retries are exhausted. Currently there is no way to suppress the exception, due to the way the Sentry integration monkey-patches RQ: you need to inspect the RQ job instance's status to determine whether or not retries have been exhausted.
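For context, a minimal sketch of that pattern using RQ's built-in `Retry` (the job function and its arguments here are hypothetical):

```python
from redis import Redis
from rq import Queue, Retry

def write_record(record_id):
    # Hypothetical job: raises on an optimistic-lock conflict and
    # succeeds on a later attempt once the contention clears.
    ...

queue = Queue(connection=Redis())

# RQ re-runs the job up to three times, 10 seconds apart, before
# marking it as failed; only that final failure should reach Sentry.
queue.enqueue(write_record, 42, retry=Retry(max=3, interval=10))
```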

I am happy to contribute the change if it has a chance of being accepted.

I see three possible solutions:

  1. Add a setting to the RQ integration along the lines of failed_jobs_only. This setting would default to false to maintain backwards compatibility.
  2. Provide a hook mechanism that passes the job in and lets users decide whether or not to capture the event.
  3. Stuff some job information, such as the status, into the hint so it can be inspected and filtered via the standard before_send hook.

IMO 1 or 3 is the way to go: 1 is the simplest, and 3 is the most flexible. I am not sure whether stuffing arbitrary data into the hint like this is desirable from Sentry's standpoint.
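To make option 3 concrete, a sketch of the user-side filter it would enable, assuming the integration added a hypothetical `rq_job` entry to the hint (no such key exists today):

```python
import sentry_sdk

def before_send(event, hint):
    # "rq_job" is a hypothetical hint key for this proposal; the
    # current integration does not provide it.
    job = hint.get("rq_job")
    if job is not None and not job.is_failed:
        return None  # retries remain, so drop the event
    return event

sentry_sdk.init(dsn="...", before_send=before_send)
```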

@untitaker
Member

The Celery integration does not report these kinds of exceptions, so for the sake of consistency I wouldn't do it for RQ either (regardless of whether the behavior is controllable via an option).

@BobReid
Contributor Author

BobReid commented Apr 8, 2021

@untitaker

I have thrown together a quick PR. It can be as simple as checking whether the job has failed. Let me know your thoughts and I can write a test for it.

#1076
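For anyone skimming, a sketch of the kind of check described, built on RQ's `Job.is_failed` property (the helpers below are placeholders, not the PR's exact code):

```python
def _capture_exception(exc_info):
    ...  # placeholder for the SDK's internal capture helper

def old_handle_exception(worker, job, *exc_info, **kwargs):
    ...  # placeholder for the original RQ exception handler being wrapped

def sentry_patched_handle_exception(worker, job, *exc_info, **kwargs):
    # job.is_failed only becomes True once RQ has exhausted all retries,
    # so exceptions raised by intermediate attempts are not captured.
    if job.is_failed:
        _capture_exception(exc_info)
    return old_handle_exception(worker, job, *exc_info, **kwargs)
```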

@BobReid
Contributor Author

BobReid commented Apr 13, 2021

I have added a test to my PR. I had to add an ignore_logger call to the RQ integration; otherwise the exception was still captured by the logging integration.
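For readers following along, `ignore_logger` is the SDK helper referred to above. Roughly, it is used like this; the logger name shown is an assumption, so verify it against your RQ version:

```python
from sentry_sdk.integrations.logging import ignore_logger

# Keep the logging integration from also capturing the exception that
# RQ's worker logs on every failed attempt. "rq.worker" is assumed to
# be the logger RQ uses for this.
ignore_logger("rq.worker")
```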

@BobReid
Contributor Author

BobReid commented May 3, 2021

@untitaker any thoughts on the resolution I have proposed?

This is causing unnecessary disruptions for my team. I would prefer to fix this upstream in Sentry, but if not, we will have to look at a way to patch it in our own system.

@untitaker
Member

@BobReid CI is not green, so I have restarted it.

I think the fix is fine (and I really appreciate it; I will make sure to get it merged this week, sorry for the delay!), but in general I would always find a way to unblock yourself rather than depend on getting OSS contributions merged. If you need to escalate an issue, I would go through Sentry support, because support tickets actually have SLOs and resources attached.

@BobReid
Contributor Author

BobReid commented May 3, 2021

@untitaker I understand things won't always get fixed in a timely manner. In this case, it meant the team would not be able to rely on retries for smoothing out-of-order execution, which is a tool I want to be able to employ when necessary.

There didn't seem to be anything we could hook into in order to suppress the event. The problem is that the standard before_send hook did not have any job context to inspect to determine whether the event should be suppressed.

The only options left to explore were abandoning retries completely, or monkey-patching Sentry to apply my fix from our own code base. Both were less than ideal.

I appreciate you taking another look at this. Thanks again.

@vaal-

vaal- commented Dec 21, 2021

Hello,

I have the opposite situation: for me it would be more convenient if all retries were logged. So I wanted to ask: if I make a PR adding an option that enables this logging, would such a change be accepted? Or is it a matter of principle that these errors should not be logged?
