Reconsider the introduction of a default maximum_retry_count #251
Comments
@hlascelles do you think that might be kind of surprising for developers? To me the new default makes sense: if a job has failed 15 times in a row over two days, it seems reasonable to assume that's because of some systemic failure, and that it's unlikely to succeed without something external changing. I think it'd be prudent for que to protect itself (and the system) by not retrying indefinitely, because if there is ever a systematically failing job and enough of them enqueued, they will pile up and consume all worker resources. I think defaulting to this "self-healing" (and, I guess, less correctness-oriented) approach makes sense so that developers don't accidentally shoot themselves in the foot with the above problem. Sidekiq and delayed_job both have a fixed number of retries with exponential backoff as well.
I agree that "retrying forever" is kind of a dangerous default. I can't imagine having a job retry for more than 2 days and suddenly succeed again. In general I'm not a fan of (blindly) retrying jobs. In the applications I've built, I let certain (really few) jobs automatically retry on certain errors, e.g. if there was a network error when trying to connect to a 3rd party API. But in general I want my job to fail if there's an exception. 😄 @chanks What are your thoughts on this?
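For illustration, the "retry only on certain errors" approach described above could look roughly like the following plain-Ruby sketch. It is not Que's API: `RETRYABLE_ERRORS` and `with_selective_retry` are hypothetical names, and the choice of which exceptions count as network errors is an assumption.

```ruby
require "timeout"

# Hypothetical list of errors treated as transient network failures.
RETRYABLE_ERRORS = [Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET].freeze

# Runs the block, retrying only when it raises one of RETRYABLE_ERRORS,
# up to max_attempts attempts in total. Any other exception propagates
# immediately, so the job fails on the first unexpected error.
def with_selective_retry(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *RETRYABLE_ERRORS
    retry if attempts < max_attempts
    raise
  end
end

# Usage: the first two attempts hit a simulated network error, the third succeeds.
result = with_selective_retry(max_attempts: 3) do |attempt|
  raise Errno::ECONNRESET if attempt < 3
  "ok on attempt #{attempt}"
end
puts result
```

A real job would probably also wait between attempts; that is left out here to keep the sketch short.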
Yeah, I'm not a big fan of retrying forever, particularly as a default. I
do think it makes sense to support that behavior as a configuration option,
though I don't remember offhand if 1.0 already supports that.
Yes, good points all. I was surprised to learn that the default for the delayed_job gem is complete job deletion, which is not something we could tolerate. EDIT: Sidekiq does the same, albeit after some months. I'm glad que doesn't do that. OK, I'll close this knowing that the 1.x branch will by default keep failed (expired) jobs in a dead letter queue, which will aid morning manual retries 👍.
Que 1.x introduces maximum_retry_count, after which the job stops.
From
https://github.com/chanks/que/blob/master/docs/error_handling.md
There is a maximum_retry_count option for jobs. It defaults to 15 retries, which with the default retry interval means that a job will stop retrying after a little more than two days.
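The "little more than two days" figure can be checked with a quick calculation. The sketch below assumes the default retry interval is `count ** 4 + 3` seconds after the count-th failure, which is my reading of the linked error handling docs; the method names are illustrative and not part of Que.

```ruby
# Assumed default backoff: count ** 4 + 3 seconds after the count-th failure.
def default_retry_interval(count)
  count**4 + 3
end

# Total seconds spent waiting across the first max_retries retries.
def total_retry_wait(max_retries)
  (1..max_retries).sum { |count| default_retry_interval(count) }
end

seconds = total_retry_wait(15)
days = seconds / 86_400.0
puts "#{seconds} seconds, about #{days.round(2)} days"
# prints "178357 seconds, about 2.06 days"
```

Under that assumption, most of the total comes from the last few retries: the 15th interval alone is 50,628 seconds, or roughly 14 hours.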
This is a great addition, but the fact that it has a default that stops the job is concerning to me. Major versions can of course include breaking changes, but we only noticed it by chance.
One of the best things about Que is its resilience and the fact that jobs aren't lost (which we experienced constantly with Resque). I expect the change is related to the presence of the history table, but I'd say that is a bonus, not the main job flow.
Can I request the default be changed to "retry forever" in Que 1.x?