*.Retry.BackoffFunc #1160
Conversation
I had usage of https://github.com/eapache/go-resiliency/tree/master/retrier listed on https://github.com/Shopify/sarama/wiki/Ideas-that-will-break-backwards-compatibility, but I suppose we could make something optional like this.
Yeah, the actual retry logic already exists, so it seems less risky to just give users a little flexibility w.r.t. how the retry itself is calculated. Maybe retrier would make more sense when you're keen to make more sweeping changes to the innards of AsyncProducer et al? FWIW, the specific case we're running into is that on some very low-volume topics, we run afoul of
Force-pushed from 15f1568 to 7d903c1.
Cleaned up the change a bit & added a test. I'll hook up similar changes for
Added. Holler out if anything else looks suspicious.
Just giving this a bump -- any thoughts?
This is something that I was researching today; let's bring it back and merge if things look okay.
Looks good overall, and I see how this is useful! As long as we keep it optional as mentioned by @eapache, this is good to merge!
This is just a proof-of-concept to convey the idea, I intend to flesh things out based on feedback if this seems like something that would be likely to land.
The `Retry.Backoff` config settings as they currently stand can force users to make some hard choices. Take `Config.Producer.Retry.Backoff` for example: using the default configuration, a 300ms+ hiccup in the cluster could lead to data loss without careful error handling in user code. Say you want to survive up to a 10s hiccup without data loss -- you're left with a few bad options:

- bump `Producer.Retry.Backoff` to 4-5s+ and live with the occasional large latency spike (this includes things like partition leadership changes); or
- bump `Producer.Retry.Max` to 100, keep `Producer.Retry.Backoff` in the low 100ms, and live with the risk of increased duplicates.

This change offers a better middle ground: compute the backoff using an (optional) function. This opens the door for more sophisticated strategies (e.g. exponential backoff) without breaking existing code.
Seems like if we do this for producers, we might as well do it for Metadata, Consumer, etc. too.
I see exponential backoff was discussed over in #782, but I don't think the suggested patch ever materialized, so I thought I'd have a shot.