Add exponential back-off to retryStrategy #1782
// See if we have waited past the deadline
if time.Now().Before(waitingDeadline) {
	retryMessage := fmt.Sprintf("Retrying in %s", humanize.Duration(time.Until(waitingDeadline)))
	return woc.markNodePhase(node.Name, node.Phase, retryMessage), false, nil
A bit of an interesting trade-off here is that, since we are changing the parentNode message, the workflow will be immediately re-queued. This makes re-queuing the Workflow with a duration (using woc.requeue(time.Until(waitingDeadline))) moot, and introduces some "polling" behavior in which the workflow is operated on once a second, mostly to update the "Retrying in ..." message.
If we decide that this polling behavior is not acceptable, then we could make the "Retrying in ..." message static (e.g. "Retrying at [TIME]") and re-queue with a duration as explained above.
limit: 10
backoff:
  duration: 1 # Default unit is seconds. Could also be a Duration, e.g.: "2m", "6h", "1d"
  factor: 2
Sorry for the bikeshedding, but should this not be called "base" instead, given that it is, in fact, the base of the exponential function?
Also, good work :-)
Duration is actually what K8s uses :) I think because technically factor is optional; if left at 0 then the backoff would simply apply the same duration wait.
Can you resolve the conflicts?
Closes: #700. Supports duration, factor, maxDuration.
Example:
When waiting to retry (see parent node message):
When max duration limit exceeded: