Transfer leadership when establishLeadership fails #5247
I think this is OK because we'll wait for 5 seconds in the next wait loop, then time out and `goto RECONCILE`. Right after that label we will hit `interval := time.After(s.config.ReconcileInterval)` again, which presumably will set the interval back again.
I must admit I'm a little confused about how `:=` works after a `goto`: is it reassigning the same variable name in the same scope on subsequent jumps? I wonder if this is some strange variant of variable shadowing even though they are in the same scope. Maybe Go just has a special case to allow this when using `goto` but not in serial code? If it works I guess it's fine 😄
I am not sure what you mean. `WAIT` is well after `RECONCILE` and the `interval` variable declaration, so the code should just be using the same variable.
I reproduced your question in a playground: https://play.golang.org/p/6AqssHXg3Wt. If you jump to a point before a variable declaration, Go will create a new variable. Or did you mean something else?
Yeah I did a similar repro and it's fine, just was a strange one.
The path here is:
1. `goto WAIT`, which enters a select on a few things including that chan.
2. `goto RECONCILE`, which immediately re-assigns a timer chan for the original ReconcileInterval (using `:=`).
My original concern was that we might end up regaining leadership and then doing a reconcile every 5 seconds after that, but that's not the case due to the path mentioned above.
It also occurs to me that we have always had a re-assignment after `goto RECONCILE`, so it's not really any different than before. It's just that it was the only assignment before, and I wondered if some strange form of shadowing might cause issues. That appears not to be the case, so I think this is fine!
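To make the path above concrete, here is a minimal sketch of the control flow, not the actual code in this PR. It records the wait chosen on each pass instead of really selecting on a timer chan, and `waitsPerPass`, `normalInterval`, `retryInterval`, and `errEstablish` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	normalInterval = 60 * time.Second // stands in for s.config.ReconcileInterval (assumed value)
	retryInterval  = 5 * time.Second  // short delay used after a failure
)

// errEstablish is a hypothetical error used to simulate a failed attempt.
var errEstablish = errors.New("establishLeadership failed")

// waitsPerPass runs at least one pass and returns the wait chosen on each.
// The declaration right after RECONCILE resets the wait on every pass, so a
// short retry interval only applies to the pass in which establish failed.
func waitsPerPass(establish func(pass int) error, passes int) []time.Duration {
	var waits []time.Duration
	pass := 0
RECONCILE:
	wait := normalInterval // mirrors `interval := time.After(...)` after the label
	if err := establish(pass); err != nil {
		wait = retryInterval // shorten only this pass
		goto WAIT
	}
	// On success, the reconcile work would run here.
WAIT:
	waits = append(waits, wait)
	pass++
	if pass < passes {
		goto RECONCILE
	}
	return waits
}

func main() {
	waits := waitsPerPass(func(pass int) error {
		if pass == 0 {
			return errEstablish // fail only on the first pass
		}
		return nil
	}, 3)
	fmt.Println(waits)
}
```

With a failure on the first pass only, the first wait is the short interval and the later waits go back to the normal one, which is the behavior described above.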
For fun - Go does create a new variable with the same name.
You can see that here: https://play.golang.org/p/FU0ZxictDXE. Capturing the variable in a lambda and then looping with `goto` leaves the lambda holding the original variable, not the one redefined after the jump.
It's just weird to me because it's in the same scope. Shadowing across scopes seems fine, but this seems to be a special case you can't normally get outside of `goto`.
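For anyone reading later without the playground link, here is a small self-contained sketch of the same effect. It is not a copy of the playground code, and `captureAcrossGoto` is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// captureAcrossGoto loops twice using goto. Each pass re-executes the
// `v := ...` declaration and stores a closure over that pass's v.
func captureAcrossGoto() (first, second int) {
	var getters []func() int
	i := 0
LOOP:
	v := i * 10 // each execution of this declaration creates a new v
	getters = append(getters, func() int { return v })
	i++
	if i < 2 {
		goto LOOP // backward jump to before the declaration of v
	}
	// The first closure still sees the v from the first pass.
	return getters[0](), getters[1]()
}

func main() {
	first, second := captureAcrossGoto()
	fmt.Println(first, second) // prints "0 10"
}
```

The first closure keeps returning 0 after the second pass declares its own `v`, so the two passes really do get distinct variables, much like a declaration inside a `for` loop body.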
wow, now I see what you mean.
So now we attempt to transfer 3 times, but if it fails we still hang out in non-leader limbo land for a bit before retrying?
I guess this is what I mentioned as "retry indefinitely", and it should really work immediately if the rest of the cluster is in an OK state, so I think this is good.
My thinking was that since leadershipTransfer failed, we try establishLeadership again. I think in general establishLeadership is more likely to succeed than transferLeadership. I think making the interval smaller, like 5 seconds, before it retries is a better solution than trying to transfer leadership indefinitely.
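Roughly, the decision described here could be sketched as follows. This is an illustration under assumptions, not the code in this PR: `transferWithRetries`, `afterEstablishFailure`, `errUnavailable`, and `retryDelay` are hypothetical names, and the real transfer call is replaced by an injected function.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryDelay is the shortened interval before retrying (assumed: 5 seconds).
const retryDelay = 5 * time.Second

// errUnavailable is a hypothetical error used to simulate a failed transfer.
var errUnavailable = errors.New("no peer available for transfer")

// transferWithRetries calls transfer up to maxAttempts times and returns
// nil on the first success.
func transferWithRetries(transfer func() error, maxAttempts int) error {
	lastErr := errors.New("no transfer attempts were made")
	for i := 0; i < maxAttempts; i++ {
		if lastErr = transfer(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("leadership transfer failed: %w", lastErr)
}

// afterEstablishFailure decides what to do once establishing leadership has
// failed: step down if a transfer succeeds within 3 attempts, otherwise
// stay and retry establishing leadership after retryIn.
func afterEstablishFailure(transfer func() error, retryIn time.Duration) (steppedDown bool, retryAfter time.Duration) {
	if err := transferWithRetries(transfer, 3); err != nil {
		return false, retryIn
	}
	return true, 0
}

func main() {
	alwaysFail := func() error { return errUnavailable }
	steppedDown, retryAfter := afterEstablishFailure(alwaysFail, retryDelay)
	fmt.Println(steppedDown, retryAfter)
}
```

The point of the bounded attempt count is that a node which cannot hand off leadership falls back to retrying establishLeadership after a short delay rather than looping on the transfer forever.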
Raft logging changed and we need to provide the hclogger now.