modify RenewPeriodic to retry failed Renew until TTL elapses #912

Merged 1 commit on May 8, 2015


rojojo23 commented May 7, 2015

We've come across an interesting condition while operating our Consul cluster: during a leader election, all of our locks were being TTL'd out. Digging in a bit further, it appears that during a leader election the renew call fails and RenewPeriodic returns. Once the TTL elapses, the sessions are invalidated and the locks are lost.

I've modified RenewPeriodic to keep retrying after an error, up until the TTL passes. This should allow our session locks to survive a leader election, so long as the election takes less than TTL/2 to finish.

Here is an example set of logs that show what is happening:

 2015/05/02 19:22:52 [DEBUG] http: Request /v1/session/renew/db32a8ad-5e87-41b6-a217-f8f0e669948e (2.33141ms)
 2015/05/02 19:22:57 [DEBUG] http: Request /v1/session/renew/db32a8ad-5e87-41b6-a217-f8f0e669948e (1.907376ms)
  -> successful renews

 2015/05/02 19:23:02 [WARN] raft: Failed to contact quorum of nodes, stepping down
 2015/05/02 19:23:02 [WARN] raft: Failed to contact 10.XX.YY.ZZ:8300 in 562.087853ms
 2015/05/02 19:23:02 [INFO] consul: cluster leadership lost
   -> some communication failure occurred, calls for election

 2015/05/02 19:23:02 [ERR] http: Request /v1/session/renew/db32a8ad-5e87-41b6-a217-f8f0e669948e, error: rpc error: rpc error: No cluster leader
 2015/05/02 19:23:02 [DEBUG] http: Request /v1/session/renew/db32a8ad-5e87-41b6-a217-f8f0e669948e (57.295288ms)
    -> RenewPeriodic exits since there is no cluster leader

 2015/05/02 19:23:05 [INFO] consul: New leader elected: XYZ
    -> in 3s we have a new leader

 2015/05/02 19:23:25 [DEBUG] consul.state: Session db32a8ad-5e87-41b6-a217-f8f0e669948e TTL expired
 2015/05/02 19:23:25 [DEBUG] consul.state: Invalidating session db32a8ad-5e87-41b6-a217-f8f0e669948e due to session 
    -> 23s later the TTL expires and session is invalidated

Why an apparently healthy cluster lost communication and forced an election is a separate question; I have not figured out the root cause of that yet.

Thanks!

armon (Member) commented May 8, 2015

@rojojo23 Great catch! Thanks!

armon added a commit that referenced this pull request May 8, 2015
modify RenewPeriodic to retry failed Renew until TTL elapses
armon merged commit db134f6 into hashicorp:master May 8, 2015
rojojo23 (Author) commented May 8, 2015

Thank you!

duckhan pushed a commit to duckhan/consul that referenced this pull request Oct 24, 2021