Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reconnects when ping timers under/overshoot #289

Merged
merged 1 commit into from
Jul 27, 2019

Conversation

georgyangelov
Copy link
Contributor

@georgyangelov georgyangelov commented Jul 26, 2019

The ping timer is set exactly on the ping interval. Sometimes the timer runs a bit before the ping time so the ping misses. If the next timer overshoots, the ping is not sent and the connection is restarted
(because the last message time is > ping_time*2).

Scheduling the timer more frequently than the actual ping time fixes these issues as the timer is guaranteed to ping at least once during the ping_time*2 interval.

Here is an example of what happens:

D, [2019-07-26T16:49:15.839429 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Websocket ping time is 30
D, [2019-07-26T16:49:15.839698 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Time since last message: 37.04522261582315
D, [2019-07-26T16:49:15.839805 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#send_json: {:type=>"ping", :id=>2}
D, [2019-07-26T16:49:15.839994 #7] DEBUG -- Slack::RealTime::Concurrency::Async::Socket#send_data: {"type":"ping","id":2}
D, [2019-07-26T16:49:15.873663 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#run_loop: WebSocket::Driver::MessageEvent, {"type":"pong","reply_to":2}
D, [2019-07-26T16:49:15.874079 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#dispatch: reply_to=2, type=pong


D, [2019-07-26T16:49:45.865965 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Websocket ping time is 30
D, [2019-07-26T16:49:45.867134 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Time since last message: 29.992787840776145


D, [2019-07-26T16:50:15.895730 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Websocket ping time is 30
D, [2019-07-26T16:50:15.896235 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#keep_alive?: Time since last message: 60.022549238055944
W, [2019-07-26T16:50:15.896895 #7]  WARN -- id=TEAMID, name=TEAMNAME, domain=team: is offline
D, [2019-07-26T16:50:15.897195 #7] DEBUG -- : id=TEAMID, name=TEAMNAME, domain=team#restart_async
D, [2019-07-26T16:50:15.897333 #7] DEBUG -- id=TEAMID, name=TEAMNAME, domain=team#run_loop: NilClass
I, [2019-07-26T16:50:15.900166 #7]  INFO -- request: POST https://slack.com/api/rtm.start

Please note that I haven't tested the Celluloid or EventMachine implementations, just the Async one.

@georgyangelov georgyangelov force-pushed the fix-ping-timeout branch 2 times, most recently from cd7af9e to 9728f4e Compare July 26, 2019 17:27
@dangerpr-bot
Copy link

dangerpr-bot commented Jul 26, 2019

1 Warning
⚠️ There’re library changes, but not tests. That’s OK as long as you’re refactoring existing code.

Generated by 🚫 danger

@dblock
Copy link
Collaborator

dblock commented Jul 26, 2019

This is good! It explains why reconnects are so frequent at scale despite the connection being actually alive (I was trying to debug that.)

  • Let's extract the method that does / 2. Will be easier to expose it as an option as well, if you feel motivated I think it's a good idea.
  • Document what's going on in README.

@georgyangelov
Copy link
Contributor Author

georgyangelov commented Jul 26, 2019

Thanks for the feedback!

  • Where do you think would be good to put the method? In the configuration near the ping interval option maybe?
  • I'll add something to the docs tomorrow.

@dblock
Copy link
Collaborator

dblock commented Jul 26, 2019

Config near ping option interval sounds right.

The ping timer is set exactly on the interval. Sometimes the timer runs
a bit before the ping time so the ping misses. If the next timer
overshoots, the ping is not sent and the connection is restarted
(because the last message time is > ping_time*2). Scheduling the timer
more frequently than the actual ping time fixes these issues as the
timer is guaranteed to ping at least once during the ping_time*2
interval.
@georgyangelov
Copy link
Contributor Author

@dblock Can you take another look? I wasn't sure how much detail you wanted me to include in the readme, so I provided some coarse details and linked this PR.

@dblock dblock merged commit 1465682 into slack-ruby:master Jul 27, 2019
@dblock
Copy link
Collaborator

dblock commented Jul 27, 2019

Merged. I am going to cut a release with this.

@dblock
Copy link
Collaborator

dblock commented Jul 27, 2019

This actually broke the integration tests with a slack token. Attempt at a fix in 5c61124.

Locally run export SLACK_API_TOKEN=... CONCURRENCY=async-websocket and rspec spec/integration/integration_spec.rb for next time. Unfortunately can't leak a token for PRs to pick it up, so it's easy to miss.

@georgyangelov
Copy link
Contributor Author

Oops, sorry, I'll take care of this next time. Thanks for fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants