Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Possible fix to one cause of the "connection deadlock" #1580

Merged
merged 1 commit into from
Jul 10, 2020

Conversation

SubZero0
Copy link
Member

Firsts things first: since this issue isn't easily reproducible, I based this fix on a stacktrace provided in one issue, in this case, #1572 . So there's a few things to consider:

  1. It might not fix all possible causes to the connection deadlock;
  2. It might have an unintended side effect that I didn't notice/predict (I'll explain at the end).

The issue

I read the stacktrace from #1572 again and again and came to the conclusion that what happens is due to an exception thrown from a rest request being cancelled and reaching the ConnectionManager that will permanently disconnect due to it.

It goes as this:

  1. DiscordSocketClient.StartAsync
  2. ConnectionManager.StartAsync
  3. ConnectionManager: await ConnectAsync(reconnectCancelToken).ConfigureAwait(false);
  4. ConnectionManager: await _onConnecting().ConfigureAwait(false); (that is actually DiscordSocketClient.OnConnectingAsync)
  5. DiscordSocketClient: await ApiClient.ConnectAsync().ConfigureAwait(false);
  6. DiscordSocketApiClient: ConnectAsync
  7. DiscordSocketApiClient: ConnectInternalAsync
  8. ConnectInternalAsync: var gatewayResponse = await GetGatewayAsync().ConfigureAwait(false); fails and throws a TaskCanceledException (could be any other rest request in DiscordSocketClient.OnConnectingAsync)
  9. TaskCanceledException bubbles up until it's back to the ConnectionManager.StartAsync
  10. StartAsync will finally deal with the exception, since TaskCanceledException inherits from OperationCanceledException, it'll end in this catch:
catch (OperationCanceledException ex)
{
    Cancel(); //In case this exception didn't come from another Error call
    await DisconnectAsync(ex, !reconnectCancelToken.IsCancellationRequested).ConfigureAwait(false);
}

This catch will cancel all tokens in Cancel(), including the reconnect, so it'll just Disconnect and stop there.

Solution

I decided to try-catch the await _onConnecting().ConfigureAwait(false); to prevent any TaskCanceledException from bubbling up (and disconnecting the client permanently) and sent the inner exception instead (added a null check to prevent throwing "null", but I didn't also see a reason why that would happen, so it's there just to be safe).

Now this could cause an unintended side effect. I wasn't able to identify a method would legitly throw that specific exception, so I don't believe it'll happen but it could if I missed it, and end not disconnecting permanently the client as expected.

If anyone that has this issue recurrently could test this PR, it would be highly appreaciated, same goes if you see something I missed that could potentially cause an unintended consequence.

@SubZero0 SubZero0 changed the title fix: Possible fix to one of the "connection deadlock" fix: Possible fix to one cause of the "connection deadlock" Jun 24, 2020
@xXBuilderBXx
Copy link
Contributor

I have been getting some weird disconnection/timeout issues from the lib so i'm gonna test this on my big bot and see how well it handles will let you know if there is any issues.

@foxbot foxbot merged commit b8fa464 into discord-net:dev Jul 10, 2020
@SubZero0 SubZero0 deleted the possible-connection-deadlock-fix branch July 10, 2020 04:14
@AraHaan
Copy link

AraHaan commented Jul 14, 2020

Nice, I have also identified another issue as well, but good work, this option is also reproduceable in the log event.

  1. Like for example if you use the Log event to AppendAllTextAsync (from System.IO.File) and you do not handle the case where the file is in use (apperently from opening it too fast every line it tries to log making it throw an System.IO.IOException)
  2. Calling an long blocking operation on 1 of the user side events.

I think letting bot devs know that this can happen on the Log, Disconnected, Connected, and Ready events and for whatever means avoid any possible exceptions and long blocking operations it might eliminate it entirely as I recently did. However I am still using an DiscordSocketClientFactory that creates a new DiscordSocketClient every invoke of the Disconnected event and the code also disposes of the old one (and tries to connect with the new instance 5 seconds later that is multiplied by the number of reconnections and capped at 60 seconds) (similar to what the lib does somewhere) just to stay safe from ratelimits.

@SubZero0
Copy link
Member Author

@AraHaan This is unrelated to this PR so it shouldn't have been posted here but to answer you:

Throwing exceptions shouldn't cause an issue since it's try-catch'd (but imo you should deal with exception and not bubble them up in the event handler), you can see how it's done at DiscordSocketClient.cs#L475.

Blocking any event could kill your connection, this is already stated in our docs, see https://docs.stillu.cc/guides/concepts/events.html#safety.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants