Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow mqtt client to reconnect if it gets disconnected #977

Merged
merged 1 commit into from
Nov 7, 2024

Conversation

sorccu
Copy link
Collaborator

@sorccu sorccu commented Nov 6, 2024

MQTT client reconnections were disabled for seemingly no reason. Combined with the fact that the SocketClient's various error and disconnection events were completely ignored, a random disconnection before the expected final message was received would almost certainly cause the CLI to essentially hang, doing nothing until the failsafe timeout (currently 10m) hit.

There is an additional edge case which is not covered by this change. Even now that reconnections are enabled, in the unlucky case that the final message is distributed when our client is not connected (i.e. in the process waiting to reconnect, or actively reconnecting), then we'll still miss it and hang the same way. To reduce the odds of this edge condition happening, the reconnect wait timeout was reduced to 100ms from the default of 1000ms.

To invoke the reconnection behavior (and to try to figure out whether there was any reason why reconnections were originally disabled), I wrote a simple HTTP proxy that acts like a normal HTTP proxy but disconnects all connections after a configurable duration. Then I ran the CLI with the https_proxy environment variable and confirmed that reconnections were successful when the proxy closed connections.

I hereby confirm that I followed the code guidelines found at engineering guidelines

Affected Components

  • CLI
  • Create CLI
  • Test
  • Docs
  • Examples
  • Other

Notes for the Reviewer

Resolves #[issue-number]

New Dependency Submission

MQTT client reconnections were disabled for seemingly no reason.
Combined with the fact that the SocketClient's various error and
disconnection events were completely ignored, a random disconnection
before the expected final message was received would almost certainly
cause the CLI to essentially hang, doing nothing until the failsafe
timeout (currently 10m) hit.

There is an additional edge case which is not covered by this
change. Even now that reconnections are enabled, in the unlucky case
that the final message is distributed when our client is not connected
(i.e. in the process waiting to reconnect, or actively reconnecting),
then we'll still miss it and hang the same way. To reduce the odds
of this edge condition happening, the reconnect wait timeout was
reduced to 100ms from the default of 1000ms.
@sorccu sorccu requested a review from clample November 6, 2024 18:15
Copy link
Collaborator

@clample clample left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

@sorccu sorccu merged commit bb421ba into main Nov 7, 2024
3 checks passed
@sorccu sorccu deleted the fix-socket-reconnection branch November 7, 2024 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants