Fix for EH receiver timeout while opening #21324
The CBSChannel is a shared resource on the MessagingFactory, used by all senders and receivers opened on the same EventHubClient. If a sender or receiver tries to send a token while the existing session to the $cbs node is closing, a race can occur that leaves the MessagingFactory in a bad state. New senders and receivers then time out while trying to open, because the auth step stalls:
a. First step is to send a token via CBSChannel.sendToken, which chains down to FaultTolerantObject.runOnOpenedObject
b. FaultTolerantObject.runOnOpenedObject sees that this.creatingNewInnerObject is true (left over from step 3) and just queues the action, assuming it will be handled when the channel is finally opened, but nobody is opening the channel…
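The stall in steps a and b can be illustrated with a minimal sketch. This is not the SDK's code: `StallSketch`, `FaultTolerantSketch`, `pending`, `innerOpen`, and `openAttempts` are hypothetical names, and only the `creatingNewInnerObject` flag and the `runOnOpenedObject` method name come from the description above. It assumes the action is queued whenever the flag is set and that a new open is started only when the flag is clear.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class StallSketch {
    // Hypothetical, simplified stand-in for FaultTolerantObject.
    static final class FaultTolerantSketch {
        boolean creatingNewInnerObject = false; // true while an open is believed to be in flight
        boolean innerOpen = false;              // true once the inner object is usable
        final Queue<Runnable> pending = new ArrayDeque<>();
        int openAttempts = 0;

        void runOnOpenedObject(Runnable action) {
            if (innerOpen) {
                action.run();
                return;
            }
            // Queue the action until the inner object is open.
            pending.add(action);
            if (!creatingNewInnerObject) {
                creatingNewInnerObject = true;
                openAttempts++; // real code would start an asynchronous open here
            }
            // Otherwise: assume the in-flight open will drain `pending` later.
            // If the flag is stale, nothing ever does, and the caller times out.
        }
    }
}
```

With a stale `creatingNewInnerObject == true`, a call to `runOnOpenedObject` only grows `pending`: no open is started and the queued token send never runs.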
This is similar to, but not the same as, a previous race condition that was caused by tracking the same state in two different places and letting the two get out of sync. In this case, RequestResponseOpener.isOpened tracks more than just the state of the inner RequestResponseChannel, so we don't want to switch to using only the state of the RequestResponseChannel. The proposed fix is for RequestResponseOpener.run to also check the state of the inner RequestResponseChannel. If the state is mixed (isOpened is still true but the RequestResponseChannel is CLOSING or CLOSED), run() registers a continuation that replays the call to run() once the close callback for the existing channel has finished cleanup and set isOpened back to false.
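The shape of the proposed fix might look roughly like the following sketch. It is a simplified model under stated assumptions, not the actual change: `OpenerSketch`, `Channel`, `Opener`, `afterClose`, `beginClose`, `onCloseComplete`, and `opensStarted` are hypothetical names, and the real RequestResponseOpener works with asynchronous callbacks rather than direct method calls. Only `isOpened`, `run()`, and the CLOSING/CLOSED states come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class OpenerSketch {
    enum ChannelState { OPENED, CLOSING, CLOSED }

    // Hypothetical stand-in for the inner RequestResponseChannel.
    static final class Channel {
        ChannelState state = ChannelState.OPENED;
    }

    // Hypothetical stand-in for RequestResponseOpener.
    static final class Opener {
        boolean isOpened = false;
        Channel inner;
        final Queue<Runnable> afterClose = new ArrayDeque<>();
        int opensStarted = 0;

        void run() {
            // Mixed state: the flag says "opened" but the channel is going away.
            if (isOpened && inner != null && inner.state != ChannelState.OPENED) {
                afterClose.add(this::run); // replay run() once close cleanup is done
                return;
            }
            if (isOpened) {
                return; // assumed: a healthy open channel needs no work
            }
            inner = new Channel();
            isOpened = true;
            opensStarted++;
        }

        // Models the existing channel starting to close.
        void beginClose() {
            inner.state = ChannelState.CLOSING;
        }

        // Models the close callback: clean up first, then replay deferred calls.
        void onCloseComplete() {
            inner.state = ChannelState.CLOSED;
            isOpened = false;
            Runnable next;
            while ((next = afterClose.poll()) != null) {
                next.run();
            }
        }
    }
}
```

In this model, a `run()` that arrives between `beginClose()` and `onCloseComplete()` is deferred instead of being dropped, and a fresh channel is opened as soon as cleanup resets `isOpened`.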