[Bug] Deadlock in broker service while initializing bkClient #22699
Comments
Thanks for the issue report. Is this similar to #20148, which was fixed by #21096? In a Slack thread I made these comments some time ago:
@Meet0861 I completed cherry-picking #22853 to branch-3.0. The Pulsar 3.0.6 release is planned for once BookKeeper 4.16.6 is available, possibly in the next few weeks.
Search before asking
Read release policy
Version
2.10.6
Minimal reproduce step
Not able to reproduce, but it is happening intermittently in our running clusters (mostly observed after rollouts) since upgrading from 2.9.3 to 2.10.6.
What did you expect to see?
An exception should be thrown with a valid reason, if any, and the thread should be released.
What did you see instead?
Threads get blocked and produce/consume requests time out. Also, the faulty broker stopped serving requests and all of its bundles were unloaded to another broker.
Exception on the client side:
`WARN 8 --- [-client-io-18-4] o.a.p.client.impl.ConnectionHandler : [persistent://tenant/namespace/topic-partition-34] [tenant/namespace] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: Connection already closed
2024-04-22T10:29:31.898+05:30 WARN 8 --- [-client-io-18-4] o.a.p.client.impl.ConnectionHandler : [persistent://tenant/namespace/topic-partition-34] [tenant/namespace] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: Connection already closed -- Will try again in 57.264 s`
Anything else?
We have analysed the thread dumps and found a possible deadlock situation.
[thread dump]
Here, we can see that thread metadata-store-10-1 is waiting for lock 2098, which is held by pulsar-io-4-7. pulsar-io-4-7 is not releasing 2098 because it is waiting for d898. So where is d898 stuck?
The thread holding d898 is stuck in BookieRackAffinityMapping.setConf(), waiting on a CompletableFuture.
Can this be related to #20944?
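For readers unfamiliar with this kind of hang, here is a minimal, self-contained sketch of one way a blocking wait on a CompletableFuture can never finish. It is not Pulsar code. The class `BlockedFutureDemo`, the method `selfWaitTimesOut`, and the single-threaded executor are hypothetical, and the sketch assumes that the future can only be completed by the same thread that is blocked waiting for it. The thread dump alone does not confirm that this is what happens in BookieRackAffinityMapping.setConf().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedFutureDemo {

    /**
     * Returns true if a task running on a single-threaded executor times out
     * while waiting for a future that only the same executor could complete.
     */
    public static boolean selfWaitTimesOut() {
        // Hypothetical stand-in for a single-threaded callback executor.
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outer = single.submit(() -> {
                CompletableFuture<String> result = new CompletableFuture<>();
                // The completion is queued behind the task that waits for it,
                // so it cannot run until this task returns.
                single.execute(() -> result.complete("rack-info"));
                try {
                    // An unbounded get() here would block forever.
                    result.get(200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return outer.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("demo did not finish", e);
        } finally {
            single.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("self-wait timed out: " + selfWaitTimesOut());
    }
}
```

The bounded `get(timeout, unit)` is what lets the sketch terminate. With a plain `get()` the waiting thread, and any lock it holds, would never be released, which matches the symptom described above where other threads queue up behind locks 2098 and d898.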
Are you willing to submit a PR?