-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Invalidate metadata children cache after key deleted #20363
[fix][broker] Invalidate metadata children cache after key deleted #20363
Conversation
/pulsarbot rerun-failure-checks |
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
Show resolved
Hide resolved
Besides, I remember that ZK cannot remove recursively at a batch but iterating from children to parent. How the cache won't be invalidated by its children getting deleted? Anyway, invalidate cache eagerly won't introduce correctness issue. |
In the scenario described in #14779 (comment), |
Codecov Report
@@ Coverage Diff @@
## master #20363 +/- ##
=============================================
+ Coverage 36.93% 72.95% +36.01%
- Complexity 12032 31889 +19857
=============================================
Files 1687 1864 +177
Lines 128761 138403 +9642
Branches 14002 15185 +1183
=============================================
+ Hits 47553 100965 +53412
+ Misses 74932 29427 -45505
- Partials 6276 8011 +1735
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Thanks for your explanation! Now I know that no children cause the cache be empty and return empty cursor even if there is some. And this patch is valid even for defensive programming (delete node should always invalid cache). Although, if the cursor is a child of the topic znode, won't I'd appreciate it if you can provide a znode view of execution steps to demonstrate the exception. |
Try me break down into two steps to explain how this error could be produced in #14779:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Lines 570 to 572 in 1e664b7
pseudo call stack:
BTW, broker will try create a cursor znode which already exists in zk, so it'll receive Here's pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java Lines 244 to 246 in 3f2978d
In summary, this is how I observed the occurrence of BadVersionException. |
After a discussion with @Shawyeok I learned that it's about two broker node that:
|
Merging... Thanks for your contribution and detailed investigation @Shawyeok! |
@Shawyeok Could you add a test for this patch? |
Sure, I'll do it later. |
This patch was aimed to fix #14779, I reproduced in my environment which is based on the However, in theory, there is a chance that the new cursor watcher event is not triggered before |
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label |
…pache#20363) (cherry picked from commit 64831dc)
Motivation
There is a chance if a topic reassign to a broker that owned this topic once, consumer reconnect will failure with
BadVersionException
, here's detail: #14779 (comment)Reproduce steps:
Modifications
Cleanup metadata children cache if managed-ledger key deleted
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc-not-needed
Matching PR in forked repository
PR in forked repository: Shawyeok#7