[improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths #18518

eolivelli · 2022-11-17T09:21:09Z

This is only a POC, opened for discussion

Explanation

When a operation is redirected to another broker, it may happen that the local view over metadata on the other broker is not up-to-date with the view seen by the initial broker.
In order to guarantee casual consistency among different peers you have to ways using the ZooKeeper APIs:

perform a write
use the sync() API

Both of the two operations force (or wait for) the ZooKeeper server to which the client is connected to be in sync with the Leader ZooKeeper server and also that the local ZooKeeper client is up-to-date.

See more at https://zookeeper.apache.org/doc/r3.8.0/zookeeperProgrammers.html at the "Consistency Guarantees" paragraph.

When the control passes from one broker to another broker the second broker needs to ensure that its local view is not stale.
And we can do it by using the sync() API and then invalidating the local MetadataCache.
The guarantee here is that the operation executed on the initial broker "happened before" the operation on the second broker.

A good example of the need for this feature is described on PR #18193.
at comment https://github.com/apache/pulsar/pull/18193/files#r1005919247

Modifications

Add to MetadataStore the support for using the sync() API and add refreshAndGetAsync() to MetadataCache.
Fix the problem reported by https://github.com/apache/pulsar/pull/18193/files#r1005919247 regarding Partitioned Topic deletion (the same fix should be applied to Namespace deletion, about the Policies#deleted flag)

PR in the forked repository: eolivelli#20

doc
doc-required
doc-not-needed
doc-complete

github-actions · 2022-11-17T12:51:01Z

@eolivelli Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

codecov-commenter · 2022-11-17T13:15:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.21%. Comparing base (b42aed1) to head (c7b7c31).
Report is 1903 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18518      +/-   ##
============================================
+ Coverage     47.11%   47.21%   +0.09%     
- Complexity    10595    10685      +90     
============================================
  Files           710      711       +1     
  Lines         69423    69455      +32     
  Branches       7449     7452       +3     
============================================
+ Hits          32709    32792      +83     
+ Misses        33037    32993      -44     
+ Partials       3677     3670       -7

Flag	Coverage Δ
unittests	`47.21% <ø> (+0.09%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 62 files with indirect coverage changes

merlimat · 2022-11-17T21:00:07Z

Apart from the fact that sync() is a very ZK specific concept, my main problem with this is what is this supposed to achieve and where is this supposed to be used.

Is this to fix the problem of a partitioned topic created and immediately used by a client and seeing error?

The part that is important to highlight is that sync() is essentially forcing a write in the ZK log, in order to guarantee read-after-write in ZK. Imposing a sync each time a client connects, will open a direct write path from clients to ZK.

That would mean that even when using a fully-consistent data store, we would still have the problem that we cannot cache this information.

I would rather find a better way, where we only force the sync on very specific occasions. eg:

try to get partitioned-topic-metadata
It's not in local cache
Try to read from metadata store
We get NotFound error
Force sync() and redo the read
Cache the result of the read, (including the NotFound result)

poorbarcode

LGTM. By the way, should we need a test for this?

poorbarcode · 2022-11-18T05:54:25Z

Hi @merlimat

Is this to fix the problem of a partitioned topic created and immediately used by a client and seeing error?

No, What this PR is trying to fix is that when the metadata attribute deleted of a partitioned topic has changed to true, this value is still cached in other brokers. This PR makes sync and read in other brokers. The first discussion is https://github.com/apache/pulsar/pull/18193/files#r1005921385

I would rather find a better way.

I agree. +1

poorbarcode

Should we need a test for this?

eolivelli · 2022-11-18T07:52:55Z

Thank you @merlimat and @poorbarcode for your comments.

@merlimat this proposal is only to cover very specific cases, as you say we are already doing well for most of the cases.

The case in which you need "sync()" is when there is this combo happening:

you have 2 brokers
broker A mutates the metadata (op1)
another action triggers a "read" (op2) followed by a write (op3) on broker B

In this case when broker B executes op2 we must guarantee that the value that observes includes the effects generated by op1, otherwise op3 may be executed using stale data (any compare-and-set operation in op3 is not on the same znode as op2, so we not protection, no BadVersion).

In the partitioned topic deletion example we have:

op1 -> PartitionedTopicMetadata.deleted = true on Broker A
op2 -> Read (from cache or from the ZK follower node in case of cache miss) PartitionedTopicMetadata on Broker B
op3 -> Do something (like creating again a partition in case of autocreation enabled) on Broker B

You need sync() (on broker B) before op2 because the action that causes op2 is out-of-band in respect to ZooKeeper, and so Broker B may use a stale version of the Z-Node (either because it is connected to a Follower that is not up-to-date or because it has a stale value in the local cache and the watch notification still hasn't arrived, mostly because the Follower is still not up-to-date).

So we have to use this "consistent refreshAndGet" only when this pattern happens.

No need to change Pulsar everywhere, but only on SOME of the Write paths that touch Metadata.

I would really reject any proposal to call sync() before every read as it would kill the existence of the localcache and kill also the performances (as you say, because sync() is a dymmy write)or every write as CAS already covers 99% of the operations

eolivelli · 2022-11-18T07:53:17Z

@poorbarcode I will add tests only when we reach consensus on the approach.

eolivelli · 2022-11-22T08:16:54Z

@merlimat do you have more comments please ?

github-actions · 2022-12-23T01:58:15Z

The pr had no activity for 30 days, mark with Stale label.

eolivelli · 2022-12-23T14:07:35Z

@merlimat @codelipenghui @hangc0276 @rdhabalia I believe that this PR adds value and that we need to complete this work.

I can't find any other way to achieve the goals of this PR

…n critical metadata operation paths

poorbarcode · 2023-08-28T07:01:55Z

@eolivelli I want to cherry-pick this PR to 2.10 and 2.11 to solve the issue, which #21063 fixed. Do you feel okay?

poorbarcode · 2023-08-30T02:00:58Z

@eolivelli I want to cherry-pick this PR to 2.10 and 2.11 to solve the issue, which #21063 fixed. Do you feel okay?

I sent a discuss to cherry-pick this PR to branch-2.10 and branch-2.11

https://lists.apache.org/thread/g6b1xh3tr8qj0vdd0jwtkhswlsr50631

…ing critical metadata operation paths (#18518) (cherry picked from commit 492a9c3)

poorbarcode · 2023-08-31T19:02:44Z

I sent a discuss to cherry-pick this PR to branch-2.10 and branch-2.11

https://lists.apache.org/thread/g6b1xh3tr8qj0vdd0jwtkhswlsr50631

Cherry-picked.

Since the method isPartitionedTopicBeingDeletedAsync only exists in branch-3.0, the change https://github.com/apache/pulsar/pull/18518/files#diff-38fac527320b93f0e9ea2459420f56638299627f7f29c01484f7e4aae0c5f0b7R330 was not be cherry-picked into branch-2.10 and branch-2.11

…ing critical metadata operation paths (#18518) (cherry picked from commit 492a9c3)

eolivelli requested review from merlimat, lhotari and dlg99 November 17, 2022 09:21

github-actions bot added the doc-label-missing label Nov 17, 2022

eolivelli mentioned this pull request Nov 17, 2022

[fix] Combination of autocreate + forced delete of partitioned topic with active consumer leaves topic metadata inconsistent. #18193

Merged

4 tasks

eolivelli self-assigned this Nov 17, 2022

eolivelli added doc-not-needed Your PR changes do not impact docs ready-to-test and removed doc-label-missing labels Nov 17, 2022

eolivelli added this to the 2.12.0 milestone Nov 17, 2022

github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Nov 17, 2022

eolivelli changed the title ~~[proof-of-concept] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths~~ [improvement] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths Nov 17, 2022

eolivelli changed the title ~~[improvement] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths~~ [improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths Nov 17, 2022

eolivelli marked this pull request as ready for review November 17, 2022 12:49

apache deleted a comment from github-actions bot Nov 17, 2022

eolivelli added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Nov 17, 2022

github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Nov 17, 2022

eolivelli mentioned this pull request Nov 17, 2022

[run-tests] [improvement] Introduce the sync( API to ensure consistency on reads during critical metadata operation paths eolivelli/pulsar#20

Open

github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Nov 17, 2022

poorbarcode approved these changes Nov 18, 2022

View reviewed changes

poorbarcode reviewed Nov 18, 2022

View reviewed changes

github-actions bot added the Stale label Dec 23, 2022

github-actions bot removed the Stale label Dec 24, 2022

[proof-of-concept] Introduce the sync() API to guaratee consistency o…

c7b7c31

…n critical metadata operation paths

eolivelli force-pushed the impl/sync-poc branch from 30d7d3a to c7b7c31 Compare December 27, 2022 08:18

codelipenghui approved these changes Dec 27, 2022

View reviewed changes

eolivelli merged commit 492a9c3 into apache:master Dec 27, 2022

This was referenced Dec 29, 2022

[improve] Refresh ns policy when deciding auto topic creation eligibility #19097

Closed

[improve] Refresh ns policy when deciding auto topic creation eligibility aymkhalil/pulsar#5

Open

lhotari requested review from hangc0276 and Jason918 February 3, 2023 17:04

poorbarcode pushed a commit that referenced this pull request Aug 31, 2023

[improve] Introduce the sync() API to ensure consistency on reads dur…

29bb685

…ing critical metadata operation paths (#18518) (cherry picked from commit 492a9c3)

poorbarcode added cherry-picked/branch-2.10 release/2.10.6 labels Aug 31, 2023

poorbarcode pushed a commit that referenced this pull request Aug 31, 2023

[improve] Introduce the sync() API to ensure consistency on reads dur…

5d3099d

…ing critical metadata operation paths (#18518) (cherry picked from commit 492a9c3)

poorbarcode added cherry-picked/branch-2.11 release/2.11.3 labels Aug 31, 2023

lhotari mentioned this pull request Apr 18, 2024

Topic creation not working properly #22527

Closed

3 tasks

lhotari mentioned this pull request Aug 17, 2024

[Bug] Inconsistent Metadata Read #23186

Closed

3 tasks

lhotari mentioned this pull request Oct 14, 2024

[Bug] Brokers spuriously error in response to the PRODUCER command for newly added partitions to a topic #23451

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths #18518

[improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths #18518

eolivelli commented Nov 17, 2022 •

edited

Loading

github-actions bot commented Nov 17, 2022

codecov-commenter commented Nov 17, 2022 •

edited

Loading

merlimat commented Nov 17, 2022 •

edited

Loading

poorbarcode left a comment

poorbarcode commented Nov 18, 2022 •

edited

Loading

poorbarcode left a comment

eolivelli commented Nov 18, 2022

eolivelli commented Nov 18, 2022

eolivelli commented Nov 22, 2022

github-actions bot commented Dec 23, 2022

eolivelli commented Dec 23, 2022

poorbarcode commented Aug 28, 2023

poorbarcode commented Aug 30, 2023

poorbarcode commented Aug 31, 2023

[improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths #18518

[improve] Introduce the sync() API to ensure consistency on reads during critical metadata operation paths #18518

Conversation

eolivelli commented Nov 17, 2022 • edited Loading

Explanation

Modifications

github-actions bot commented Nov 17, 2022

codecov-commenter commented Nov 17, 2022 • edited Loading

Codecov Report

merlimat commented Nov 17, 2022 • edited Loading

poorbarcode left a comment

Choose a reason for hiding this comment

poorbarcode commented Nov 18, 2022 • edited Loading

poorbarcode left a comment

Choose a reason for hiding this comment

eolivelli commented Nov 18, 2022

eolivelli commented Nov 18, 2022

eolivelli commented Nov 22, 2022

github-actions bot commented Dec 23, 2022

eolivelli commented Dec 23, 2022

poorbarcode commented Aug 28, 2023

poorbarcode commented Aug 30, 2023

poorbarcode commented Aug 31, 2023

eolivelli commented Nov 17, 2022 •

edited

Loading

codecov-commenter commented Nov 17, 2022 •

edited

Loading

merlimat commented Nov 17, 2022 •

edited

Loading

poorbarcode commented Nov 18, 2022 •

edited

Loading