fix: associate notification version with epoch #5446

Closed
Tracked by #3750
BugenZhao opened this issue Sep 20, 2022 · 7 comments
Labels
component/frontend Protocol, parsing, binder. component/meta Meta related issue. type/enhancement Improvements to existing implementation.

Comments

@BugenZhao
Member

BugenZhao commented Sep 20, 2022

The notification mechanism is completely asynchronous for now, as it was initially introduced for catalog updates. However, we've found some cases where we must build an association between the notification version and the system epoch.

Batch query when scaling

We prefer to schedule scan tasks to the corresponding workers (parallel units) according to the vnode mapping of the Materialize executors. After barrier and checkpoint decoupling (#4966) is supported, this assumption must still hold. Besides, the in-memory state store e2e (#5322) also tests this.

It's possible that a batch query is scheduled during scaling: we get an epoch that is after the scaling, while the vnode mapping from the notification manager has not been updated yet. As the uncheckpointed data is invisible to other compute nodes, we will fail to scan the data of some migrated/rebalanced partitions.
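For illustration only, here is a minimal sketch of the race. All names (`FrontendCache`, `pinned_epoch`, `vnode_mapping`, `schedule_scan`) are hypothetical and not the actual frontend code:

```rust
// Hypothetical sketch of the race between epoch pinning and mapping updates.
struct FrontendCache {
    /// Epoch pinned from the meta node for a batch query (already post-scaling).
    pinned_epoch: u64,
    /// Vnode -> parallel unit mapping cached from the notification manager,
    /// which may still be the pre-scaling version.
    vnode_mapping: Vec<u32>,
}

/// Decide where to send the scan task for one vnode.
fn schedule_scan(cache: &FrontendCache, vnode: usize) -> (u64, u32) {
    let epoch = cache.pinned_epoch; // reflects the finished scaling
    let parallel_unit = cache.vnode_mapping[vnode]; // may be stale
    // With a stale mapping, the task tries to read uncheckpointed data of
    // `epoch` on a node that no longer (or does not yet) own this vnode,
    // so the scan misses the migrated/rebalanced partitions.
    (epoch, parallel_unit)
}

fn main() {
    let cache = FrontendCache {
        pinned_epoch: 42,
        vnode_mapping: vec![0, 0, 1, 1], // stale: vnodes 2..3 have moved to unit 2
    };
    let (epoch, unit) = schedule_scan(&cache, 2);
    println!("scan vnode 2 at epoch {epoch} on parallel unit {unit} (stale)");
}
```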

Batch query when dropping

Similar to the above, we may still be able to scan a dropped table whose data is being cleaned up by the storage, due to the asynchronous propagation of catalog deletion to the frontend. This may cause wrong results or even break some assumptions in batch executors.

Solution

As the epoch maintains the sequential consistency of DDLs and configuration changes, we may associate a "minimal notification version" with each epoch. After the frontend pins an epoch for a batch task, it must wait until its local notification version catches up to this minimal version before scheduling.
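A rough sketch of this association, assuming hypothetical names (`EpochVersionMap`, `ObserverState`, `wait_for_epoch`) and a `tokio::sync::watch` channel for the locally applied version; this is not the actual implementation:

```rust
use std::collections::BTreeMap;
use tokio::sync::watch;

// Hypothetical: for each committed epoch, the smallest notification version
// that already covers all configuration changes visible at that epoch.
struct EpochVersionMap {
    min_version_by_epoch: BTreeMap<u64, u64>, // epoch -> minimal notification version
}

// Hypothetical frontend-side observer state: the latest notification version
// applied locally, published through a watch channel.
struct ObserverState {
    applied_version: watch::Receiver<u64>,
}

impl ObserverState {
    /// Before scheduling a batch task pinned at `epoch`, wait until the local
    /// notification version reaches the minimal version associated with it.
    async fn wait_for_epoch(&self, map: &EpochVersionMap, epoch: u64) {
        let Some(&required) = map.min_version_by_epoch.get(&epoch) else {
            return; // no association recorded for this epoch
        };
        let mut rx = self.applied_version.clone();
        while *rx.borrow() < required {
            // `changed` resolves whenever the observer applies a newer notification.
            if rx.changed().await.is_err() {
                return; // sender dropped; give up waiting
            }
        }
    }
}
```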

cc @liurenjie1024 @yezizp2012 @zwang28 @st1page 🤩

@BugenZhao BugenZhao added type/enhancement Improvements to existing implementation. component/meta Meta related issue. component/frontend Protocol, parsing, binder. labels Sep 20, 2022
@github-actions github-actions bot added this to the release-0.1.13 milestone Sep 20, 2022
@hzxa21
Collaborator

hzxa21 commented Sep 20, 2022

Strong +1.

Btw, do we have a case where the notification version increases without the epoch increasing? In other words, can we use the epoch as the notification version?

@BugenZhao
Member Author

Btw, do we have a case where the notification version increases without the epoch increasing? In other words, can we use the epoch as the notification version?

Yes, for changes not relevant to materialized views, like create/drop source, create/drop database, or even Hummock version updates.

@BugenZhao
Member Author

I think we must find a way to atomically update and read the combination of the Hummock snapshot (epoch) and the vnode mapping. Currently, they're completely independent, both in the frontend and in the meta node when committing the barrier, and any mismatch between them leads to problems:

  • Old (current) snapshot, new mapping: we'll `try_wait_epoch` for a `Current` epoch on the wrong partition, which will fail on this assertion:

    ```rust
    let sealed_epoch = self.local_version.read().get_sealed_epoch();
    assert!(
        epoch <= sealed_epoch && epoch != HummockEpoch::MAX,
        "current epoch can't read, because the epoch in storage is not updated, epoch{}, sealed epoch{}",
        epoch,
        sealed_epoch
    );
    return Ok(());
    ```

  • New (current) snapshot, old mapping: some data will be invisible remotely, which leads to wrong results.

So I guess some major refactoring might be necessary. 🤔 cc @hzxa21 @yezizp2012 @st1page
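For illustration, one possible shape of such a refactor is to bundle the pinned epoch and the vnode mappings into a single immutable value that is swapped atomically. The names and the use of the `arc-swap` crate below are assumptions, not the actual design:

```rust
use std::collections::HashMap;
use std::sync::Arc;

use arc_swap::ArcSwap;

// Hypothetical: bundle the pinned Hummock epoch and the vnode mappings into
// one immutable value, so a batch query never observes a mismatched pair.
struct FrontendSnapshot {
    epoch: u64,
    // fragment id -> vnode mapping (vnode index -> parallel unit id)
    vnode_mappings: HashMap<u32, Arc<Vec<u32>>>,
}

// Readers load the whole snapshot atomically; the observer replaces it
// atomically once both the new epoch and the matching mappings are received.
struct SnapshotManager {
    current: ArcSwap<FrontendSnapshot>,
}

impl SnapshotManager {
    fn new(initial: FrontendSnapshot) -> Self {
        Self {
            current: ArcSwap::from_pointee(initial),
        }
    }

    fn acquire(&self) -> Arc<FrontendSnapshot> {
        // Either the old epoch with the old mapping, or the new epoch with
        // the new mapping -- never a mixed state.
        self.current.load_full()
    }

    fn update(&self, snapshot: FrontendSnapshot) {
        self.current.store(Arc::new(snapshot));
    }
}
```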

@BugenZhao BugenZhao removed their assignment Oct 24, 2022
@fuyufjh
Member

fuyufjh commented Nov 8, 2022

Is this closed by #5999?

@BugenZhao
Member Author

Is this closed by #5999?

Nope. They're unrelated. I think this might be closed by #6250.

@hzxa21 hzxa21 modified the milestones: release-0.1.14, release-0.1.15 Nov 22, 2022
@fuyufjh fuyufjh changed the title from "discussion: associate notification version with epoch" to "fix: associate notification version with epoch" Nov 24, 2022
mergify bot pushed a commit that referenced this issue Dec 29, 2022
…w fragment_mapping (#7042)

This PR addresses #5446 by ensuring the frontend is always notified of a new snapshot after the corresponding fragment_mapping.

Also a minor refactor that extracts `collect_synced_ssts` from `complete_barrier`, to make the latter cleaner.

This PR doesn't (and doesn't need to) affect the relative order between snapshot and catalog notifications:
- [snapshot is notified firstly](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L453)
- [catalog is notified later](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L456)

Approved-By: hzxa21
Approved-By: BugenZhao
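For context, a minimal sketch of the ordering invariant described in that commit; all names below are illustrative stubs, not the actual meta-service code:

```rust
// Hypothetical sketch: on barrier completion, the updated fragment (vnode)
// mappings are broadcast before the new snapshot, so any frontend that
// observes the new epoch has already applied the matching mapping.
struct FragmentMapping {
    fragment_id: u32,
    vnode_to_parallel_unit: Vec<u32>,
}

async fn notify_fragment_mappings(_mappings: Vec<FragmentMapping>) {
    // stub: in reality, pushed to frontends via the notification service
}

async fn notify_hummock_snapshot(_epoch: u64) {
    // stub: in reality, the committed epoch is broadcast to frontends
}

async fn complete_barrier_sketch(epoch: u64, changed_mappings: Vec<FragmentMapping>) {
    // 1. Mappings first ...
    notify_fragment_mappings(changed_mappings).await;
    // 2. ... then the snapshot (epoch).
    notify_hummock_snapshot(epoch).await;
}
```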
@BugenZhao
Member Author

Can we close it with #7042? cc @zwang28

@fuyufjh fuyufjh closed this as completed Jan 30, 2023
@zwang28
Contributor

zwang28 commented Jan 30, 2023

Can we close it with #7042? cc @zwang28

#7024 addresses the "Batch query when scaling" case.

I'm checking whether the "Batch query when dropping" case is correctly handled. Will open another tracking issue when required.
