
bug(batch): Batch scheduler panics if one side of the join has singleton exchange and none for the other side. #7115

Closed
chenzl25 opened this issue Dec 29, 2022 · 3 comments · Fixed by #7240
Assignees: kwannoel
Labels: component/batch (Batch related issue), help wanted (Issues that need help from contributors), priority/high, type/bug (Something isn't working)

Comments

chenzl25 (Contributor) commented Dec 29, 2022

Describe the bug

dev=> create table t (id int);
CREATE_TABLE
dev=> create materialized view v as select count(*) cnt from t;
CREATE_MATERIALIZED_VIEW
dev=> explain select * from v join (select count(*) from t) T;
                      QUERY PLAN
------------------------------------------------------
 BatchNestedLoopJoin { type: Inner, predicate: true }
 ├─BatchScan { table: v, columns: [cnt] }
 └─BatchSimpleAgg { aggs: [count] }
   └─BatchExchange { order: [], dist: Single }
     └─BatchScan { table: t, columns: [] }
(5 rows)


-- panic
dev=> select * from v join (select count(*) from t) T;

As shown in the plan above, the materialized view v has Singleton distribution, so the exchange/shuffle that would otherwise be required above its scan is elided. Currently our scheduler panics on this plan in both local and distributed execution mode.
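A minimal sketch of why the planner elides the exchange here (the enum and function names below are illustrative, not RisingWave's actual planner types): an exchange is only inserted when a child's provided distribution fails to satisfy the parent's required one, and a singleton mview scan already satisfies a Single requirement.

```rust
// Illustrative sketch only; not RisingWave's real planner types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Distribution {
    Single,    // all rows on one parallel unit (e.g. the singleton mview v)
    SomeShard, // rows spread across parallel units (e.g. table t)
}

/// The planner inserts an exchange only when the provided distribution
/// does not already satisfy the required one.
fn needs_exchange(provided: Distribution, required: Distribution) -> bool {
    provided != required
}

fn main() {
    // The scan of v already provides Single, so no exchange is inserted
    // above it (the bare BatchScan on v in the plan above)...
    assert!(!needs_exchange(Distribution::Single, Distribution::Single));
    // ...while the scan of t must be exchanged to Single for the simple agg.
    assert!(needs_exchange(Distribution::SomeShard, Distribution::Single));
    println!("ok");
}
```

This is exactly why the left join input ends up with no exchange between the scan and the root, which the scheduler does not expect.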

frontend.log (local execution mode)

...
thread 'risingwave-main' panicked at 'not implemented: not supported in local mode', src/frontend/src/scheduler/task_context.rs:62:9
stack backtrace:
Wed Jan  4 12:51:21 UTC 2023 [risedev]: Program exited with 139

frontend.log (distributed execution mode)

...
thread 'risingwave-main' panicked at 'no partition info for seq scan', src/frontend/src/scheduler/distributed/stage.rs:791:22

To Reproduce

create table t (id int);
create materialized view v as select count(*) cnt from t;
-- panic
select * from v join (select count(*) from t) T;

Expected behavior

No response

Additional context

No response

@chenzl25 chenzl25 added the type/bug, help wanted, priority/high, and component/batch labels Dec 29, 2022
@fuyufjh fuyufjh added this to the release-0.1.16 milestone Dec 29, 2022
@kwannoel kwannoel self-assigned this Jan 4, 2023
kwannoel (Contributor) commented Jan 4, 2023

Looking into it, but I'd appreciate any ideas — I haven't worked on the scheduler before. (Feel free to re-assign to someone else; it may take me some time to fix since I'm unfamiliar with this area.)

@liurenjie1024 liurenjie1024 changed the title Batch scheduler panics if one side of the join has singleton exchange and none for the other side. bug(batch): Batch scheduler panics if one side of the join has singleton exchange and none for the other side. Jan 5, 2023
liurenjie1024 (Contributor) commented Jan 5, 2023

  1. For local execution mode, the physical plan is incorrect. We are supposed to insert an exchange before all table scans, and the table scans should be pushed down to compute nodes. The local-mode panic is caused by trying to execute a table scan in the frontend.
  2. For distributed execution mode, we need to investigate why no partition is found for the table.

cc @kwannoel
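The local-mode fix in point 1 can be sketched as a plan rewrite (a minimal sketch using a simplified plan enum; the names are illustrative, not RisingWave's actual types): wrap every bare table scan in an exchange so the scan runs on compute nodes rather than in the frontend.

```rust
// Illustrative sketch only; not RisingWave's real plan nodes.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan(&'static str),
    Exchange(Box<Plan>),
    SimpleAgg(Box<Plan>),
    NestedLoopJoin(Box<Plan>, Box<Plan>),
}

/// Rewrite the plan so every scan in the root stage sits below an
/// exchange. An exchange is a stage boundary: scans already below one
/// are scheduled on compute nodes, so their subtrees are left as-is.
fn insert_exchange_above_scans(plan: Plan) -> Plan {
    match plan {
        Plan::Scan(t) => Plan::Exchange(Box::new(Plan::Scan(t))),
        p @ Plan::Exchange(_) => p,
        Plan::SimpleAgg(c) => Plan::SimpleAgg(Box::new(insert_exchange_above_scans(*c))),
        Plan::NestedLoopJoin(l, r) => Plan::NestedLoopJoin(
            Box::new(insert_exchange_above_scans(*l)),
            Box::new(insert_exchange_above_scans(*r)),
        ),
    }
}

fn main() {
    // The shape of the plan from this issue: the bare scan of v on the
    // left, an exchanged scan of t under a simple agg on the right.
    let plan = Plan::NestedLoopJoin(
        Box::new(Plan::Scan("v")),
        Box::new(Plan::SimpleAgg(Box::new(Plan::Exchange(Box::new(Plan::Scan("t")))))),
    );
    let fixed = insert_exchange_above_scans(plan);
    // The bare scan of v is now behind an exchange; the right side is untouched.
    assert_eq!(
        fixed,
        Plan::NestedLoopJoin(
            Box::new(Plan::Exchange(Box::new(Plan::Scan("v")))),
            Box::new(Plan::SimpleAgg(Box::new(Plan::Exchange(Box::new(Plan::Scan("t")))))),
        )
    );
    println!("ok");
}
```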

kwannoel (Contributor) commented Jan 10, 2023

> For distributed execution mode, we need to investigate why no partition found for table.

In distributed execution mode, the table scan is also in the root fragment, hence the error:

dev=> create table t (id int);
CREATE_TABLE
dev=> create materialized view v as select count(*) cnt from t;
CREATE_MATERIALIZED_VIEW
dev=> SET QUERY_MODE TO distributed;
SET_OPTION
dev=> explain select * from v join (select count(*) from t) T;
                      QUERY PLAN
------------------------------------------------------
 BatchNestedLoopJoin { type: Inner, predicate: true }
 ├─BatchScan { table: v, columns: [cnt] }
 └─BatchSimpleAgg { aggs: [sum0(count)] }
   └─BatchExchange { order: [], dist: Single }
     └─BatchSimpleAgg { aggs: [count] }
       └─BatchScan { table: t, columns: [] }
(6 rows)

I guess an exchange should be inserted either above or below the NestedLoopJoin.

  • I think one way to fix it would be to fix require_additional_exchange_on_root: currently it does not check that, for operators with multiple scan children, every scan has an exchange between it and the root. It should do as its documentation says, and ensure a singleton table scan is not in the root stage:
    /// As we always run the root stage locally, we should ensure that singleton table scan is not
    /// the root stage. Returns `true` if we must insert an additional exchange to ensure this.

Will implement this in PR (#7240) first, but open to other ideas.
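The check described above can be sketched as a recursive walk (a minimal sketch with a simplified plan enum; names are illustrative, not the actual frontend types): report whether any scan is reachable from the root without crossing an exchange, in which case an additional exchange must be inserted so the scan does not land in the root stage.

```rust
// Illustrative sketch only; not RisingWave's real plan nodes.
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    Exchange(Box<Plan>),
    SimpleAgg(Box<Plan>),
    NestedLoopJoin(Box<Plan>, Box<Plan>),
}

/// Returns `true` if some scan is reachable from the root without
/// crossing an exchange, i.e. it would execute in the root stage and
/// an additional exchange must be inserted above it.
fn require_additional_exchange_on_root(plan: &Plan) -> bool {
    match plan {
        Plan::Scan(_) => true,
        // An exchange is a stage boundary: scans below it are scheduled
        // on compute nodes, so the root stage is safe on this path.
        Plan::Exchange(_) => false,
        Plan::SimpleAgg(c) => require_additional_exchange_on_root(c),
        // A join needs BOTH inputs checked; the old bug was effectively
        // missing the bare-scan branch on one side.
        Plan::NestedLoopJoin(l, r) => {
            require_additional_exchange_on_root(l) || require_additional_exchange_on_root(r)
        }
    }
}

fn main() {
    // The join from this issue: the left scan of v has no exchange above it.
    let plan = Plan::NestedLoopJoin(
        Box::new(Plan::Scan("v")),
        Box::new(Plan::SimpleAgg(Box::new(Plan::Exchange(Box::new(Plan::Scan("t")))))),
    );
    assert!(require_additional_exchange_on_root(&plan));
    println!("ok");
}
```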

@mergify mergify bot closed this as completed in #7240 Jan 10, 2023