Migrate streaming e2e join to rust frontend #1283

Closed
BowenXiao1999 opened this issue Mar 25, 2022 · 9 comments
Comments

BowenXiao1999 commented Mar 25, 2022

https://github.com/singularity-data/risingwave/blob/477c52b10e891a95ab634301f0c13a2adf140d9a/rust/frontend/src/catalog/table_catalog.rs#L72-L79

See the code in #1291
When I try to run the join e2e test, it panics with "found duplicate column name _row_id". When I delete the column-name-related code, it panics again because column ids are overwritten (Table A's ids are 0, 1, 2 and Table B's are also 0, 1, 2, so we end up with only 3 + 1 columns instead of 6 + 1).

Most likely we do not handle this correctly when Materialize creates a new table/MV (e.g. allocating new column ids).
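
The overwrite described above can be reproduced in isolation. This is only a minimal sketch, not the actual table_catalog.rs code: `ColumnDesc`, `table_columns`, `merge_by_input_id`, and `merge_with_fresh_ids` are hypothetical names, and it assumes the output columns are stored in a map keyed by column id. The MV's own row id column (the "+ 1") is left out for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified column descriptor (not the real catalog type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnDesc {
    column_id: i32,
    name: String,
}

/// Builds the columns of one input table with ids 0..n, as in the report.
fn table_columns(table: &str, n: i32) -> Vec<ColumnDesc> {
    (0..n)
        .map(|i| ColumnDesc {
            column_id: i,
            name: format!("{}.c{}", table, i),
        })
        .collect()
}

/// Naive merge: reuses the input column ids as keys, so a right-side column
/// overwrites the left-side column that has the same id.
fn merge_by_input_id(left: &[ColumnDesc], right: &[ColumnDesc]) -> HashMap<i32, ColumnDesc> {
    let mut out = HashMap::new();
    for col in left.iter().chain(right.iter()) {
        out.insert(col.column_id, col.clone());
    }
    out
}

/// Merge that allocates a fresh id for every output column, so no column is lost.
fn merge_with_fresh_ids(left: &[ColumnDesc], right: &[ColumnDesc]) -> HashMap<i32, ColumnDesc> {
    left.iter()
        .chain(right.iter())
        .enumerate()
        .map(|(i, col)| {
            let id = i as i32;
            (id, ColumnDesc { column_id: id, name: col.name.clone() })
        })
        .collect()
}
```

With two inputs of three columns each, `merge_by_input_id` keeps only three entries, while `merge_with_fresh_ids` keeps all six.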

You can reproduce at origin/bw/add-streaming-hash-join-e2e.

cc @skyzh @BugenZhao @st1page @MrCroxx

st1page commented Mar 25, 2022

A product question: if we have table t1(a int, b int) and table t2(a int, b int), and we run create mv mv1 as select * from t1 join t2 on t1.a > t2.a AND t1.b = t2.b, then the MV would have columns (a, b, a, b), i.e. duplicate column names. Is that OK?

@BowenXiao1999

Good question. I think we should differentiate them; otherwise we cannot tell t1.a and t2.a apart.

@xiangjinwu

I just tried duplicate output names on PostgreSQL:

test=# select * from t t1, t t2;
 v1 | v2 | v1 | v2 
----+----+----+----
  1 | 10 |  1 | 10
  1 | 10 |  2 | 20
  2 | 20 |  1 | 10
  2 | 20 |  2 | 20
(4 rows)

test=# create materialized view mv1 as select * from t t1, t t2;
ERROR:  column "v1" specified more than once

Basically, it should be accepted as a batch query but rejected as a streaming one (create materialized view).
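
The PostgreSQL behavior above could be mirrored with a check along these lines. This is a rough sketch under assumptions: `QueryKind` and `check_output_names` are hypothetical names, not part of the RisingWave frontend, and the error text simply copies the PostgreSQL message shown above.

```rust
use std::collections::HashSet;

/// Hypothetical distinction between a batch query and a streaming MV definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QueryKind {
    Batch,
    CreateMaterializedView,
}

/// Accepts duplicate output names for batch queries, but rejects them when
/// the output is going to be materialized.
fn check_output_names(kind: QueryKind, names: &[&str]) -> Result<(), String> {
    if kind == QueryKind::Batch {
        return Ok(());
    }
    let mut seen = HashSet::new();
    for name in names {
        // `insert` returns false when the name was already present.
        if !seen.insert(*name) {
            return Err(format!("column \"{}\" specified more than once", name));
        }
    }
    Ok(())
}
```

For the output names v1, v2, v1, v2, the batch case passes and the materialized view case fails on the first repeated name, v1.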

BowenXiao1999 commented Mar 25, 2022

But the problem is that the Java frontend already supports this: https://github.com/singularity-data/risingwave/blob/main/e2e_test/streaming/join.slt

st1page commented Mar 25, 2022

I prefer to return an error if a user creates an MV whose columns can be confused by name. Besides the user columns, our ROW_ID column name could also be duplicated in the MV. We have two choices:

  • add the source/table/MV name to the ROW_ID column name
  • lift the restriction in the table catalog that every column has a unique column name

Need more discussion next week.
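
To make the first choice concrete, here is a small sketch of what qualifying the ROW_ID column name could look like. Everything in it is an assumption for illustration: the `source._row_id` naming scheme and the `qualify_row_id` and `join_output_names` helpers are hypothetical, and the thread does not settle on either choice.

```rust
/// Name of the hidden row id column, as it appears in the panic message.
const ROW_ID_NAME: &str = "_row_id";

/// Prefixes the hidden row id column with its source name (hypothetical
/// scheme); user columns are returned unchanged.
fn qualify_row_id(source: &str, column: &str) -> String {
    if column == ROW_ID_NAME {
        format!("{}.{}", source, column)
    } else {
        column.to_string()
    }
}

/// Concatenates both join inputs' column names, qualifying each row id column
/// so the two hidden columns no longer collide.
fn join_output_names(
    left_source: &str,
    left_columns: &[&str],
    right_source: &str,
    right_columns: &[&str],
) -> Vec<String> {
    let left = left_columns.iter().map(|c| qualify_row_id(left_source, c));
    let right = right_columns.iter().map(|c| qualify_row_id(right_source, c));
    left.chain(right).collect()
}
```

This only resolves the hidden-column collision; duplicate user column names such as t1.a and t2.a would still need to be rejected or renamed separately.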

st1page commented Mar 28, 2022

@BowenXiao1999 I think the column id issue was fixed by #1314

@BowenXiao1999

Followed up in #1331. Closing.
