Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): change create mv field name #2662

Merged
merged 9 commits into from
May 19, 2022

Conversation

st1page
Copy link
Contributor

@st1page st1page commented May 19, 2022

What's changed and what's your intention?

  • remove the row id number
  • if a create materialized view generate multiple same name of a hidden column, it will add a number.

#1363 not really solve the problem, except row id, all columns generated in pk deriving could be not unique.
you can see src/frontend/test_runner/tests/testdata/pk_derive.yaml for detailed case

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@st1page st1page requested a review from fuyufjh May 19, 2022 06:37
@xiangjinwu
Copy link
Contributor

Seems related to #2467, where a unique field name is preferred, and the duplicates originated from different PlanNode.

  • In Experssions in GROUP BY have wrong expression index #2467, the 2 expr#0s are from 2 different LogicalProjects, one before LogicalAgg and one after LogicalAgg. I tried to dedup them by prepending PlanNodeId but failed to find an elegant way.
  • The difference is, Experssions in GROUP BY have wrong expression index #2467 only affects field names for debugging. This PR is concerned about column names in catalog, although they are hidden and not selectable by user.
  • As for tumble window, they already have a user-visible name in bind context, which will be propagated to PlanRoot and then mview column name. But it is not available as schema field name on intermediate PlanNodes.

@codecov
Copy link

codecov bot commented May 19, 2022

Codecov Report

Merging #2662 (635d1a9) into main (f2c003f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2662   +/-   ##
=======================================
  Coverage   72.19%   72.19%           
=======================================
  Files         681      681           
  Lines       88477    88481    +4     
=======================================
+ Hits        63872    63875    +3     
- Misses      24605    24606    +1     
Flag Coverage Δ
rust 72.19% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/catalog/mod.rs 82.14% <100.00%> (ø)
src/frontend/src/handler/create_index.rs 86.92% <100.00%> (+0.20%) ⬆️
src/frontend/src/handler/create_mv.rs 98.56% <100.00%> (ø)
src/frontend/src/handler/create_source.rs 96.94% <100.00%> (ø)
src/frontend/src/handler/create_table.rs 96.40% <100.00%> (ø)
...c/frontend/src/optimizer/plan_node/logical_scan.rs 100.00% <100.00%> (ø)
...tend/src/optimizer/plan_node/stream_materialize.rs 91.19% <100.00%> (-0.60%) ⬇️
src/utils/pgwire/src/pg_protocol.rs 0.00% <0.00%> (ø)
src/risedevtool/src/config_gen/grafana_gen.rs 0.00% <0.00%> (ø)
... and 1 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@st1page
Copy link
Contributor Author

st1page commented May 19, 2022

so we have two kinds of field names

  • user-visible field names, which are declared in the select clause.
  • internal-schema field names, which are for debugging and just used in our system

but in create materialized view statement we use both of them. and we should ensure all column name in materialized view is unique

  • for user-visible column part we can force user to use alias and make sure they are unique.
  • for the hidden columns we just use the internal-schema field names, but in materialized operator gives the materialized view columns a unique name by give number. And we will not guarantee these columns name. for example a user give create materialized view mv1 as select max(v) from t group by id; and later select id from mv1, we might give an error that mv1 does not have column id.

Comment on lines -135 to +124
if !c.is_hidden {
let name = out_name_iter.next().unwrap();
c.column_desc.name = name;
}
c.column_desc.name = if !c.is_hidden {
out_name_iter.next().unwrap()
} else {
match col_names.try_insert(field.name.clone(), 0) {
Ok(_) => field.name.clone(),
Err(mut err) => {
let cnt = err.entry.get_mut();
*cnt += 1;
field.name.clone() + "#" + &cnt.to_string()
}
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the core change is here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants