-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add small column on empty projection #7833
Add small column on empty projection #7833
Conversation
…all-column-on-empty-projection
// Get the projection exprs from columns in the order of the schema | ||
/// Accumulate the memory size of a data type measured in bits. | ||
/// | ||
/// Nested types are traversed and increment `nesting` on every level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment saying that variable-sized types are estimated using some heuristics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Added a comment about variable sized types. Feel free to rephrase if you think something is missing.
LargeList(f) => nested_size(f.data_type(), nesting), | ||
Struct(fields) => fields | ||
.iter() | ||
.map(|f| nested_size(f.data_type(), nesting)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle we could project a sub-field from a struct instead of the entire struct (all columns).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I will play around with it. Though it sounds like a rare edge case to me where no other "smaller" type would be present in the schema!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah indeed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change seems non controversial and has some good tests, so merging seems fine. Thank you @ch-sc 😊 |
Which issue does this PR close?
Improves #3214.
Rationale for this change
If a projection is empty, we add the first column of the input schema since some parts of DataFusion still rely on at least having one column. Instead of selecting the first column from the input schema, these changes aim to select a column with a smaller memory size. The memory size is based on the data type.
What changes are included in this PR?
Are these changes tested?
Basic unit tests for new logic are included. All tests that involve query planning and empty projections execute this code.
Are there any user-facing changes?