Add small column on empty projection #7833

ch-sc · 2023-10-16T12:11:13Z

Which issue does this PR close?

Improves #3214.

Rationale for this change

When a projection is empty, DataFusion currently adds the first column of the input schema, because some parts of DataFusion still rely on having at least one column. Instead of always selecting the first column, these changes select a column with a small memory footprint. The footprint is estimated from the column's data type.
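
The selection idea can be sketched roughly as follows. This is an illustration only, not the code from this PR: `ColumnType`, `estimated_bits`, and `smallest_column` are hypothetical stand-ins for Arrow's `DataType` and the optimizer logic, and the constant used for strings is an assumed heuristic.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Boolean,
    Int32,
    Int64,
    Utf8,
}

/// Estimated size of a single value in bits (assumed numbers, for illustration).
fn estimated_bits(t: &ColumnType) -> usize {
    match t {
        ColumnType::Boolean => 1,
        ColumnType::Int32 => 32,
        ColumnType::Int64 => 64,
        // Variable-sized type: there is no fixed width, so use an assumed constant.
        ColumnType::Utf8 => 128,
    }
}

/// Return the index of the column with the smallest estimated size,
/// or `None` if the schema has no columns.
/// On ties, `min_by_key` keeps the first minimum, so the earliest column wins.
fn smallest_column(schema: &[(&str, ColumnType)]) -> Option<usize> {
    schema
        .iter()
        .enumerate()
        .min_by_key(|(_, (_, t))| estimated_bits(t))
        .map(|(i, _)| i)
}
```

With a schema of `name: Utf8`, `id: Int64`, `flag: Boolean`, this sketch would pick `flag` rather than the first column `name`.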

What changes are included in this PR?

Are these changes tested?

Basic unit tests for new logic are included. All tests that involve query planning and empty projections execute this code.

Are there any user-facing changes?

datafusion/optimizer/src/push_down_projection.rs

datafusion/sqllogictest/test_files/avro.slt

Dandandan · 2023-10-17T09:21:12Z

datafusion/optimizer/src/push_down_projection.rs

-// Get the projection exprs from columns in the order of the schema
+/// Accumulate the memory size of a data type measured in bits.
+///
+/// Nested types are traversed and increment `nesting` on every level.


Can we add a comment saying that variable-sized types are estimated using some heuristics?

Makes sense. I added a comment about variable-sized types. Feel free to rephrase it if you think something is missing.
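
For readers following this thread, here is a minimal sketch of what such a size estimate might look like. It is not the PR's implementation: `Ty` is a hypothetical, simplified stand-in for Arrow's `DataType`, and the 128-bit figure for strings is an assumed heuristic chosen only to illustrate that variable-sized types get a fixed estimate.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone)]
enum Ty {
    Boolean,
    Int32,
    Int64,
    Utf8,
    List(Box<Ty>),
    Struct(Vec<Ty>),
}

/// Accumulate an estimated size in bits.
/// `nesting` is incremented once for every nested type that is traversed.
fn nested_size(ty: &Ty, nesting: &mut usize) -> usize {
    match ty {
        Ty::Boolean => 1,
        Ty::Int32 => 32,
        Ty::Int64 => 64,
        // Variable-sized: the real size depends on the data, so this is
        // an assumed constant rather than a measured width.
        Ty::Utf8 => 128,
        Ty::List(item) => {
            *nesting += 1;
            nested_size(item, nesting)
        }
        Ty::Struct(fields) => {
            *nesting += 1;
            // A struct costs the sum of all of its fields.
            fields.iter().map(|f| nested_size(f, nesting)).sum()
        }
    }
}
```

A caller can then compare both the accumulated size and the nesting depth when ranking candidate columns; how the two are weighed against each other is a design choice this sketch leaves open.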

Dandandan · 2023-10-17T09:22:43Z

datafusion/optimizer/src/push_down_projection.rs

+        LargeList(f) => nested_size(f.data_type(), nesting),
+        Struct(fields) => fields
+            .iter()
+            .map(|f| nested_size(f.data_type(), nesting))


In principle we could project a sub-field from a struct instead of the entire struct (all columns).

Good idea, I will play around with it. Though it sounds like a rare edge case to me where no other "smaller" type would be present in the schema!?

Yeah indeed :)
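
The sub-field idea from this thread could look roughly like the sketch below. It only illustrates the suggestion; nothing in this conversation says it was adopted. `Ty` and `smallest_leaf` are hypothetical names, the bit sizes are assumed, and the dotted path format is an arbitrary choice for the example.

```rust
/// Hypothetical, simplified type tree; struct fields carry their names.
#[derive(Debug, Clone)]
enum Ty {
    Int32,
    Int64,
    Utf8,
    Struct(Vec<(String, Ty)>),
}

/// Find the leaf field with the smallest estimated size in bits.
/// Returns the dotted path to that leaf and its size,
/// or `None` for a struct without any leaf fields.
fn smallest_leaf(path: &str, ty: &Ty) -> Option<(String, usize)> {
    match ty {
        Ty::Int32 => Some((path.to_string(), 32)),
        Ty::Int64 => Some((path.to_string(), 64)),
        // Assumed fixed estimate for a variable-sized type.
        Ty::Utf8 => Some((path.to_string(), 128)),
        Ty::Struct(fields) => fields
            .iter()
            // Recurse into every child, extending the path as we go.
            .filter_map(|(name, child)| smallest_leaf(&format!("{}.{}", path, name), child))
            .min_by_key(|(_, bits)| *bits),
    }
}
```

For a struct column `s` with fields `a: Utf8` and `b: {c: Int64, d: Int32}`, this sketch would select `s.b.d` instead of materializing all of `s`.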

Dandandan

Awesome, @ch-sc! I left a few comments.

This will yield some nice performance improvements for SELECT COUNT(*) FROM [source] queries, even without solving #3214.

Dandandan · 2023-10-18T11:06:46Z

The change seems uncontroversial and has some good tests, so merging seems fine.

Thank you @ch-sc 😊

ch-sc added 2 commits October 16, 2023 13:45

Find small column when projection is empty

c196ba2

Merge branch 'main' of github.com:apache/arrow-datafusion into add-sm…

08d1558

…all-column-on-empty-projection

github-actions bot added the optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Oct 16, 2023

ch-sc added 3 commits October 16, 2023 14:26

clippy

9cb3241

fix comment

0807354

fix avro.slt test

05d2179

Dandandan reviewed Oct 17, 2023

datafusion/optimizer/src/push_down_projection.rs (outdated)

Dandandan reviewed Oct 17, 2023

datafusion/optimizer/src/push_down_projection.rs (outdated)

Dandandan reviewed Oct 17, 2023

datafusion/sqllogictest/test_files/avro.slt

Dandandan reviewed Oct 17, 2023

Dandandan approved these changes Oct 17, 2023

ch-sc added 2 commits October 18, 2023 11:16

use min_by

f648cce

clippy

cf77e80

Dandandan approved these changes Oct 18, 2023

Dandandan merged commit 7acd883 into apache:main Oct 18, 2023
22 checks passed

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
