Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify physical expression creation API (not require schema) #8823

Merged
merged 3 commits into from
Jan 12, 2024

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Jan 10, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

To create physical expression we require 2 sets of schemas:

  • logical
  • physical

Having 2 schemas is confusing in some sense, this PR is to create physical expression based on logical schema only.
The reason the logical schema was chosen is the schema contains more information than physical schema, for instance qualifier information.

What changes are included in this PR?

Signature simplified for bunch of methods creating the physical expressions.

Are these changes tested?

Yes, existing tests

Are there any user-facing changes?

The signature for pub create_physical_expr changed

@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Jan 10, 2024
@comphead comphead added the api change Changes the API exposed to users of the crate label Jan 10, 2024
@comphead comphead requested review from alamb and viirya January 10, 2024 20:57
@alamb alamb changed the title Simplify input schemas for physical expressions Simplify physical expression creation API (not require schema) Jan 11, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That certainly seems like a nice improvement.

Screenshot 2024-01-11 at 3 04 35 PM

Not only is it less code, I agree the API is easier to understand too

datafusion/physical-expr/src/planner.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/src/planner.rs Show resolved Hide resolved
pub fn create_physical_expr(
e: &Expr,
input_dfschema: &DFSchema,
input_schema: &Schema,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why it is added at the beginning. For schema itself, could it be different between input_dfschema and input_schema?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no, then it is okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be okay, if difference happen by any reason, we usually hit the error on planning Error during planning: Mismatch between schema and batches

Like below
#5695 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it is weird to have duplicate parameters. I guess it might has some reason (i.e., rare cases they might be different).

@liukun4515 Do you remember it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able found an issue a year ago #4766 but later (#5695 (comment)) for similar query we started to see Error during planning: Mismatch between schema and batches. I can assume the check went to planning side.

Another suggestion on why it was done on physical expr side is we probably tried to ensure that schema are the same after all the optimization rules.

@comphead comphead requested a review from viirya January 12, 2024 17:46
@comphead comphead merged commit eb81ea2 into apache:main Jan 12, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants