Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use original value when comparing with dictionary column in unparser #12610

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

Sevenannn
Copy link
Contributor

Which issue does this PR close?

No relevant issue

Rationale for this change

Datafusion enforces an eager cast of value to dictionary value in the logical plan where a comparison with dictionary value is presented. For example, in the following plan

Filter: person_mood.mood_status = CAST(Utf8("happy") AS Dictionary(Int8, Utf8)) 

However, when using unparser to convert the plan back to sql, where the sql will be sent to various query engine (PostgreSQL, MySQL, etc.) the cast is not needed for the value, in the example above the value happy, since the raw value can be directly used in SQL engines without casting to an engine specific dictionary type. Therefore, the plan can simply be rewritten into

where person_mood.mood_status = 'happy' 

What changes are included in this PR?

  • Directly pass inner_expr instead of cast inner_expr as dictionary in the Expr::Cast case when the inner_expr is a Value representing constant and the plan is cast to dictionary
  • Add a test for the change

Are these changes tested?

Yes

Are there any user-facing changes?

No

@Sevenannn Sevenannn marked this pull request as ready for review September 24, 2024 21:20
@github-actions github-actions bot added the sql SQL Planner label Sep 24, 2024
Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change makes sense, both from a readability of the generated SQL point of view, and also for making it work to send this SQL to remote query engines.

I do wonder if there is some edge case that we aren't thinking of where we would need to preserve this cast? 🤔 If so, then I think at that time we would make this a Dialect option and unparse it different based on the dialect. For now I think this is fine!

Thanks @Sevenannn!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Sevenannn and @phillipleblanc

})
match data_type {
DataType::Dictionary(_, _) => match inner_expr {
// Dictionary values don't need to be cast to other types when rewritten back to sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is another example of where logical types as discussed in #11513 might make sense

@alamb alamb merged commit 01fb1a2 into apache:main Sep 25, 2024
24 checks passed
bgjackma pushed a commit to bgjackma/datafusion that referenced this pull request Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants