Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings #46661

Closed
wants to merge 5 commits into from

Conversation

uros-db
Copy link
Contributor

@uros-db uros-db commented May 20, 2024

What changes were proposed in this pull request?

Following up on the introduction of AbstractMapType (#46458) and changes that introduce collation awareness for RaiseError expression (#46461), this PR should add the appropriate type casting rules for AbstractMapType.

Why are the changes needed?

Fix the CI failure for the Support RaiseError misc expression with collation test when ANSI is off.

Does this PR introduce any user-facing change?

Yes, type casting is now allowed for map types with collated strings.

How was this patch tested?

Extended suite CollationSQLExpressionsANSIOffSuite with ANSI disabled.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 20, 2024
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please fill the PR description

@uros-db uros-db changed the title [WIP][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings [SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings May 21, 2024
Copy link
Contributor Author

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated PR description, good to go

@@ -85,6 +85,11 @@ object CollationTypeCasts extends TypeCoercionRule {
private def extractStringType(dt: DataType): StringType = dt match {
case st: StringType => st
case ArrayType(et, _) => extractStringType(et)
case MapType(kt, vt, _) => if (hasStringType(kt)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes the key value should be the same string type. I think it's fine, but we can revisit this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this might be a bit confusing - we'll keep it in mind for possible future revisit

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6be3560 May 22, 2024
cloud-fan pushed a commit that referenced this pull request May 31, 2024
…d strings

### What changes were proposed in this pull request?

Following queries return unexpected results:
```
select collation(map('a', 'b' collate utf8_binary_lcase)['a']);
select collation(element_at(map('a', 'b' collate utf8_binary_lcase), 'a'));
```
Both return `UTF8_BINARY` instead of `UTF8_BINARY_LCASE`.

The error was introduced by changes in #46661 that tried to solve other problem with `RaiseError` expression (please refer to PR for details). Partially revert changes of mentioned PR and fix the `RaiseError` issue by explicitly setting `UTF8_BINARY` collation.

### Why are the changes needed?

To fix wrong results of mentioned queries.

### Does this PR introduce _any_ user-facing change?

Yes, it fixed explained error.

### How was this patch tested?

Added test to `CollationSQLExpressionsSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46758 from nikolamand-db/SPARK-48430.

Authored-by: Nikola Mandic <nikola.mandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
riyaverm-db pushed a commit to riyaverm-db/spark that referenced this pull request Jun 7, 2024
…d strings

### What changes were proposed in this pull request?

Following queries return unexpected results:
```
select collation(map('a', 'b' collate utf8_binary_lcase)['a']);
select collation(element_at(map('a', 'b' collate utf8_binary_lcase), 'a'));
```
Both return `UTF8_BINARY` instead of `UTF8_BINARY_LCASE`.

The error was introduced by changes in apache#46661 that tried to solve other problem with `RaiseError` expression (please refer to PR for details). Partially revert changes of mentioned PR and fix the `RaiseError` issue by explicitly setting `UTF8_BINARY` collation.

### Why are the changes needed?

To fix wrong results of mentioned queries.

### Does this PR introduce _any_ user-facing change?

Yes, it fixed explained error.

### How was this patch tested?

Added test to `CollationSQLExpressionsSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46758 from nikolamand-db/SPARK-48430.

Authored-by: Nikola Mandic <nikola.mandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants