Support Binary --> String coercion for StringView/BinaryView in LIKE #12500

Closed
Tracked by #11752
alamb opened this issue Sep 17, 2024 · 2 comments · Fixed by #12643
Labels: enhancement, good first issue

Comments


alamb commented Sep 17, 2024

Is your feature request related to a problem or challenge?

Part of #11752

While working on enabling StringView by default in #12092, I found another feature gap that shows up in the ClickBench benchmarks.

ClickBench hits_partitioned has a column that resolves to Binary (BinaryView after #12092) but is then treated as a String (compared to a string, etc.).

For this to work, DataFusion needs to know that it is OK to cast the binary column to a string type. It already knows how to do this for Binary --> Utf8, but not for BinaryView --> Utf8View and similar pairs. Without this, running ClickBench on hits_partitioned does not work.

A small example:

> create table foo as values (arrow_cast('one', 'BinaryView'), arrow_cast('two', 'BinaryView'));
0 row(s) fetched.
Elapsed 0.006 seconds.

> select column1 like 'o%' from foo;
type_coercion
caused by
Error during planning: There isn't a common type to coerce BinaryView and Utf8 in LIKE expression

Describe the solution you'd like

  1. Add coercion rules for BinaryView --> Utf8/Utf8View
  2. Add a test

Describe alternatives you've considered

Fix the relevant code here:

(Binary, Utf8) => Some(Utf8),
(Binary, LargeUtf8) => Some(LargeUtf8),
(LargeBinary, Utf8) => Some(LargeUtf8),
(LargeBinary, LargeUtf8) => Some(LargeUtf8),
(Utf8, Binary) => Some(Utf8),
(Utf8, LargeBinary) => Some(LargeUtf8),
(LargeUtf8, Binary) => Some(LargeUtf8),
(LargeUtf8, LargeBinary) => Some(LargeUtf8),
_ => None,

(you can see what I had to do on #12092)
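
To make the requested change concrete, here is a minimal sketch of how the match above might be extended. It is not the code from #12092 or from the eventual fix. The `DataType` enum is a simplified local stand-in for Arrow's type so the snippet compiles on its own, the function name `binary_to_string_coercion_sketch` is hypothetical, and the choice to resolve every pair involving BinaryView to Utf8View is an assumption rather than something the issue specifies.

```rust
// Simplified stand-in for Arrow's DataType: only the variants relevant here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Binary,
    LargeBinary,
    BinaryView,
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Hypothetical helper: returns the string type both sides could be coerced to,
// or None when no binary/string pairing applies.
fn binary_to_string_coercion_sketch(lhs: DataType, rhs: DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        // Arms equivalent to the ones quoted in the issue.
        (Binary, Utf8) | (Utf8, Binary) => Some(Utf8),
        (Binary, LargeUtf8)
        | (LargeBinary, Utf8)
        | (LargeBinary, LargeUtf8)
        | (Utf8, LargeBinary)
        | (LargeUtf8, Binary)
        | (LargeUtf8, LargeBinary) => Some(LargeUtf8),
        // Assumed new arms: any pairing with BinaryView resolves to Utf8View.
        (BinaryView, Utf8)
        | (Utf8, BinaryView)
        | (BinaryView, Utf8View)
        | (Utf8View, BinaryView) => Some(Utf8View),
        _ => None,
    }
}
```

Under these assumptions, the failing `column1 like 'o%'` case corresponds to the `(BinaryView, Utf8)` pair, which would now find a common type instead of falling through to `None`. Pairs such as `(BinaryView, LargeUtf8)` are deliberately left out of the sketch, since the issue does not say what they should resolve to.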

Add a test in sqllogictest

Perhaps in this file:

# LIKE
query ?
SELECT binary FROM t where binary LIKE '%F%';
----
466f6f
466f6f426172

Additional context

No response

alamb added the enhancement and good first issue labels on Sep 17, 2024

alamb commented Sep 17, 2024

I think this issue is well specified, so I am marking it as a good first issue.

@doupache

take
