Fix: Internal error in regexp_replace() for some StringView input #12203
There are a few things I noted when I reviewed this pull request, some of which may be worth fixing either here or in other tickets/PRs:
I've added some additional sqllogictests for flags & modified the type signature for
Could you please elaborate on the return types mentioned in the first bullet? Thanks!
Sure. That one isn't related to anything in this PR, but I think it could be improved.
Unless I'm mistaken, the return_type method's args are the same as what is passed to invoke (and what is declared in the signature), which is just Utf8, LargeUtf8 and Utf8View (and the signature doesn't even cover the LargeUtf8 case). Basically, I think it needs to be cleaned up so that the types are consistent.
Note that the above comment could (should?) be addressed in another PR; I just noticed it while looking at the code for this one.
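To illustrate the consistency point, here is a minimal Rust sketch of a return-type function that mirrors the type of its first string argument: Utf8 maps to Utf8, LargeUtf8 to LargeUtf8, and Utf8View to Utf8View. `DataType` here is a hypothetical, simplified stand-in for Arrow's enum, and this `return_type` does not use DataFusion's actual signature. It only shows the intended mapping.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`
/// (only the variants this sketch needs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

/// Sketch (not DataFusion's API): the return type mirrors the first
/// argument's string type, so every string type the signature accepts
/// maps to itself and nothing silently falls back to Utf8.
fn return_type(arg_types: &[DataType]) -> Result<DataType, String> {
    match arg_types.first() {
        Some(DataType::Utf8) => Ok(DataType::Utf8),
        Some(DataType::LargeUtf8) => Ok(DataType::LargeUtf8),
        Some(DataType::Utf8View) => Ok(DataType::Utf8View),
        // Any non-string input is rejected rather than guessed at.
        Some(other) => Err(format!("unsupported argument type: {:?}", other)),
        None => Err("expected at least one argument".to_string()),
    }
}
```

Keeping the signature, `return_type`, and the invoke path in agreement on these three types means a `Utf8View` argument can never be declared as returning `Utf8`.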
@alamb, not sure if you have eyes on this, but it's a
I ran the regexp benchmarks on this PR and verified that there are no performance changes for the existing types.
Thank you @devanbenz and @Omega359
🚀
Which issue does this PR close?
Closes #12150 & closes #11912
Rationale for this change
What changes are included in this PR?
Currently the StringView data type cannot be cast with `as_generic_string_array`. The changes here resolve that: https://github.com/apache/datafusion/pull/12203/files#diff-0007996e12eb1b2e63363974424f13330f85a7fc48e0c55381f8a30a0f372931R206

Another point of issue was in the `return_type` match statement: when a function is used within a query on a `Utf8View` column, it appears to return a ScalarArray of `StringArray`. For example, `lower(column1_utf8view)` would return a `StringArray` instead of a `StringViewArray`, which would cause the result of the query to be a `StringArray`. For now I am implementing a conversion step (https://github.com/apache/datafusion/pull/12203/files#diff-0007996e12eb1b2e63363974424f13330f85a7fc48e0c55381f8a30a0f372931R516) to coerce the return value into a StringView, as required by the signatures. I think this is okay, though I would love to work on optimizing or improving this code if possible. For now this appears to be the path forward for the error I was seeing.
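The problem and the conversion-step fix described above can be sketched in plain Rust. This is not the PR's code (see the linked lines for that). `StringColumn`, `lower_kernel` and `coerce_to_kind_of` are hypothetical names, and the enum is a simplified stand-in for Arrow's `StringArray`, `LargeStringArray` and `StringViewArray`. The kernel always returns the Utf8 variant, mirroring the `lower(column1_utf8view)` case. The conversion step then rewraps the values in the input's own kind. Note that `values()` handles all three kinds, whereas `as_generic_string_array` only covers the offset-based string arrays and therefore fails on StringView.

```rust
/// Hypothetical, simplified stand-ins for Arrow's string arrays.
#[derive(Debug, Clone, PartialEq)]
enum StringColumn {
    Utf8(Vec<String>),      // ~ StringArray
    LargeUtf8(Vec<String>), // ~ LargeStringArray
    Utf8View(Vec<String>),  // ~ StringViewArray
}

impl StringColumn {
    /// Borrow the values regardless of which string kind this is.
    fn values(&self) -> &[String] {
        match self {
            StringColumn::Utf8(v) | StringColumn::LargeUtf8(v) | StringColumn::Utf8View(v) => {
                v.as_slice()
            }
        }
    }

    /// Take ownership of the values, discarding the kind.
    fn into_values(self) -> Vec<String> {
        match self {
            StringColumn::Utf8(v) | StringColumn::LargeUtf8(v) | StringColumn::Utf8View(v) => v,
        }
    }
}

/// Hypothetical `lower` kernel that, like the behaviour described above,
/// always produces the Utf8 variant, even for Utf8View input.
fn lower_kernel(input: &StringColumn) -> StringColumn {
    StringColumn::Utf8(input.values().iter().map(|s| s.to_lowercase()).collect())
}

/// Conversion step: rewrap `result` in the same kind as `input`, so a
/// Utf8View argument yields a Utf8View result.
fn coerce_to_kind_of(input: &StringColumn, result: StringColumn) -> StringColumn {
    let values = result.into_values();
    match input {
        StringColumn::Utf8(_) => StringColumn::Utf8(values),
        StringColumn::LargeUtf8(_) => StringColumn::LargeUtf8(values),
        StringColumn::Utf8View(_) => StringColumn::Utf8View(values),
    }
}
```

In this sketch the conversion only moves the `Vec<String>`. In real Arrow code, turning a `StringArray` into a `StringViewArray` builds a new array, so having the kernel produce the view type directly would likely be the cheaper long-term option.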
Are these changes tested?
Are there any user-facing changes?