-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add functions for splitting strings #346
Conversation
- value: "varchar<L2>" | ||
name: "separator" | ||
description: A character used for splitting the string. | ||
return: "List<varchar<L1>>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't exactly sure how to put the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
- | ||
name: regex_string_split | ||
description: >- | ||
Split a string into a list of strings, based on a regular expression pattern. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could use a bit more explanation. I guess the idea is that it works the same as a regular string split, i.e. removing the substrings matched by the regex from the resulting string list. However, I could also imagine someone interpreting it as the regex picking only the split point, such that every character from the original string ends up in one of the returned list elements. Both implementations would be useful, but the one that removes the matched string is more expressive, because you could wrap the regex in a positive lookahead to mimic the other implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I updated the description to be more detailed. From what I've seen, the implementation that removes the matched substring is also how a bunch of different SQL dialects do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (but I'll leave this open for a little while as I'm no longer effectively the only reviewer)
@westonpace could you give this one a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question. Do we want to support a max_split
argument? Most languages (Java, Python, JavaScript) support this but I notice it is not in some standard engines (e.g. PostgreSQL, SQL server, and DuckDb).
Per the sync meeting notes, since there are a significant # of engines that do not support max_split, that would be a second function (e.g. |
PR adds functions for splitting strings and fixed some spacing in the options of other functions.