feat: add functions for splitting strings #346
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -101,13 +101,13 @@ scalar_functions: | |
impls: | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
|
@@ -120,13 +120,13 @@ scalar_functions: | |
return: "varchar<L1>" | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
|
@@ -523,13 +523,13 @@ scalar_functions: | |
impls: | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
|
@@ -542,13 +542,13 @@ scalar_functions: | |
return: i64 | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
|
@@ -620,13 +620,13 @@ scalar_functions: | |
impls: | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
|
@@ -637,13 +637,13 @@ scalar_functions: | |
return: i64 | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
|
@@ -654,13 +654,13 @@ scalar_functions: | |
return: i64 | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "fixedchar<L1>" | ||
name: "input" | ||
|
@@ -1015,13 +1015,13 @@ scalar_functions: | |
impls: | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
|
@@ -1041,13 +1041,13 @@ scalar_functions: | |
return: "string" | ||
- args: | ||
- name: case_sensitivity | ||
-options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII] | ||
+options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
-options: [ MULTILINE_DISABLED, MULTILINE_ENABLED] | ||
+options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
-options: [ DOTALL_DISABLED, DOTALL_ENABLED] | ||
+options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
|
@@ -1263,6 +1263,74 @@ scalar_functions: | |
- value: i32 | ||
name: "count" | ||
return: "string" | ||
- | ||
name: string_split | ||
description: >- | ||
Split a string into a list of strings, based on a specified `separator` character. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: The input string. | ||
- value: "varchar<L2>" | ||
name: "separator" | ||
description: A character used for splitting the string. | ||
return: "List<varchar<L1>>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: The input string. | ||
- value: "string" | ||
name: "separator" | ||
description: A character used for splitting the string. | ||
return: "List<string>" | ||
- | ||
name: regex_string_split | ||
description: >- | ||
Split a string into a list of strings, based on a regular expression pattern. The | ||
Review comment: I feel like this could use a bit more explanation. I guess the idea is that it works the same as a regular string split, i.e. removing the substrings matched by the regex from the resulting string list. However, I could also imagine someone interpreting it as the regex picking only the split point, such that every character from the original string ends up in one of the returned list elements. Both implementations would be useful, but the one that removes the matched string is more expressive, because you could wrap the regex in a positive lookahead to mimic the other implementation.
Reply: Good point. I updated the description to be more detailed. From what I've seen, the implementation that removes the matched substring is also how a bunch of different SQL dialects do it.
||
regular expression pattern should follow the International Components for Unicode | ||
implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). | ||
|
||
The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. | ||
Enabling the `multiline` option will treat the input string as multiple lines. This makes | ||
the `^` and `$` characters match at the beginning and end of any line, instead of just the | ||
beginning and end of the input string. Enabling the `dotall` option makes the `.` character | ||
match line terminator characters in a string. | ||
impls: | ||
- args: | ||
- name: case_sensitivity | ||
options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: The input string. | ||
- value: "varchar<L2>" | ||
name: "pattern" | ||
description: The regular expression to search for within the input string. | ||
return: "List<varchar<L1>>" | ||
- args: | ||
- name: case_sensitivity | ||
options: [ CASE_SENSITIVE, CASE_INSENSITIVE, CASE_INSENSITIVE_ASCII ] | ||
required: false | ||
- name: multiline | ||
options: [ MULTILINE_DISABLED, MULTILINE_ENABLED ] | ||
required: false | ||
- name: dotall | ||
options: [ DOTALL_DISABLED, DOTALL_ENABLED ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
description: The input string. | ||
- value: "string" | ||
name: "pattern" | ||
description: The regular expression to search for within the input string. | ||
return: "List<string>" | ||
|
||
aggregate_functions: | ||
|
||
|
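To make the intended behavior of the two new functions concrete, here is a minimal Python sketch. It is an illustration under assumptions, not part of this PR: Python's `re` module stands in for the ICU regex engine the description points to (the two differ in details), the three options are modeled as hypothetical boolean parameters, and `CASE_INSENSITIVE_ASCII` is not modeled at all.

```python
import re
from typing import List


def string_split(input: str, separator: str) -> List[str]:
    # Literal split: the separator is matched exactly and dropped from the output.
    # Note: Python raises ValueError for an empty separator.
    return input.split(separator)


def regex_string_split(
    input: str,
    pattern: str,
    case_insensitive: bool = False,  # stands in for the case_sensitivity option
    multiline: bool = False,         # stands in for MULTILINE_ENABLED
    dotall: bool = False,            # stands in for DOTALL_ENABLED
) -> List[str]:
    flags = 0
    if case_insensitive:
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE  # ^ and $ also match at line boundaries
    if dotall:
        flags |= re.DOTALL     # . also matches line terminators
    # Matched substrings are removed from the result. Patterns with capturing
    # groups would make re.split return the groups too, so avoid them here.
    return re.split(pattern, input, flags=flags)


print(string_split("a,b,c", ","))
print(regex_string_split("a\n#b\n#c", r"^#", multiline=True))
print(regex_string_split("xa\nby", r"a.b", dotall=True))
```

With `multiline` off, `^#` can only match at the very start of the input, so the second call would return the input unsplit; with `dotall` off, `a.b` cannot span the newline, so the third call would as well.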
Review comment: Wasn't exactly sure how to put the return type.
Reply: Looks good to me.
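The review thread on `regex_string_split` contrasts two readings: the match is removed from the output, or the match only marks the split point. The following sketch shows how wrapping the pattern in a positive lookahead recovers the second behavior from the first. It uses Python's `re` as a stand-in (an assumption; the spec references ICU), needs Python 3.7 or later for splitting on empty matches, and both helper names are hypothetical.

```python
import re
from typing import List


def split_removing_match(text: str, pattern: str) -> List[str]:
    # The matched substring is consumed and does not appear in the result.
    return re.split(pattern, text)


def split_keeping_match(text: str, pattern: str) -> List[str]:
    # A positive lookahead matches the empty string just before each
    # occurrence, so nothing is consumed and every character is kept.
    return re.split(f"(?=(?:{pattern}))", text)


text = "2021-01-02"
print(split_removing_match(text, "-"))  # ['2021', '01', '02']
print(split_keeping_match(text, "-"))   # ['2021', '-01', '-02']
```

Joining the pieces from the lookahead variant reproduces the original string, which is the property the first comment describes.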