feat(udf): support function that returns multiple columns #8644
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Codecov Report
@@ Coverage Diff @@
## main #8644 +/- ##
==========================================
- Coverage 71.23% 71.22% -0.01%
==========================================
Files 1155 1155
Lines 191247 191271 +24
==========================================
+ Hits 136231 136240 +9
- Misses 55016 55031 +15
... and 7 files with indirect coverage changes
I think so. Knowing that the cardinality does not change, the optimizer can optimize the
I think it is the
How do you define "table function" here? In my dictionary, TF = SRF 😅
In that case, the output rows can depend on any row of the input, so the function may only be able to output its result once all of the input has arrived. E.g., implement a table function that sorts the original input relation by some keys and assigns the rank to a new column. It means that
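The 1-N versus M-N distinction raised here can be sketched in plain Python. This is only an illustration under assumptions: `explode_1_to_n` and `rank_m_to_n` are hypothetical names, not functions from this PR or from RisingWave, and rows are modelled as simple tuples.

```python
from typing import Iterable, Iterator, Tuple


def explode_1_to_n(row: int) -> Iterator[int]:
    """1-N shape: the output depends only on the current input row,
    so results can be emitted as soon as that row arrives."""
    for i in range(row):
        yield i


def rank_m_to_n(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int, int]]:
    """M-N shape: every output row may depend on every input row.
    sorted() has to consume the whole input before the first row is emitted."""
    ordered = sorted(rows, key=lambda r: r[1])
    for rank, (name, score) in enumerate(ordered, start=1):
        # Append the rank as a new column.
        yield name, score, rank
```

The second function is blocking by construction, which is the property the comment above points at: nothing can be returned until the last input row has been seen.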
Latest PG doesn't have the term TF any more. BigQuery's TF looks the same as PG SRF to me. Therefore, I still think it's only an alias. "M to N" is indeed a different case, but I think it's "1-N" TF vs "M-N" TF, instead of TF vs SRF. (Correct me if I'm wrong.) I sincerely hope we can unify the terminology, because it has confused me a lot. 🥹
Didn't think much about it, but "M-N" TF sounds just like OVER windowing?
Me too. My starting point is just that TF looks like a vague concept while we all know what SRF is, so maybe using SRF instead of TF is better?
I think not. At least an over window function gives the same number of output rows as the input 🤔, but an M-N TF does not have that restriction.
"we all know what is SRF" I don't think so:
Initially I called it TF when implementing ProjectSet because it sounds slightly better for people without any background 🤪. And it seems more commonly used in other systems than PG. SRF seems not to be used outside PG. But being consistent with PG might be an acceptable argument to me. So I'm OK if someone insists on that. One nit BTW, shouldn't it be "Bag Returning" instead of "Set Returning"?? 🤣 Another point: PG's syntax is
, instead of https://www.postgresql.org/docs/current/sql-createfunction.html But mentioned in the doc, there's also a
Considering this, "Table function" is a better name. Don't know whether the name "SRF" is a historical issue.
Do you have examples for m-n TF?
BTW, Snowflake UDTF is only 1-n 🤔
https://docs.snowflake.com/en/sql-reference/udf-overview#scalar-and-tabular-functions
BTW, our discussion might be off-topic in the PR 🤣
Well, let us keep the Table Function name until we really need to implement the m-n table function.
In fact, any standard or non-standard relational operator. After some investigation, I think all the systems I know implement it with SQL 🤔
@@ -22,16 +24,25 @@ def gcd3(x: int, y: int, z: int) -> int:
    return gcd(gcd(x, y), z)


@udf(input_types=['BINARY'], result_type='STRUCT<src_ip VARCHAR, dst_ip VARCHAR, src_port SMALLINT, dst_port SMALLINT>')
🤔 Is it possible to use Python3 type hints?
Do you mean to infer SQL type from function signatures?
yep, just like how @dataclass does
Interesting idea. It seems feasible.
I can think of one more benefit: a Python function can be compatible with multiple SQL functions.
For example: def gcd(x: int, y: int) -> int
can be used as SQL functions:
gcd(INT2, INT2) -> INT2
gcd(INT4, INT4) -> INT4
gcd(INT8, INT8) -> INT8
- ...
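A rough sketch of how such inference might work, using only the standard library. Everything here is an assumption for illustration: `PY_TO_SQL` and `infer_sql_signature` are hypothetical names, and the mapping picks a single default SQL type per Python type. It is not how the `@udf` decorator in this PR works.

```python
import inspect
from typing import Callable, List, Tuple, get_type_hints

# Hypothetical default mapping from Python annotations to SQL type names.
PY_TO_SQL = {
    int: "BIGINT",
    float: "DOUBLE PRECISION",
    str: "VARCHAR",
    bool: "BOOLEAN",
    bytes: "BYTEA",
}


def infer_sql_signature(func: Callable) -> Tuple[List[str], str]:
    """Derive (input_types, result_type) from a function's type hints.

    Raises KeyError if a parameter or the return value is unannotated,
    or annotated with a type missing from PY_TO_SQL.
    """
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    input_types = [PY_TO_SQL[hints[name]] for name in params]
    result_type = PY_TO_SQL[hints["return"]]
    return input_types, result_type


def gcd(x: int, y: int) -> int:
    # Euclid's algorithm.
    while y != 0:
        x, y = y, x % y
    return x
```

With this sketch, `infer_sql_signature(gcd)` picks one SQL signature. Registering the same Python function under the INT2, INT4, and INT8 variants listed above would still need an explicit choice, since a Python `int` alone does not determine the width.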
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR adds support for returning multiple columns for both UDF and UDTF.
For scalar functions, the result type should be defined as struct. The Python function can return a tuple:
For table functions:
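To make the two shapes concrete, here is a minimal sketch, not the PR's own code. It uses only the standard library and leaves out the `@udf` / `@udtf` decorators. It assumes the input is a raw IPv4 packet with a 20-byte header (no options) followed by a TCP header. `extract_tcp_info` is the name used in this PR; `series_with_square` is a hypothetical name chosen for illustration.

```python
import socket
import struct
from typing import Iterator, Tuple


def extract_tcp_info(tcp_packet: bytes) -> Tuple[str, str, int, int]:
    """Scalar shape: one input value in, one tuple out.

    The tuple fields line up, in order, with a result type such as
    STRUCT<src_ip VARCHAR, dst_ip VARCHAR, src_port SMALLINT, dst_port SMALLINT>.
    """
    # IPv4 source and destination addresses sit at byte offsets 12..20.
    src_addr, dst_addr = struct.unpack("!4s4s", tcp_packet[12:20])
    # The TCP source and destination ports follow the 20-byte IPv4 header.
    src_port, dst_port = struct.unpack("!HH", tcp_packet[20:24])
    return socket.inet_ntoa(src_addr), socket.inet_ntoa(dst_addr), src_port, dst_port


def series_with_square(n: int) -> Iterator[Tuple[int, int]]:
    """Table shape: one input value in, many tuples out (one per output row)."""
    for i in range(n):
        yield i, i * i
```

In both cases each tuple element becomes one output column, which is what lets a single function call produce multiple columns.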
Some discussions:
- Should extract_tcp_info be defined as a table function, or as a scalar function that returns a struct type?
Checklist For Contributors
- ./risedev check (or alias, ./risedev c)
Checklist For Reviewers
Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note
User-defined table functions can return multiple columns now.