Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SQL UDF #10151

Open
Tracked by #7405
xxchan opened this issue Jun 2, 2023 · 7 comments
Open
Tracked by #7405

Support SQL UDF #10151

xxchan opened this issue Jun 2, 2023 · 7 comments
Assignees

Comments

@xxchan
Copy link
Member

xxchan commented Jun 2, 2023

Motivation

Reusability

e.g.,

  • user may have a very complex function parameter and need to call it multiple times.
  • a reusable, parameterized subquery, like a logical view with parameters. e.g., select * from orders where id = $1 and deleted = false and create_date > $2

Easier to migrate from other SQL dialects to RisingWave

Different SQL dialects (Flink/Hive/Clickhouse) have different syntax sugars. Usually there are equivalent functions, but rewriting SQL can be error-prone. If we allow them to define a SQL UDF as an alias, they don't need to modify their SQLs at all.

Users are very familiar with their original SQL, and the less changes made to the SQL logic, the lower the migration cost will be.

This is one example #10145

Feature subset we can support

LANGUAGE SQL is also quite complex, as it allows multiple statements (DMLs are allowed). We won't want to support this now (and don't support multi-statement).

SQL functions execute an arbitrary list of SQL statements, returning the result of the last query in the list. In the simple (non-set) case, the first row of the last query's result will be returned. (Bear in mind that “the first row” of a multirow result is not well-defined unless you use ORDER BY.) If the last query happens to return no rows at all, the null value will be returned.

But a subset of SQL UDF won't be hard to implement. Specifically, we can support those can be inlined as a subquery. And we can implement the features one by one. Even we only support the simplest forms, it's still useful.

CREATE FUNCTION name ( args ) RETURNS type
LANGUAGE SQL
    | AS 'definition'
    | sql_body
  } ...

sql_body is like

RETURN expression

A note from PG doc about this form:

This form is parsed at function definition time, the string constant form is parsed at execution time; therefore this form cannot support polymorphic argument types and other constructs that are not resolvable at function definition time. This form tracks dependencies between the function and objects used in the function body, so DROP ... CASCADE will work correctly, whereas the form using string literals may leave dangling functions. Finally, this form is more compatible with the SQL standard and other SQL implementations.

definition is a SQL string

1 returns a value (1 row 1 col)

1.1 RETURN expression

This should be trivial.

1.2 SELECT an expression

SELECT $1+$2;

1.3 SELECT multiple expressions

e.g.,

create function f(out a int, out b int)  language sql as 'select 1,2';

This needs to change multiple cols to a struct. Cannot simply convert select f(), ... to select (select 1,2), ... (more than 1 cols).

1.4 polymorphic arguments

e.g.,

CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$
    SELECT $1 > $2;
$$ LANGUAGE SQL;


CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray)
AS 'select $1, array[$1,$1]' LANGUAGE SQL;

1.5 SELECT with FROM

e.g., agg and point-get query. might consider together with 2.

2. returns a table (setof) (n rows 1 col)

This includes: table functions and more general SELECT subquery.

Postgres executes the SQL UDF in ProjectSet. It might be possible to convert it to a JOIN. Consider it later.

e.g.,

create function f() returns setof int language sql as 'select * from t';

Cannot simply convert to select (select * from t), ... (more than 1 rows).

Some points might need to take care of

  • determinism
  • overloading
    • name conflict with built-in functions?
  • the syntax has many variants. need to read the reference carefully. (e.g., OUT parameter vs RETURNS type)

Reference

@xxchan
Copy link
Member Author

xxchan commented Jun 14, 2023

Some offline discussions:

  • @st1page proposed that for setof UDTF, we also directly inline it to subquery. This behaviour is different from PG, and I prefer not implementing it to doing it differently with PG. So at least we need further discussion for this.
  • @fuyufjh thinks this is not useful enough to cover complex UDF cases and thus the motivation isn't strong enough. (see also Discussion: whether to add new query syntaxes (syntax sugars) #10268 (comment)) For simple cases like function aliases, it's also not significantly better than global string substitution.

@xxchan xxchan removed this from the release-1.0 milestone Jul 14, 2023
@neverchanje
Copy link
Contributor

neverchanje commented Aug 9, 2023

Btw, I think we can probably encourage our users to use DBT's macro in the case where they need UDF. My major concern with this feature is the engineering complexity that it'll introduce to our system.

@xxchan
Copy link
Member Author

xxchan commented Aug 9, 2023

Not very familiar with DBT. I'm wondering is it easy to adopt for users unfamiliar with it? Will it require users to change their workflow? Or can be used as a handy Swiss Army Knife? 🤔

Complexity is definitely the concern, and also Eric's reason for opposing. But maybe the simplist expression case is most useful and has very low complexity.

e.g.,

Yes -- say I need to parse jsonb to handle missing values and casting, I don't necessarily want extra piece of infrastructure (python server) when a sql expression could be enough

https://risingwave-community.slack.com/archives/C02T3F7UYM6/p1689610339419499?thread_ts=1689610181.827109&cid=C02T3F7UYM6

For such use case, maybe users don't want extra DBT neither.

Copy link
Contributor

github-actions bot commented Jul 3, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

@st1page
Copy link
Contributor

st1page commented Jul 4, 2024

is it done? Can we close the issue?

@xzhseh
Copy link
Contributor

xzhseh commented Jul 4, 2024

is it done? Can we close the issue?

Not yet for some specific syntax, see details here.

I can pick this up when I have time.

@st1page
Copy link
Contributor

st1page commented Jul 4, 2024

is it done? Can we close the issue?

Not yet for some specific syntax, see details here.

I can pick this up when I have time.

Oh, I just thought we had already finished it. Thanks ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants