StatefulFunctions #340

alamb · 2021-05-14T17:55:33Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
On a PR that added now(), which Postgres would term a stable function (one whose result can differ from transaction to transaction, but is not purely a function of its inputs either), @jorgecarleitao suggested adding a StatefulFunction concept for functions that need state, unlike ScalarFunction, which is designed to be stateless.

There is a lot of discussion on #288 (comment); I will try to summarize the main points:

@jorgecarleitao :

AFAIK current_* are all derived from now; imo the differentiator aspect here is that there is some state X that is being shared.

It seems to me that the use-case here is that we want to preserve state across nodes, so that their execution depends on said state. NOW is an example, but in reality, random is also an example; we "cheated" a bit by not allowing users to select a seed. If they want that, we hit the same problem as NOW.

IMO a natural construct here is something like struct StatefulFunction<T: Send + Sync>, where T is the state, held in an Arc inside it, and which implements PhysicalExpr. During planning, the initial state is passed to it from the planner, and we are ready to fly.

The ScalarFunction construct was meant to be stateless because that makes it very easy to develop, and it also makes it obvious that it is stateless. Trying to couple execution state to it is imo going beyond its scope.
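To make the proposal above concrete, here is a minimal sketch of what such a construct might look like. It is not DataFusion code: the PhysicalExpr trait below is a simplified, hypothetical stand-in (the real trait evaluates against a RecordBatch), and QueryState, now_impl and plan_now are names invented for illustration.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for DataFusion's PhysicalExpr trait.
// It returns one i64 per row instead of an Arrow array.
trait PhysicalExpr {
    fn evaluate(&self, num_rows: usize) -> Vec<i64>;
}

// Assumed shape of the state shared by all expressions of one query.
struct QueryState {
    query_start_nanos: i64,
}

// A function that carries shared state of type T next to its implementation.
struct StatefulFunction<T: Send + Sync> {
    state: Arc<T>,
    fun: fn(&T, usize) -> Vec<i64>,
}

impl<T: Send + Sync> PhysicalExpr for StatefulFunction<T> {
    fn evaluate(&self, num_rows: usize) -> Vec<i64> {
        // The implementation only sees the state handed over at planning time.
        (self.fun)(&self.state, num_rows)
    }
}

// now() reads the query start time from the state, so every row and every
// call site within the query observes the same value.
fn now_impl(state: &QueryState, num_rows: usize) -> Vec<i64> {
    vec![state.query_start_nanos; num_rows]
}

// Planner side: the initial state is passed in when the expression is built.
fn plan_now(state: Arc<QueryState>) -> StatefulFunction<QueryState> {
    StatefulFunction { state, fun: now_impl }
}
```

Two expressions planned from clones of the same Arc would return the same timestamp, which is the property now() needs. A seeded random would presumably need interior mutability in T as well, which this sketch does not cover.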

@returnString

In Postgres, this sort of corresponds to the function volatility categories (https://www.postgresql.org/docs/13/xfunc-volatility.html) which might be a useful basis for any future definition of different function types.

immutable: pure function, can only use arguments and internal constants (example: basic math ops). Optimiser can do lots here
stable: can refer to shared state but must return the same value for the same arguments within a given statement (example: now). Optimiser is allowed to unify all references into one call per unique set of arguments
volatile: no rules, no optimiser potential! Must always be evaluated exactly as initially planned (example: random)
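As a rough illustration of the three categories above, the sketch below counts how many evaluations an optimiser would still have to perform for a set of call sites to the same function. The Volatility enum and evaluations_needed are hypothetical names, and modelling arguments as Vec<i64> is an assumption made only to keep the example small.

```rust
use std::collections::HashSet;

// The three Postgres volatility categories, as a hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

// Given the argument list of each call site of one function within a single
// statement, return how many evaluations are required.
fn evaluations_needed(volatility: Volatility, call_sites: &[Vec<i64>]) -> usize {
    match volatility {
        // Same arguments give the same value within the statement, so one
        // evaluation per unique argument set is enough.
        Volatility::Immutable | Volatility::Stable => {
            let unique: HashSet<&Vec<i64>> = call_sites.iter().collect();
            unique.len()
        }
        // Every call must be evaluated exactly as planned.
        Volatility::Volatile => call_sites.len(),
    }
}
```

For example, three references to now() (all with empty argument lists) collapse to a single evaluation, while three references to random() stay as three.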

...
Off the top of my head I think it'll open up some potential for generalised optimisation passes over function usage in queries according to function class, i.e. the optimiser rule used for the initial implementation of this PR but applicable to arbitrary functions provided they indicate themselves to be "stable".

cc @returnString @jorgecarleitao @msathis @Dandandan


alamb · 2021-05-14T17:56:59Z

My personal take is that adding some way to mark a ScalarFunction as immutable, stable or volatile would be valuable for query optimization (e.g. we could inline/fold immutable functions in logical plans, inline/fold stable functions in physical plans, and never inline volatile functions).
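A minimal sketch of that folding rule follows, assuming a hypothetical Volatility marker and a hypothetical PlanStage enum; neither is an existing DataFusion type. Allowing immutable functions to be folded at the physical stage as well is an assumption that goes slightly beyond the comment above.

```rust
// Hypothetical volatility marker attached to a function.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

// Hypothetical planning stage at which an optimizer rule runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PlanStage {
    Logical,
    Physical,
}

// Decide whether a call with constant arguments may be replaced by its value.
fn may_fold(volatility: Volatility, stage: PlanStage) -> bool {
    match (volatility, stage) {
        // Pure functions can be folded as early as the logical plan
        // (and, by assumption, later as well).
        (Volatility::Immutable, _) => true,
        // Stable functions depend on per-query state, which is only known
        // once the physical plan is being built.
        (Volatility::Stable, PlanStage::Physical) => true,
        (Volatility::Stable, PlanStage::Logical) => false,
        // Volatile functions are never folded.
        (Volatility::Volatile, _) => false,
    }
}
```

Under this rule, now() would be folded once per query during physical planning, while random() would always be left in place.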

alamb added the enhancement New feature or request label May 14, 2021

alamb added the datafusion Changes in the datafusion crate label May 14, 2021

alamb mentioned this issue May 14, 2021

[Datafusion] NOW() function support #288

Merged
