feat: Add `fail_on_overflow` option to `BinaryExpr` #11400

andygrove · 2024-07-10T19:07:29Z

Which issue does this PR close?

Part of #3520

Rationale for this change

In Comet, we are emulating Spark behavior, and we need the ability to choose between failing on overflow or not. According to #3520, this also seems important for Postgres compatibility.

What changes are included in this PR?

Add an option to BinaryExpr to control overflow behavior for some operators.
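To make the two behaviors concrete, here is a minimal sketch in plain Rust. The function `add_i32` is a hypothetical stand-in, not DataFusion's kernel, and it assumes that the non-failing mode wraps around on overflow.

```rust
/// Hypothetical stand-in for an arithmetic kernel (not DataFusion's code).
/// With `fail_on_overflow` set, an overflowing add is reported as an error;
/// otherwise the result is assumed to wrap around.
fn add_i32(left: i32, right: i32, fail_on_overflow: bool) -> Result<i32, String> {
    if fail_on_overflow {
        // `checked_add` returns `None` on overflow, which is mapped to an error.
        left.checked_add(right)
            .ok_or_else(|| format!("arithmetic overflow: {left} + {right}"))
    } else {
        // `wrapping_add` never fails; i32::MAX + 1 wraps to i32::MIN.
        Ok(left.wrapping_add(right))
    }
}
```

Calling `add_i32(i32::MAX, 1, false)` yields `Ok(i32::MIN)`, while the same call with `true` returns an error.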

Are these changes tested?

Yes, new tests added.

Are there any user-facing changes?

No, the new flag defaults to false and is not wired into the planner yet.

viirya · 2024-07-10T19:28:24Z

datafusion/core/src/physical_planner.rs

@@ -2312,7 +2312,7 @@ mod tests {
        // verify that the plan correctly casts u8 to i64
        // the cast from u8 to i64 for literal will be simplified, and get lit(int64(5))
        // the cast here is implicit so has CastOptions with safe=true
-        let expected = "BinaryExpr { left: Column { name: \"c7\", index: 2 }, op: Lt, right: Literal { value: Int64(5) } }";
+        let expected = "BinaryExpr { left: Column { name: \"c7\", index: 2 }, op: Lt, right: Literal { value: Int64(5) }, fail_on_overflow: false }";


I wonder if we want to display fail_on_overflow?

andygrove

This is coming from the derived Debug trait.
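As a sketch of why the new field shows up in the expected string: a derived `Debug` prints every field, and hiding one requires a hand-written impl. Both structs below are hypothetical, simplified stand-ins for `BinaryExpr`.

```rust
use std::fmt;

/// Hypothetical stand-in with a derived `Debug`: every field is printed.
#[derive(Debug)]
struct DerivedExpr {
    op: &'static str,
    fail_on_overflow: bool,
}

/// Same fields, but with a hand-written `Debug` that omits the flag.
struct ManualExpr {
    op: &'static str,
    #[allow(dead_code)]
    fail_on_overflow: bool,
}

impl fmt::Debug for ManualExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only `op` is reported; `fail_on_overflow` is deliberately skipped.
        f.debug_struct("ManualExpr").field("op", &self.op).finish()
    }
}
```

Formatting `DerivedExpr` with `{:?}` includes `fail_on_overflow: false`, whereas `ManualExpr` prints only `op`.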

viirya

This makes sense to me. I also think this is not specific to Spark; it should be ANSI-compliant behavior, so other engines might have it too.

viirya · 2024-07-10T19:32:06Z

datafusion/physical-expr/src/expressions/binary.rs

+    /// Specifies whether an error is returned on overflow or not
+    fail_on_overflow: bool,


Comet can use this when constructing these physical expressions. DataFusion can choose to turn this flag on for ANSI-compliant cases/configs, if any.

alamb

Looks good to me

I had an API suggestion but I don't think it is critical

Also, I think @Omega359 has discussed similar things in the past, so maybe he has some comments.

alamb · 2024-07-10T19:50:11Z

datafusion/physical-expr/src/expressions/binary.rs

+        }
+    }
+
+    /// Create new binary expression with explicit fail_on_overflow value


Another alternative API might be more of a builder style and easier to extend in the future

// create expression
let expr = BinaryExpr::new(
    Arc::new(Column::new("l", 0)),
    Operator::Plus,
    Arc::new(Column::new("r", 1)),
)
// configure to fail on overflow
.with_fail_on_overflow(true);
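A minimal, self-contained sketch of what such a builder-style setter could look like. `BinaryExprSketch` is a hypothetical, simplified type (string operands instead of `Arc<dyn PhysicalExpr>`); only the `with_fail_on_overflow` name comes from the suggestion above.

```rust
/// Hypothetical, simplified stand-in for `BinaryExpr`.
#[derive(Debug, Clone, PartialEq)]
struct BinaryExprSketch {
    left: String,
    op: char,
    right: String,
    fail_on_overflow: bool,
}

impl BinaryExprSketch {
    /// The constructor keeps its three-argument shape; the flag defaults to `false`.
    fn new(left: &str, op: char, right: &str) -> Self {
        Self {
            left: left.to_string(),
            op,
            right: right.to_string(),
            fail_on_overflow: false,
        }
    }

    /// Builder-style setter: consumes `self` and returns the updated value,
    /// so further options can be chained later without changing `new`.
    fn with_fail_on_overflow(mut self, fail_on_overflow: bool) -> Self {
        self.fail_on_overflow = fail_on_overflow;
        self
    }
}
```

The benefit of this shape is that existing callers of `new` are unaffected, and later options can be added as further `with_*` methods.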

Omega359 · 2024-07-10T20:50:54Z

Related: #10744

I have a personal branch where I maintain a 'safe' mode for the to_timestamp and to_date functions. It would be nice to get that sort of functionality into the core; however, I would only really want to do it if it could be driven via config somehow. My attempts to make that work didn't pan out well, and the best approach is likely one where the invoke method is provided state (SessionState, SessionConfig, something) that we could query for flag(s) and adjust behaviour on the fly.

andygrove · 2024-07-10T21:31:19Z

> Related: #10744
>
> I have a personal branch where I maintain a 'safe' mode for the to_timestamp and to_date functions. It would be nice to get that sort of functionality into the core; however, I would only really want to do it if it could be driven via config somehow. My attempts to make that work didn't pan out well, and the best approach is likely one where the invoke method is provided state (SessionState, SessionConfig, something) that we could query for flag(s) and adjust behaviour on the fly.

Can you remind me why you need the physical expr to have access to configs at runtime rather than having the planner pass the relevant configs into the expressions when they are created?

Omega359 · 2024-07-10T21:47:04Z

> Can you remind me why you need the physical expr to have access to configs at runtime rather than having the planner pass the relevant configs into the expressions when they are created?

It's the singleton nature of UDFs currently: there isn't a way to configure them at all. You can replace a UDF in the FunctionFactory with one that has a different config, which works well for SQL. However, if you want to use the DataFrame API with the standard function names, such as to_timestamp(args), rather than manually calling a modified function via something like ctx.udf("to_timestamp").unwrap().call(args), you can't, because you cannot replace or modify what that function calls, as it's a singleton:

static $GNAME: std::sync::OnceLock<std::sync::Arc<datafusion_expr::ScalarUDF>> =
            std::sync::OnceLock::new();

If there was a way to configure those (via argument to invoke, etc) or replace the singletons with a pre-configured instance (or better yet, referencing the one in the FunctionFactory) that would solve the issue.
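The limitation can be sketched with `std::sync::OnceLock` alone. `ToTimestampSketch` and `to_timestamp_udf` are hypothetical names used for illustration; the point is only that whatever the first caller initializes is what every later caller gets.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a UDF with one configurable option.
#[derive(Debug)]
struct ToTimestampSketch {
    safe: bool,
}

/// Process-wide singleton, mirroring the `OnceLock` pattern quoted above.
static TO_TIMESTAMP: OnceLock<Arc<ToTimestampSketch>> = OnceLock::new();

/// Returns the shared instance. `safe` is only honored by the first call;
/// later calls get the already-initialized value regardless of the argument.
fn to_timestamp_udf(safe: bool) -> Arc<ToTimestampSketch> {
    Arc::clone(TO_TIMESTAMP.get_or_init(|| Arc::new(ToTimestampSketch { safe })))
}
```

A second call with a different `safe` value returns the same `Arc` as the first, which is why a per-session configuration cannot reach the function through this path.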

jayzhan211 · 2024-07-11T01:34:14Z

> If there was a way to configure those (via argument to invoke, etc) or replace the singletons with a pre-configured instance (or better yet, referencing the one in the FunctionFactory) that would solve the issue.

How about overwriting the existing function based on config?

set datafusion.function.to_timestamp = spark;

registers "to_timestamp" with a Spark-like implementation

set datafusion.function.to_timestamp = default;

registers "to_timestamp" with the default implementation

set datafusion.function.all = spark;

registers everything returned by all_spark_functions()

We can also overwrite whole packages of functions based on config, similar to what we have now

/// Return all default functions
pub fn all_default_functions() -> Vec<Arc<ScalarUDF>> {
    core::functions()
        .into_iter()
        .chain(datetime::functions())
        .chain(encoding::functions())
        .chain(math::functions())
        .chain(regex::functions())
        .chain(crypto::functions())
        .chain(unicode::functions())
        .chain(string::functions())
        .collect::<Vec<_>>()
}

/// Return all spark functions to overwrite
pub fn all_spark_functions() -> Vec<Arc<ScalarUDF>> {
    // Placeholder: the Spark-compatible implementations would be collected here
    todo!()
}

Register functions here

    async fn set_variable(&self, stmt: SetVariable) -> Result<DataFrame> {
        let SetVariable {
            variable, value, ..
        } = stmt;

        // register functions based on config
        self.state().register_udaf(udaf);

        let mut state = self.state.write();
        state.config_mut().options_mut().set(&variable, &value)?;
        drop(state);

        self.return_empty_dataframe()
    }
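A self-contained sketch of the idea above, using a plain `HashMap` instead of DataFusion's session state. Everything here is hypothetical: `RegistrySketch`, the `ScalarFn` type, and the two toy implementations (the "spark" flavor merely trims whitespace before parsing) exist only to show a `set` statement swapping the registered function.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical function type: parses a string into an integer timestamp.
type ScalarFn = Arc<dyn Fn(&str) -> Option<i64> + Send + Sync>;

/// Hypothetical registry standing in for the session's function table.
#[derive(Default)]
struct RegistrySketch {
    functions: HashMap<String, ScalarFn>,
}

impl RegistrySketch {
    /// Handles `set datafusion.function.<name> = <flavor>` by re-registering
    /// `<name>` with the implementation for that flavor.
    fn set_variable(&mut self, variable: &str, value: &str) -> Result<(), String> {
        let name = variable
            .strip_prefix("datafusion.function.")
            .ok_or_else(|| format!("unsupported variable: {variable}"))?;
        let func: ScalarFn = match value {
            // Toy "default" flavor: strict parse, no whitespace allowed.
            "default" => Arc::new(|s: &str| s.parse::<i64>().ok()),
            // Toy "spark" flavor: trims whitespace first (illustrative only).
            "spark" => Arc::new(|s: &str| s.trim().parse::<i64>().ok()),
            other => return Err(format!("unknown flavor: {other}")),
        };
        // Inserting under the same name overwrites the previous implementation.
        self.functions.insert(name.to_string(), func);
        Ok(())
    }

    /// Looks up `name` and applies it; `None` if missing or if parsing fails.
    fn call(&self, name: &str, arg: &str) -> Option<i64> {
        self.functions.get(name).and_then(|f| f(arg))
    }
}
```

After switching the variable from `default` to `spark`, the same call `call("to_timestamp", " 42 ")` changes from `None` to `Some(42)` without the caller referring to a different function name.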

andygrove added 2 commits July 10, 2024 12:44

update tests

96f0ed8

update tests

cd6ba11

github-actions bot added the physical-expr and core labels Jul 10, 2024

add rustdoc

af4cbf8

andygrove mentioned this pull request Jul 10, 2024

feat: ANSI support for Add apache/datafusion-comet#616

Open

update PartialEq impl

ab26a50

andygrove requested review from liukun4515 and viirya July 10, 2024 19:17

fix

cbc203b

viirya reviewed Jul 10, 2024

viirya approved these changes Jul 10, 2024

viirya reviewed Jul 10, 2024

alamb approved these changes Jul 10, 2024

address feedback about improving api

19df99f

andygrove merged commit 2413155 into apache:main Jul 11, 2024
23 checks passed

andygrove deleted the bin-expr-fail-on-overflow branch July 11, 2024 14:56
