feat: Add fail_on_overflow option to BinaryExpr #11400
@@ -2312,7 +2312,7 @@ mod tests {
 // verify that the plan correctly casts u8 to i64
 // the cast from u8 to i64 for literal will be simplified, and get lit(int64(5))
 // the cast here is implicit so has CastOptions with safe=true
-let expected = "BinaryExpr { left: Column { name: \"c7\", index: 2 }, op: Lt, right: Literal { value: Int64(5) } }";
+let expected = "BinaryExpr { left: Column { name: \"c7\", index: 2 }, op: Lt, right: Literal { value: Int64(5) }, fail_on_overflow: false }";
I wonder if we want to display fail_on_overflow?
This is coming from the derived Debug trait.
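To illustrate the point about the derived Debug output, here is a minimal sketch using hypothetical stand-in structs (DerivedExpr and ManualExpr are invented for illustration and are not DataFusion types). A derived Debug prints every field, while a hand-written impl could omit the flag when it holds the default value.

```rust
use std::fmt;

// Hypothetical stand-in: derived Debug always prints every field.
#[allow(dead_code)]
#[derive(Debug)]
struct DerivedExpr {
    op: &'static str,
    fail_on_overflow: bool,
}

// Hypothetical stand-in with a hand-written Debug impl.
struct ManualExpr {
    op: &'static str,
    fail_on_overflow: bool,
}

impl fmt::Debug for ManualExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut s = f.debug_struct("ManualExpr");
        s.field("op", &self.op);
        // Only show the flag when it differs from the default (false).
        if self.fail_on_overflow {
            s.field("fail_on_overflow", &self.fail_on_overflow);
        }
        s.finish()
    }
}
```

The trade-off is that a manual impl has to be kept in sync with the struct by hand, which is likely why relying on the derive is the simpler choice.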
This makes sense to me. I also think this is not specific to Spark; it should be ANSI-compliant behavior, so other engines might have it too.
/// Specifies whether an error is returned on overflow or not
fail_on_overflow: bool,
Comet can use this when constructing these physical expressions. DataFusion can choose to turn this flag on for ANSI-compliant cases or configs, if any.
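As a rough sketch of what such a flag could mean for a single addition: the helper below is hypothetical (add_i32 is not part of DataFusion or Arrow), and it assumes the non-failing mode wraps around on overflow while the failing mode returns an error.

```rust
/// Hypothetical helper, for illustration only: adds two i32 values and
/// picks the overflow behavior from a flag.
fn add_i32(left: i32, right: i32, fail_on_overflow: bool) -> Result<i32, String> {
    if fail_on_overflow {
        // checked_add returns None on overflow; surface that as an error.
        left.checked_add(right)
            .ok_or_else(|| format!("overflow: {} + {}", left, right))
    } else {
        // Assumed non-failing mode: two's-complement wrap-around.
        Ok(left.wrapping_add(right))
    }
}
```

With these assumptions, i32::MAX + 1 wraps to i32::MIN when the flag is off and produces an error when it is on.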
Looks good to me
I had an API suggestion but I don't think it is critical
Also, @Omega359 has discussed similar things in the past, I think, so maybe he has some comments.
}
}

/// Create new binary expression with explicit fail_on_overflow value
An alternative API might be more builder-style and easier to extend in the future:
// create expression
let expr = BinaryExpr::new(
    Arc::new(Column::new("l", 0)),
    Operator::Plus,
    Arc::new(Column::new("r", 1)),
)
// configure to fail on overflow
.with_fail_on_overflow(true);
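A self-contained sketch of how the builder-style suggestion could be shaped. The types below are simplified, hypothetical stand-ins (children are plain strings here rather than physical expressions), so this only shows the pattern: new keeps its signature and defaults the flag to false, and a consuming with_fail_on_overflow method overrides it.

```rust
// Simplified, hypothetical stand-ins; not the actual DataFusion definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Operator {
    Plus,
    Minus,
}

#[derive(Debug, Clone, PartialEq)]
struct BinaryExpr {
    left: String,
    op: Operator,
    right: String,
    fail_on_overflow: bool,
}

impl BinaryExpr {
    /// Existing-style constructor: the flag defaults to false.
    fn new(left: &str, op: Operator, right: &str) -> Self {
        Self {
            left: left.to_string(),
            op,
            right: right.to_string(),
            fail_on_overflow: false,
        }
    }

    /// Builder-style setter: consumes the expression and returns it updated.
    fn with_fail_on_overflow(mut self, fail_on_overflow: bool) -> Self {
        self.fail_on_overflow = fail_on_overflow;
        self
    }
}
```

One appeal of this shape is that further options can be added later as extra with_* methods without changing the signature of new.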
Related: #10744. I have a personal branch where I maintain a 'safe' mode for the to_timestamp and to_date functions. It would be nice to get that sort of functionality into the core; however, I would only really want to do it if it could be driven via config somehow. My attempts to make that work didn't pan out well, and the best approach is likely one where the invoke method is provided state (SessionState, SessionConfig, something) that we could query for flag(s) and adjust behaviour on the fly.
Can you remind me why you need the physical expr to have access to configs at runtime rather than having the planner pass the relevant configs into the expressions when they are created?
It's the singleton nature of UDFs currently: there isn't a way to configure them at all. You can replace a UDF in the FunctionFactory with one with a different config, which works well for SQL; however, if you want to use the DataFrame API with the standard function names such a
If there was a way to configure those (via argument to invoke, etc) or replace the singletons with a pre-configured instance (or better yet, referencing the one in the FunctionFactory) that would solve the issue.
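The "pre-configured instance" idea can be sketched roughly as follows. Everything here is hypothetical: SessionOptions, its safe_mode flag, and ParseInt are invented for illustration (integer parsing stands in for to_date), and the sketch assumes that "safe" means returning a null (None) instead of an error on bad input.

```rust
// Hypothetical session options; `safe_mode` is an invented flag.
#[derive(Debug, Clone, Copy)]
struct SessionOptions {
    safe_mode: bool,
}

// Hypothetical function instance that carries its own configuration,
// instead of being a global singleton.
#[derive(Debug)]
struct ParseInt {
    safe: bool,
}

impl ParseInt {
    /// Planner-side construction: the relevant option is read once here.
    fn from_options(opts: &SessionOptions) -> Self {
        Self { safe: opts.safe_mode }
    }

    /// Runtime call: needs no access to session state.
    fn invoke(&self, input: &str) -> Result<Option<i64>, String> {
        match input.trim().parse::<i64>() {
            Ok(v) => Ok(Some(v)),
            // Assumed 'safe' behavior: yield a null instead of an error.
            Err(_) if self.safe => Ok(None),
            Err(e) => Err(format!("cannot parse {:?}: {}", input, e)),
        }
    }
}
```

In this shape the function never consults session state at execution time, which is the alternative the question above is pointing at.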
How about overwriting existing functions based on config?
We can also overwrite the packaged functions based on config, like what we have now:
/// Return all default functions
pub fn all_default_functions() -> Vec<Arc<ScalarUDF>> {
    core::functions()
        .into_iter()
        .chain(datetime::functions())
        .chain(encoding::functions())
        .chain(math::functions())
        .chain(regex::functions())
        .chain(crypto::functions())
        .chain(unicode::functions())
        .chain(string::functions())
        .collect::<Vec<_>>()
}
/// Return all spark functions to overwrite
pub fn all_spark_functions() -> Vec<Arc<ScalarUDF>> {
    //
}
Register functions here:
async fn set_variable(&self, stmt: SetVariable) -> Result<DataFrame> {
    let SetVariable {
        variable, value, ..
    } = stmt;
    // register functions based on config
    self.state().register_udaf(udaf);
    let mut state = self.state.write();
    state.config_mut().options_mut().set(&variable, &value)?;
    drop(state);
    self.return_empty_dataframe()
}
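The overwrite-on-config idea can be sketched with a toy registry. All names below are hypothetical (Registry, the ansi_mode variable, and both add functions are invented for illustration and are not DataFusion APIs); the point is only that setting a variable re-registers a function under the same name, so later lookups pick up the replacement.

```rust
use std::collections::HashMap;

// Toy function signature: None stands for "overflowed".
type AddFn = fn(i32, i32) -> Option<i32>;

fn default_add(l: i32, r: i32) -> Option<i32> {
    Some(l.wrapping_add(r))
}

fn ansi_add(l: i32, r: i32) -> Option<i32> {
    l.checked_add(r)
}

// Hypothetical registry holding functions by name plus raw option values.
#[allow(dead_code)]
struct Registry {
    functions: HashMap<String, AddFn>,
    options: HashMap<String, String>,
}

impl Registry {
    fn new() -> Self {
        let mut functions: HashMap<String, AddFn> = HashMap::new();
        functions.insert("add".to_string(), default_add);
        Self { functions, options: HashMap::new() }
    }

    /// Setting the (invented) `ansi_mode` variable overwrites "add" in place.
    fn set_variable(&mut self, variable: &str, value: &str) {
        if variable == "ansi_mode" {
            let f: AddFn = if value == "true" { ansi_add } else { default_add };
            self.functions.insert("add".to_string(), f);
        }
        self.options.insert(variable.to_string(), value.to_string());
    }

    /// Looks the function up by name at call time.
    /// Returns None for an unknown name as well as for an overflow.
    fn call(&self, name: &str, l: i32, r: i32) -> Option<i32> {
        self.functions.get(name).and_then(|f| f(l, r))
    }
}
```

Because the lookup happens by name, callers that resolve functions through the registry see the overwritten version, whereas callers holding a direct reference to the old instance would not.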
* update tests * update tests * add rustdoc * update PartialEq impl * fix * address feedback about improving api
Which issue does this PR close?
Part of #3520
Rationale for this change
In Comet, we are emulating Spark behavior, and we need the ability to choose between failing on overflow or not. This also seems to be important for Postgres compatibility, according to #3520.
What changes are included in this PR?
Add an option to BinaryExpr to control overflow behavior for some operators.
Are these changes tested?
Yes, new tests added.
Are there any user-facing changes?
No, the new flag defaults to false and is not wired into the planner yet.