-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#554: Lead/lag window function with offset and default value arguments #687
Conversation
I have a few doubts:
Signature::OneOf(vec![
Signature::Any(1),
Signature::Any(2),
Signature::Any(3),
])
Signature::OneOf(types) => {
let mut r = vec![];
for s in types {
// we must ensure that one of the signatures is valid against the current_types
if let Ok(valid_types) = get_valid_types(s, current_types) {
r.extend(valid_types);
}
}
r
} original code is currently forcing all options within Signature::OneOf to be valid Signature::OneOf(types) => {
let mut r = vec![];
for s in types {
r.extend(get_valid_types(s, current_types)?);
}
r
}
|
FYI @jimexist -- do you have some time to review this code? |
} | ||
|
||
impl PartitionEvaluator for WindowShiftEvaluator { | ||
fn evaluate_partition(&self, partition: Range<usize>) -> Result<ArrayRef> { | ||
let value = &self.values[0]; | ||
let value = value.slice(partition.start, partition.end - partition.start); | ||
shift(value.as_ref(), self.shift_offset).map_err(DataFusionError::ArrowError) | ||
if let Some(default_value) = self.default_value.clone() { | ||
shift_with_default_value(&value, self.shift_offset, default_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in both branches you can just call this shift_with_default_value
. if the default value isn't present, you can unwrap or provide a null scalar value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the conditional statement and changed to one single call to shift_with_default_value with the scalar_value as an option
@@ -131,7 +131,10 @@ fn get_valid_types( | |||
Signature::OneOf(types) => { | |||
let mut r = vec![]; | |||
for s in types { | |||
r.extend(get_valid_types(s, current_types)?); | |||
if let Ok(valid_types) = get_valid_types(s, current_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I have misunderstood it, but in the original code, if one of the signatures within OneOf fails then the whole get_valid_types returns an error, isn't it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you were totally correct in that the original code didn't work. i wasn't seeing this the first round.
i was thinking that you'd need to remove the commented out line below. also do you mind adding a unit test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also use filter_map
and then take first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ! changed to filter_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change looks generally okay, thanks for the effort!
I would suggest: replacing branching with one call and also add a case in the integration test folder, esp.:
- out of bounds shift offset,
- negative bounds
- default value being null
that part was confusing to me as well. i believe a most elegant way is to create support for heterogenous support for function arguments
i'm okay with integration tests that cover these
changing arrow-rs repo is one way to achieve it although it'll needs two more weeks to be released. i'm okay with keeping the copied and modified version here. |
Thank you @jimexist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementation looks great, 👍 thanks!
would be great to add integration test coverage as well before merge
default_value: Option<ScalarValue>, | ||
} | ||
|
||
fn shift_with_default_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add a comment of todo to push this upstream to arrow-rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO comment
let shift_offset = coerced_args | ||
.get(1) | ||
.map(|v| { | ||
v.as_any() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel like you can have a helper function to extract argument as Result<Literal>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, created get_scalar_value_from_args helper function.
@jimexist I was trying to execute the integration tests manually,
There was some errors: type coercing problems between default_value and the slice data_type But if I try to test agains some custom partition SELECT
LAG(c8) OVER(PARTITION BY c9 ORDER BY c9) AS prev_c8
FROM test
ORDER BY c8; datafusion throws an error
I will try to look at this later today if I can ... |
if the first example works then it's fine. the second example might fail because of different reasons not related to this change. |
Two more questions/doubts ?
|
@jimexist do you think this PR is ready? Do you need help reviewing ? |
looks okay after rebasing. |
34475c7
to
10aedf9
Compare
10aedf9
to
6d5e1f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some(ScalarValue::Int32(Some(100))), | ||
), | ||
vec![ | ||
Some(100), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
#Closes #554.
Rationale for this change
Implement offset and default value optional arguments in lead/lag window functions.
What changes are included in this PR?
Changes in lead/lag pub fn signatures.
Are there any user-facing changes?