-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add random SQL function #303
Conversation
singular case
tabular align case
|
37c8fa0
to
a9915e7
Compare
30d6368
to
35d440c
Compare
// evaluate the arguments, if there are no arguments we'll instead pass in a null array of | ||
// batch size (as a convention) | ||
let inputs = match self.args.len() { | ||
0 => vec![ColumnarValue::Array(Arc::new(NullArray::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about passing the length as a ColumnarValue::Scalar
value for now?
I am not sure if I'm totally happy with that either, but that wil avoid generating a temporary array only for accessing the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the alternative is to drastically change the signature of scalar functions. it confess compared with that this is cleaner (but a bit hacky).
will change to use a scalar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess then another option is to add a new type of node which models no arg functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also #307
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that NullArray
is composed by zero buffers, zero childs, no validity and one datatype, so the cost to instantiate it is really small. The advantage over a ScalarValue
is that the semantics of getting a length are preserved: use array.len()
as any other array.
I am not married with any; was just trying to think about this from a documentations' perspective:
We support zero-argument UDFs. They MUST be declared as accepting zero arguments and the function signature MUST be a single argument. DataFusion will pass an
Array
to it, from which you can retrieve its length viaArray::len()
. The function MUST return an array whose number of rows equals the length of the array.
If we pass a scalar of any type, if the evaluation is distributed, I believe that we will have to serialize Scalar -> Array
in Ballista.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @Dandandan and @jorgecarleitao thanks for the discussion.
I agree that both ways are similar and equally “hacky” but for lack of a better solution they are okay. I'd slightly prefer null array because there's no ScalarValue::USize and having to convert from/to UInt32 is a bit cumbersome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgecarleitao / @Dandandan / @jimexist are we all cool with using a NullArray
to pass "how many rows are passed to this function that has no input arguments"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's do that for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the plan should be merge #328 and then rebase this PR to pick up that change
35d440c
to
20d35fa
Compare
Codecov Report
@@ Coverage Diff @@
## master #303 +/- ##
==========================================
+ Coverage 75.71% 75.72% +0.01%
==========================================
Files 143 143
Lines 23881 23910 +29
==========================================
+ Hits 18081 18107 +26
- Misses 5800 5803 +3
Continue to review full report at Codecov.
|
77ab4ba
to
fb74a59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking really nice now. Thank you @jimexist !
Which issue does this PR close?
add random SQL function. unlike the
now()
function, each row shall be individually generated.Closes #304
based on #307
Rationale for this change
Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
What changes are included in this PR?
There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please add the
breaking change
label.