-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add union_extract scalar function #12116
Conversation
} | ||
|
||
const EQ_SCALAR_CHUNK_SIZE: usize = 512; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those pub
items with #[doc(hidden)]
are there to be benchmarked, in case any reviewer wants to, but maybe it can be removed before merge. Just let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I think we could remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed on arrow-rs PR
|
||
fn criterion_benchmark(c: &mut Criterion) { | ||
let union_extract = UnionExtractFun::new(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although most part of these instantiations are very repetitive, the differences between them are so subtle that I personally prefer to be as explicit as possible. Just ask for simplification if that's preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure do we even need this 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you aren't sure, I'm more than happy to not even need to include them on the arrow-rs PR 😀
ctx.register_batch(table_name, batch).unwrap(); | ||
} | ||
|
||
fn sparse_1_1_single_field(ctx: &SessionContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't gone through the implementation line by line, but overall this looks pretty good to me.
impl UnionExtractFun { | ||
pub fn new() -> Self { | ||
Self { | ||
signature: Signature::any(2, Volatility::Immutable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific than this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, I will take a look at Signature::UserDefnied
and ScalarUDFImpl::coerce_types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed using ScalarUDFImpl::coerce_types in case you wanna take a look
It changed the error messages, from this to this, but I'm not sure if it's better
); | ||
} | ||
|
||
let fields = if let DataType::Union(fields, _) = &arg_types[0] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicer to write this like
let DataType::Union(fields, _) = &arg.types[0] else {
return exec_err!(...
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this syntax wasn't stable when I started rust, didn't know about it. Thanks!
Done
); | ||
}; | ||
|
||
let field_name = if let Expr::Literal(ScalarValue::Utf8(Some(field_name))) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// This is like MutableBuffer::collect_bool(type_ids.len(), |i| type_ids[i] == target) with fast paths for all true or all false values. | ||
#[doc(hidden)] | ||
pub fn eq_scalar_generic<const N: usize>(type_ids: &[i8], target: i8) -> BoolValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we make them private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made them private on arrow-rs
@@ -0,0 +1,255 @@ | |||
# Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name with union_function.slt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks better
Done
@@ -361,3 +369,893 @@ fn create_example_udf() -> ScalarUDF { | |||
adder, | |||
) | |||
} | |||
|
|||
fn register_union_tables(ctx: &SessionContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we create table in slt file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until #10206 is fixed, I don't think so. Anyway, with the core of this PR moved to arrow-rs, all these tests are unit tests on arrow-rs now 🙏
.build_unchecked() | ||
}; | ||
|
||
Ok(ColumnarValue::Array(make_array(data))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here seems equivalent to
new_null_array(target.data_type(), target.len());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically yes, but new_null_array
also allocates data buffers
Thanks @samuelcolvin and @jayzhan211 for your reviews. I moved part of this PR to apache/arrow-rs#6387. I made all your requests that were applicable there, and then I will make the requests that still apply here. |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #11081
What changes are included in this PR?
union_extract implementation and docs
Add tables with union column on the session context of sqllogictests
Are these changes tested?
Yes. There's sqllogictests for one sparse and one dense union, and for each error case
Since it's not possible yet to create scalar unions in SQL, those cases are tested as unit tests
Are there any user-facing changes?
A new
union_extract
scalar function, and it's docs