Evaluate JIT'd expression over arrays #2587
Conversation
This looks very cool @waynexia -- I will give it a good look tomorrow.
The other reason that JIT execution for DataFusion is interesting is that a few operations are fundamentally row-oriented (and thus not amenable to vectorized execution), the key one being a multi-tuple comparison (not just equality), which appears in sorting, grouping, and joining.
I got this from the compiled function: it just looks like a bare translation of what we build, so I suspect the vectorization is not done here (after translation). Further, I find this:
I believe this is a known limitation with cranelift -- we could also potentially consider using LLVM in the future, but that would likely result in longer query planning times. I believe @tustvold has thought about this as well. Writing some sort of basic vectorization optimizations in cranelift (or in DataFusion) is also a possibility. @Dandandan discussed it a bit in #2124 (comment)
```rust
);

let result = run_df_expr(df_expr, schema, array_a, array_b).unwrap();
assert_eq!(result, expected);
```
Whoo, really nice 🎉
I think it is important to understand what cranelift is, and what it isn't. Cranelift is a code generator originally intended to take optimised WASM and convert it to native code. It is not an optimising compiler like LLVM. I could see it being very well suited for doing runtime monomorphisation, i.e. removing conditional branches, especially for row-oriented operators. I think it will struggle to out-perform the columnar kernels in arrow-rs, some of which are hand-rolled and all of which benefit from LLVM compilation. As for using LLVM as a JIT, it is possible, and there are fairly mature ffi bindings for doing this, however, it would be a significant increase in implementation and runtime complexity. At the moment I suspect there are significantly lower hanging fruit in terms of optimisation...
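The "runtime monomorphisation" point above can be sketched in plain Rust: the branch on the operator is taken once, outside the hot loop, which is the effect a JIT gets by emitting branch-free code for the chosen operator. `Op` and `specialize` are illustrative names for this sketch, not cranelift or DataFusion APIs.

```rust
// Illustrative sketch: pick a specialized function pointer once, instead of
// re-dispatching on the operator for every row (interpreter style).
enum Op {
    Add,
    Mul,
}

fn specialize(op: &Op) -> fn(i64, i64) -> i64 {
    // The conditional branch happens here, once, outside the hot loop.
    match op {
        Op::Add => |a, b| a + b,
        Op::Mul => |a, b| a * b,
    }
}

fn main() {
    let f = specialize(&Op::Add);
    // The per-row loop calls the specialized function with no operator branch.
    let out: Vec<i64> = (0..5).map(|i| f(i, 10)).collect();
    assert_eq!(out, vec![10, 11, 12, 13, 14]);
    println!("{:?}", out);
}
```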
This is a very exciting step forward 🎉 Thank you very much @waynexia
I wonder why you chose the name `deref` rather than `store`? The underlying cranelift library seems to use `load` and `store`, which I think are the more common terms in compilers for what Rust terms deref.
datafusion/jit/src/compile.rs
```rust
let code_fn =
    unsafe { core::mem::transmute::<_, fn(i64, i64, i64, i64) -> ()>(code_ptr) };
```
code like this is some of my favorite ❤️ -- cast some memory to a function pointer and call it ;)
Extremely unsafe but feels very powerful
I wonder why not cast to the types you really want, like `fn(*i64, *i64, *i64, i64)`, and then you can avoid the `as i64` stuff below.
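A hedged sketch of that suggestion, with an ordinary `unsafe fn` standing in for the JIT'd code (`add_arrays` is a hypothetical name, not part of this PR): with typed pointer parameters the call site needs no `as i64` casts.

```rust
// Stand-in for JIT'd code, using the typed-pointer signature suggested above.
unsafe fn add_arrays(lhs: *const i64, rhs: *const i64, out: *mut i64, len: i64) {
    for i in 0..len as isize {
        // Element-wise add, writing through the output pointer.
        *out.offset(i) = *lhs.offset(i) + *rhs.offset(i);
    }
}

fn main() {
    let a = [1i64, 2, 3];
    let b = [10i64, 20, 30];
    let mut out = [0i64; 3];
    // No `ptr as i64` conversions needed at the call site.
    unsafe { add_arrays(a.as_ptr(), b.as_ptr(), out.as_mut_ptr(), 3) };
    assert_eq!(out, [11, 22, 33]);
    println!("{:?}", out);
}
```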
Good suggestion 👍
> Extremely unsafe but feels very powerful

Same!
```rust
use super::*;

fn run_df_expr(
```
In the longer term I would like to see this type of logic encapsulated somehow. So we would have a function or struct that took an `Expr` and several `ArrayRef`s, and then called a JIT or non-JIT version of evaluation depending on flags or options.
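The encapsulation suggested above could look roughly like the sketch below. Everything here is hypothetical placeholder code for illustration: `ExprEvaluator` is invented, and plain `Vec<i64>` stands in for DataFusion's real `Expr` and `ArrayRef` types.

```rust
// Placeholder types standing in for DataFusion's `Expr` and `ArrayRef`.
type ArrayRef = Vec<i64>;

enum Expr {
    Add, // stands in for a real expression tree
}

// Hypothetical wrapper that chooses the evaluation path from a flag.
struct ExprEvaluator {
    use_jit: bool,
}

impl ExprEvaluator {
    fn evaluate(&self, _expr: &Expr, inputs: &[ArrayRef]) -> ArrayRef {
        if self.use_jit {
            // Would compile `expr` with cranelift and call the generated fn.
            self.eval_jit(inputs)
        } else {
            // Would fall back to the interpreted / kernel-based path.
            self.eval_interpreted(inputs)
        }
    }

    fn eval_jit(&self, inputs: &[ArrayRef]) -> ArrayRef {
        // Placeholder: produces the same result as the interpreted path.
        self.eval_interpreted(inputs)
    }

    fn eval_interpreted(&self, inputs: &[ArrayRef]) -> ArrayRef {
        inputs[0].iter().zip(&inputs[1]).map(|(a, b)| a + b).collect()
    }
}

fn main() {
    let eval = ExprEvaluator { use_jit: false };
    let out = eval.evaluate(&Expr::Add, &[vec![1, 2, 3], vec![10, 20, 30]]);
    assert_eq!(out, vec![11, 22, 33]);
    println!("{:?}", out);
}
```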
Definitely!
I'm going to figure out how to play with variable-length parameter lists and type casting, and then encapsulate a plan-node-level interface (maybe taking a `LogicalPlan` and data as input; I haven't thought it through in detail) as the next step.
```rust
#[test]
fn array_add() {
    let array_a: PrimitiveArray<Int64Type> =
```
I recommend using different values for `array_a` and `array_b` so issues in argument handling would be evident. Like maybe:

```rust
let array_b: PrimitiveArray<Int64Type> =
    PrimitiveArray::from_iter_values((10..20).map(|x| x + 1));
```
Nice catch 😆
```rust
let expected = r#"fn calc_fn_0(table1.a_array: i64, table1.b_array: i64, result: i64, len: i64) -> () {
let index: i64;
index = 0;
while index < len {
```
I looked at this code and it looks 👍 to me
datafusion/jit/src/compile.rs
```rust
    format!("{}_ptr", input),
    w.add(w.id(format!("{}_array", input))?, w.id("offset")?)?,
)?;
w.declare_as(input, w.deref(w.id(format!("{}_ptr", input))?, I64)?)?;
```
Hmm, so the generated func only accepts i64 arrays?
Changed to parameter 😉
```rust
let mut fn_body = builder.enter_block();
fn_body.declare_as("index", fn_body.lit_i(0))?;
fn_body.while_block(
    |cond| cond.lt(cond.id("index")?, cond.id("len")?),
```
Hmm, do we need sanity check for equal array lengths?
The generated code works on pointers directly, so it might be hard to do these checks inside it. I check the inputs' lengths before passing them to the generated code, here:
```rust
if lhs.len() != rhs.len() {
    return Err(DataFusionError::NotImplemented(
        "Computing on different length arrays not yet supported".to_string(),
    ));
}
```
But I agree that we should consider how to improve the safety of the generated code when its logic gets complicated.
I think in general the idea is that all the safety checks are done during JIT generation as suggested by @waynexia
Add doc for `deref()` and `store()` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Thanks for all the reviews and information!
That makes sense. I can see a long way to implementing this JIT framework, and (another long way) to making it outperform the existing interpreted executor 🤣
I also hesitated over the naming, but have now changed it.
I also agree with @tustvold that in general it will be very hard to beat the performance of the optimized, vectorized kernels in arrow-rs, and in general I think we should continue to use them whenever possible. As I mentioned in #2587 (comment), I think the super valuable use case for JIT'ed Exprs is operations that can't easily (or at all) be vectorized, namely those doing multi-column comparisons during Merge / Group.
Which issue does this PR close?
Related to #2122
Rationale for this change
This PR tries a way to perform JIT'd computation over Arrow array.
As I understand it, we have (at least) two ways to JIT the query execution. One is to glue all the low-level compute functions together (those in the arrow compute kernel), and the other is this PR's approach, which performs all the computation in the JIT engine.
The first way is easier to implement (compared to the second one), and can gain performance from eliminated dispatch and branches. However, the second, fully compiled way will take lots of effort, as it requires a JIT version of the compute kernel. Gandiva in Arrow C++ is an LLVM-based compute kernel that might help, but I'm not very familiar with it. In any case, being able to combine both ways should be the better situation 😆.
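The shape of the two strategies can be illustrated with ordinary Rust, using plain functions in place of generated code. `glued` and `fused` are hypothetical names for this sketch, not DataFusion APIs; the point is that kernel-gluing materializes an intermediate array per operator, while a fully compiled expression becomes one fused loop.

```rust
// Strategy 1: glue existing kernels together — one pass per operator,
// with an intermediate array materialized between them.
fn glued(a: &[i64], b: &[i64], c: &[i64]) -> Vec<i64> {
    let tmp: Vec<i64> = b.iter().zip(c).map(|(x, y)| x * y).collect();
    a.iter().zip(&tmp).map(|(x, y)| x + y).collect()
}

// Strategy 2: fully compiled — a single fused loop over the expression
// `a + b * c`, with no intermediate allocation.
fn fused(a: &[i64], b: &[i64], c: &[i64]) -> Vec<i64> {
    (0..a.len()).map(|i| a[i] + b[i] * c[i]).collect()
}

fn main() {
    let (a, b, c) = (vec![1i64, 2, 3], vec![4i64, 5, 6], vec![7i64, 8, 9]);
    let g = glued(&a, &b, &c);
    let f = fused(&a, &b, &c);
    // Both strategies compute the same result.
    assert_eq!(g, f);
    assert_eq!(f, vec![29, 42, 57]);
    println!("{:?}", f);
}
```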
Back to this PR: it will generate a loop like @Dandandan presented here. I haven't inspected whether the compiler will vectorize it. Currently, it only wraps over one `expr`, but we can explore the possibility of compiling multiple `plan`s into one loop, like here. The row format for pipeline breakers is also significant for fully JIT'ing.
This PR only implements a very early-stage "example" with many hard-coded things like types and fn signatures. Please let me know what you think of it!
What changes are included in this PR?
Are there any user-facing changes?