Evaluate JIT'd expression over arrays #2587
Conversation
This looks very cool @waynexia -- I will give it a good look tomorrow.
The other reason that JIT execution for DataFusion is interesting is that a few operations are fundamentally row-oriented (and thus not amenable to vectorized execution), the key one being a multi-tuple comparison (not just equality), which appears in sorting, grouping, and joining.
I got this from the compiled function: it just looks like a bare translation of what we build, so I suspect the vectorization is not done here (after translation). Further, I find this:
I believe this is a known limitation with cranelift -- we could also potentially consider using LLVM in the future, but that would likely result in longer query planning times. I believe @tustvold has thought about this as well. Writing some sort of basic vectorization optimizations in cranelift (or in DataFusion) is also a possibility. @Dandandan discussed it a bit in #2124 (comment)
```rust
);

let result = run_df_expr(df_expr, schema, array_a, array_b).unwrap();
assert_eq!(result, expected);
```
Whoo, really nice 🎉
I think it is important to understand what cranelift is, and what it isn't. Cranelift is a code generator originally intended to take optimised WASM and convert it to native code. It is not an optimising compiler like LLVM. I could see it being very well suited for doing runtime monomorphisation, i.e. removing conditional branches, especially for row-oriented operators. I think it will struggle to out-perform the columnar kernels in arrow-rs, some of which are hand-rolled and all of which benefit from LLVM compilation. As for using LLVM as a JIT, it is possible, and there are fairly mature ffi bindings for doing this, however, it would be a significant increase in implementation and runtime complexity. At the moment I suspect there are significantly lower hanging fruit in terms of optimisation...
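The "runtime monomorphisation" point above can be sketched in plain Rust: the branch on the operator is taken once, outside the hot loop, which is the effect a JIT gets by emitting branch-free code for the chosen operator. `Op` and `specialize` are illustrative names for this sketch, not cranelift or DataFusion APIs.

```rust
// Illustrative sketch: pick a specialized function pointer once, instead of
// re-dispatching on the operator for every row (interpreter style).
enum Op {
    Add,
    Mul,
}

fn specialize(op: &Op) -> fn(i64, i64) -> i64 {
    // The conditional branch happens here, once, outside the hot loop.
    match op {
        Op::Add => |a, b| a + b,
        Op::Mul => |a, b| a * b,
    }
}

fn main() {
    let f = specialize(&Op::Add);
    // The per-row loop calls the specialized function with no operator branch.
    let out: Vec<i64> = (0..5).map(|i| f(i, 10)).collect();
    assert_eq!(out, vec![10, 11, 12, 13, 14]);
    println!("{:?}", out);
}
```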
This is a very exciting step forward 🎉 Thank you very much @waynexia
I wonder why you chose the name `deref` rather than `store`? The underlying cranelift library seems to use `load` and `store`, which I think are the more common terms in compilers for what Rust terms deref.
datafusion/jit/src/compile.rs
```rust
let code_fn =
    unsafe { core::mem::transmute::<_, fn(i64, i64, i64, i64) -> ()>(code_ptr) };
```
code like this is some of my favorite ❤️ -- cast some memory to a function pointer and call it ;)
Extremely unsafe but feels very powerful
I wonder why not cast to the types you really want, like `fn(*i64, *i64, *i64, i64)`, and then you can avoid the `as i64` stuff below.
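A hedged sketch of that suggestion, with an ordinary `unsafe fn` standing in for the JIT'd code (`add_arrays` is a hypothetical name, not part of this PR): with typed pointer parameters the call site needs no `as i64` casts.

```rust
// Stand-in for JIT'd code, using the typed-pointer signature suggested above.
unsafe fn add_arrays(lhs: *const i64, rhs: *const i64, out: *mut i64, len: i64) {
    for i in 0..len as isize {
        // Element-wise add, writing through the output pointer.
        *out.offset(i) = *lhs.offset(i) + *rhs.offset(i);
    }
}

fn main() {
    let a = [1i64, 2, 3];
    let b = [10i64, 20, 30];
    let mut out = [0i64; 3];
    // No `ptr as i64` conversions needed at the call site.
    unsafe { add_arrays(a.as_ptr(), b.as_ptr(), out.as_mut_ptr(), 3) };
    assert_eq!(out, [11, 22, 33]);
    println!("{:?}", out);
}
```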
Good suggestion 👍
> Extremely unsafe but feels very powerful

Same!
```rust
use super::*;

fn run_df_expr(
```
In the longer term I would like to see this type of logic encapsulated somehow. So we would have a function or struct that took an `Expr` and several `ArrayRef`s, and then called a JIT or non-JIT version of evaluation depending on flags or options.
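The encapsulation suggested above could look roughly like the sketch below. Everything here is hypothetical placeholder code for illustration: `ExprEvaluator` is invented, and plain `Vec<i64>` stands in for DataFusion's real `Expr` and `ArrayRef` types.

```rust
// Placeholder types standing in for DataFusion's `Expr` and `ArrayRef`.
type ArrayRef = Vec<i64>;

enum Expr {
    Add, // stands in for a real expression tree
}

// Hypothetical wrapper that chooses the evaluation path from a flag.
struct ExprEvaluator {
    use_jit: bool,
}

impl ExprEvaluator {
    fn evaluate(&self, _expr: &Expr, inputs: &[ArrayRef]) -> ArrayRef {
        if self.use_jit {
            // Would compile `expr` with cranelift and call the generated fn.
            self.eval_jit(inputs)
        } else {
            // Would fall back to the interpreted / kernel-based path.
            self.eval_interpreted(inputs)
        }
    }

    fn eval_jit(&self, inputs: &[ArrayRef]) -> ArrayRef {
        // Placeholder: produces the same result as the interpreted path.
        self.eval_interpreted(inputs)
    }

    fn eval_interpreted(&self, inputs: &[ArrayRef]) -> ArrayRef {
        inputs[0].iter().zip(&inputs[1]).map(|(a, b)| a + b).collect()
    }
}

fn main() {
    let eval = ExprEvaluator { use_jit: false };
    let out = eval.evaluate(&Expr::Add, &[vec![1, 2, 3], vec![10, 20, 30]]);
    assert_eq!(out, vec![11, 22, 33]);
    println!("{:?}", out);
}
```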
Definitely!
I'm going to figure out how to play with variable-length parameter lists and type casting, and then encapsulate a plan-node-level interface (maybe taking a `LogicalPlan` and data as input; I haven't thought it through in detail) as the next step.
```rust
#[test]
fn array_add() {
    let array_a: PrimitiveArray<Int64Type> =
```
I recommend using different values for `array_a` and `array_b` so issues in argument handling would be evident. Like maybe:

```rust
let array_b: PrimitiveArray<Int64Type> =
    PrimitiveArray::from_iter_values((10..20).map(|x| x + 1));
```
Nice catch 😆
```rust
let expected = r#"fn calc_fn_0(table1.a_array: i64, table1.b_array: i64, result: i64, len: i64) -> () {
let index: i64;
index = 0;
while index < len {
```
I looked at this code and it looks 👍 to me
datafusion/jit/src/compile.rs
```rust
    format!("{}_ptr", input),
    w.add(w.id(format!("{}_array", input))?, w.id("offset")?)?,
)?;
w.declare_as(input, w.deref(w.id(format!("{}_ptr", input))?, I64)?)?;
```
Hmm, so the generated func only accepts i64 arrays?
Changed to parameter 😉
```rust
let mut fn_body = builder.enter_block();
fn_body.declare_as("index", fn_body.lit_i(0))?;
fn_body.while_block(
    |cond| cond.lt(cond.id("index")?, cond.id("len")?),
```
Hmm, do we need sanity check for equal array lengths?
The generated code works on pointers directly, so it might be hard to do these checks inside it. I check the inputs' lengths before passing them to the generated code, here:
```rust
if lhs.len() != rhs.len() {
    return Err(DataFusionError::NotImplemented(
        "Computing on different length arrays not yet supported".to_string(),
    ));
}
```
But I agree that we should consider how to improve the safety of the generated code when its logic gets complicated.
I think in general the idea is that all the safety checks are done during JIT generation as suggested by @waynexia
Add doc for `deref()` and `store()` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Thanks for all the reviews and information!
That makes sense. I can see a long way to implementing this JIT framework, and (another long way) to making it outperform the existing interpreted executor 🤣
I also hesitated over the naming, but have now changed it.
I also agree with @tustvold that in general it will be very hard to beat the performance of the optimized, vectorized kernels in arrow-rs, and in general I think we should continue to use them whenever possible. As I mentioned in #2587 (comment), I think the super valuable use case for JIT'ed Exprs is operations that can't easily (or at all) be vectorized, namely those doing multi-column comparisons during Merge / Group.
Which issue does this PR close?
Related to #2122
Rationale for this change
This PR tries a way to perform JIT'd computation over Arrow array.
As I understand it, we have (at least) two ways to JIT the query execution. One is to glue all the low-level compute functions together (those in the arrow compute kernel), and the other is this PR's approach, which performs all the computation in the JIT engine.
The first way is easier to implement (compared to the second one), and can gain performance from eliminated dispatch and branches. However, the second, fully compiled way will take lots of effort, as it requires a JIT version of the compute kernel. Gandiva in Arrow C++ is an LLVM-based compute kernel that might help, but I'm not very familiar with it. In any case, being able to combine both ways should be the better situation 😆.
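The shape of the two strategies can be illustrated with ordinary Rust, using plain functions in place of generated code. `glued` and `fused` are hypothetical names for this sketch, not DataFusion APIs; the point is that kernel-gluing materializes an intermediate array per operator, while a fully compiled expression becomes one fused loop.

```rust
// Strategy 1: glue existing kernels together — one pass per operator,
// with an intermediate array materialized between them.
fn glued(a: &[i64], b: &[i64], c: &[i64]) -> Vec<i64> {
    let tmp: Vec<i64> = b.iter().zip(c).map(|(x, y)| x * y).collect();
    a.iter().zip(&tmp).map(|(x, y)| x + y).collect()
}

// Strategy 2: fully compiled — a single fused loop over the expression
// `a + b * c`, with no intermediate allocation.
fn fused(a: &[i64], b: &[i64], c: &[i64]) -> Vec<i64> {
    (0..a.len()).map(|i| a[i] + b[i] * c[i]).collect()
}

fn main() {
    let (a, b, c) = (vec![1i64, 2, 3], vec![4i64, 5, 6], vec![7i64, 8, 9]);
    let g = glued(&a, &b, &c);
    let f = fused(&a, &b, &c);
    // Both strategies compute the same result.
    assert_eq!(g, f);
    assert_eq!(f, vec![29, 42, 57]);
    println!("{:?}", f);
}
```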
Back to this PR: it will generate a loop like @Dandandan presented here. I haven't inspected whether the compiler will vectorize it. Currently, it only wraps over one `expr`, but we can explore the possibility of compiling multiple `plan`s into one loop, like here. The row format for pipeline breakers is also significant for fully JIT'ing.
This PR only implements a very early-stage "example" with many hard-coded things like types and fn signatures. Please let me know what you think of it!
What changes are included in this PR?
Are there any user-facing changes?