-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IS_NULL operator to AST #13145
Add IS_NULL operator to AST #13145
Conversation
Join benchmarks comparison (nvbench 23.06 vs this branch)
./_deps/nvbench-src/scripts/nvbench_compare.py ../../branch-23.06/release/before.JOIN_NVBENCH.json ./after.JOIN_NVBENCH.json
['../../branch-23.06/release/before.JOIN_NVBENCH.json', './after.JOIN_NVBENCH.json']
# inner_join_32bit
[0] Quadro GV100
inner_join_64bit[0] Quadro GV100
inner_join_32bit_nulls[0] Quadro GV100
inner_join_64bit_nulls[0] Quadro GV100
left_join_32bit[0] Quadro GV100
left_join_64bit[0] Quadro GV100
left_join_32bit_nulls[0] Quadro GV100
left_join_64bit_nulls[0] Quadro GV100
full_join_32bit[0] Quadro GV100
full_join_64bit[0] Quadro GV100
full_join_32bit_nulls[0] Quadro GV100
full_join_64bit_nulls[0] Quadro GV100
mixed_inner_join_32bit[0] Quadro GV100
mixed_inner_join_64bit[0] Quadro GV100
mixed_inner_join_32bit_nulls[0] Quadro GV100
mixed_inner_join_64bit_nulls[0] Quadro GV100
mixed_left_join_32bit[0] Quadro GV100
mixed_left_join_64bit[0] Quadro GV100
mixed_left_join_32bit_nulls[0] Quadro GV100
mixed_left_join_64bit_nulls[0] Quadro GV100
mixed_full_join_32bit[0] Quadro GV100
mixed_full_join_64bit[0] Quadro GV100
mixed_full_join_32bit_nulls[0] Quadro GV100
mixed_full_join_64bit_nulls[0] Quadro GV100
mixed_left_semi_join_32bit[0] Quadro GV100
mixed_left_semi_join_64bit[0] Quadro GV100
mixed_left_semi_join_32bit_nulls[0] Quadro GV100
mixed_left_semi_join_64bit_nulls[0] Quadro GV100
mixed_left_anti_join_32bit[0] Quadro GV100
mixed_left_anti_join_64bit[0] Quadro GV100
mixed_left_anti_join_32bit_nulls[0] Quadro GV100
mixed_left_anti_join_64bit_nulls[0] Quadro GV100
Summary
|
Join google-benchmark comparison (23.06 vs this branch): Details./_deps/gbench-src/tools/compare.py benchmarks ../../branch-23.06/release/before.JOIN_BENCH.json ./after.JOIN_BENCH.json
|
AST google-benchmark comparison: (23.06 vs this branch) Details./_deps/gbench-src/tools/compare.py benchmarks ../../branch-23.06/release/before.AST_BENCH.json after.AST_BENCH.json
|
Mixed join nvbench has most increase upto 5%. Rest of join kernels are not impacted much. |
Based on our automated microbenchmarking, we see some large improvements to mixed semi and anti joins, and small degradation to mixed full, left and inner joins.
@abellina do you think this branch is worth testing with Spark-RAPIDS NDS? |
Are we sure we trust those benchmark results? Aside from some very strange compiler quirk I don't see how adding a new operator would improve performance. |
Thanks @vyasr for your comment. I trust the results and find them surprising. Here is a breakdown by individual benchmark |
@vyasr @GregoryKimball The AST performance is extremely hard to predict. I'm not surprised by this, but would be interested in knowing more about the kernel's register usage, shared memory usage, occupancy, etc. |
Resource usage of Before: (branch-23.06)Function void cudf::detail::compute_column_kernel<(int)128, (bool)0>(cudf::table_device_view, cudf::ast::detail::expression_device_view, cudf::mutable_column_device_view): After: (This branch)Function void cudf::detail::compute_column_kernel<(int)128, (bool)0>(cudf::table_device_view, cudf::ast::detail::expression_device_view, cudf::mutable_column_device_view): Shared memory didn't change. Register usage and constant usage changed. Mixed join semi kernel with nulls has register usage reduced from 83 to 72. (only kernel where reduction of register usage, similarly benchmarks runtime reduced too) |
OK yeah so basically the compiler did something slightly different due to some internal heuristic that we have no visibility into. Probably not worth any real further consideration here (if we really wanted to we could try tweaking some internal compiler flags to see how it affects the optimizer but I don't think we'd learn anything actionable anyway). |
Yup. Agreed with @vyasr here. What a funny thing for registers to drop from 83 to 72 with the increased complexity. 😆 That explains the perf difference but it’s inscrutable as to why. |
java/src/test/java/ai/rapids/cudf/ast/CompiledExpressionTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit but otherwise lgtm.
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small suggestions and questions, overall this looks great though.
/merge |
Description
Add IS_NULL operator in AST.
Adds Python and Java bindings, unit tests.
Checklist