Update join to use experimental row hasher and comparator #12787

Merged Apr 6, 2023 (45 commits)

divyegala (Member) commented Feb 16, 2023

Description

Part of #11844. I will create a separate PR for mixed_join.
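
For readers unfamiliar with the terms, here is a conceptual sketch of what a row hasher and a two-table row comparator do inside a hash join. It is plain host-side C++ and is not libcudf code: `Table`, `hash_row`, `rows_equal`, and `inner_join` are hypothetical names chosen for illustration, and the hash-combine step is a generic one rather than the hash this PR uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table: a list of equally sized int64 columns.
using Column = std::vector<std::int64_t>;
using Table  = std::vector<Column>;

// Row hasher: fold the hash of every column value in one row into one value.
std::size_t hash_row(Table const& t, std::size_t row)
{
  std::size_t seed = 0;
  for (auto const& col : t) {
    assert(row < col.size());
    std::size_t const h = std::hash<std::int64_t>{}(col[row]);
    seed ^= h + 0x9e3779b9u + (seed << 6) + (seed >> 2);  // generic hash combine
  }
  return seed;
}

// Two-table row comparator: row `l` of `lhs` against row `r` of `rhs`.
bool rows_equal(Table const& lhs, std::size_t l, Table const& rhs, std::size_t r)
{
  assert(lhs.size() == rhs.size());
  for (std::size_t c = 0; c < lhs.size(); ++c) {
    if (lhs[c][l] != rhs[c][r]) { return false; }
  }
  return true;
}

// Inner join that returns (build_row, probe_row) index pairs.
std::vector<std::pair<std::size_t, std::size_t>> inner_join(Table const& build,
                                                            Table const& probe)
{
  std::size_t const build_rows = build.empty() ? 0 : build.front().size();
  std::size_t const probe_rows = probe.empty() ? 0 : probe.front().size();

  // Build phase: index every build row by its row hash.
  std::unordered_multimap<std::size_t, std::size_t> hash_to_row;
  for (std::size_t b = 0; b < build_rows; ++b) { hash_to_row.emplace(hash_row(build, b), b); }

  // Probe phase: look up candidates by hash, then confirm with the comparator.
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t p = 0; p < probe_rows; ++p) {
    auto const range = hash_to_row.equal_range(hash_row(probe, p));
    for (auto it = range.first; it != range.second; ++it) {
      if (rows_equal(build, it->second, probe, p)) { out.emplace_back(it->second, p); }
    }
  }
  return out;
}
```

The comparator runs only on rows whose hashes collide, which is why the hasher and the comparator have to agree on what counts as an equal row.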

Compilation times:
main 94bbc82 : 16m47.513s
This PR 5d75db8 : 16m47.520s

Benchmarks: #12787 (comment)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@divyegala added the `feature request` and `non-breaking` labels Feb 16, 2023
@github-actions bot added the `libcudf` label Feb 16, 2023
divyegala (Member, Author) commented Feb 17, 2023

benchmarks

```

inner_join_32bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 0 | 100000 | 100000 | 601.928 us | 34.41% | 605.550 us | 25.15% | 3.622 us | 0.60% | PASS |
| I32 | I32 | 0 | 100000 | 400000 | 1.012 ms | 22.02% | 1.028 ms | 18.47% | 16.543 us | 1.64% | PASS |
| I32 | I32 | 0 | 10000000 | 10000000 | 15.605 ms | 39.30% | 15.344 ms | 2.23% | -260.419 us | -1.67% | PASS |
| I32 | I32 | 0 | 10000000 | 40000000 | 41.655 ms | 1.13% | 41.671 ms | 0.56% | 16.208 us | 0.04% | PASS |
| I32 | I32 | 0 | 10000000 | 100000000 | 96.049 ms | 0.22% | 96.008 ms | 0.50% | -41.437 us | -0.04% | PASS |
| I32 | I32 | 0 | 80000000 | 100000000 | 133.651 ms | 0.06% | 135.866 ms | 0.45% | 2.215 ms | 1.66% | FAIL |
| I32 | I32 | 0 | 100000000 | 100000000 | 144.730 ms | 0.06% | 147.588 ms | 0.50% | 2.858 ms | 1.97% | FAIL |
| I32 | I32 | 0 | 10000000 | 240000000 | 224.828 ms | 0.15% | 224.845 ms | 0.22% | 16.814 us | 0.01% | PASS |
| I32 | I32 | 0 | 80000000 | 240000000 | 257.738 ms | 0.39% | 256.600 ms | 0.16% | -1137.812 us | -0.44% | FAIL |
| I32 | I32 | 0 | 100000000 | 240000000 | 268.043 ms | 0.33% | 268.213 ms | 0.11% | 169.866 us | 0.06% | PASS |

inner_join_64bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 0 | 40000000 | 50000000 | 68.415 ms | 1.46% | 69.047 ms | 1.38% | 631.858 us | 0.92% | PASS |
| I64 | I64 | 0 | 50000000 | 50000000 | 74.136 ms | 0.05% | 75.408 ms | 0.09% | 1.272 ms | 1.72% | FAIL |
| I64 | I64 | 0 | 40000000 | 120000000 | 129.803 ms | 0.04% | 130.125 ms | 0.10% | 322.181 us | 0.25% | FAIL |
| I64 | I64 | 0 | 50000000 | 120000000 | 135.795 ms | 0.11% | 136.195 ms | 2.21% | 400.698 us | 0.30% | FAIL |

inner_join_32bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 1 | 100000 | 100000 | 218.837 us | 4.56% | 209.993 us | 4.22% | -8.843 us | -4.04% | PASS |
| I32 | I32 | 1 | 100000 | 400000 | 639.186 us | 10.09% | 642.582 us | 10.41% | 3.396 us | 0.53% | PASS |
| I32 | I32 | 1 | 10000000 | 10000000 | 6.405 ms | 1.60% | 6.675 ms | 1.44% | 270.004 us | 4.22% | FAIL |
| I32 | I32 | 1 | 10000000 | 40000000 | 16.520 ms | 56.23% | 16.343 ms | 1.30% | -177.517 us | -1.07% | PASS |
| I32 | I32 | 1 | 10000000 | 100000000 | 39.141 ms | 11.59% | 38.851 ms | 12.71% | -290.449 us | -0.74% | PASS |
| I32 | I32 | 1 | 80000000 | 100000000 | 53.277 ms | 1.05% | 55.347 ms | 1.05% | 2.070 ms | 3.89% | FAIL |
| I32 | I32 | 1 | 100000000 | 100000000 | 57.844 ms | 0.24% | 60.330 ms | 0.34% | 2.487 ms | 4.30% | FAIL |
| I32 | I32 | 1 | 10000000 | 240000000 | 94.244 ms | 11.65% | 94.255 ms | 9.51% | 10.841 us | 0.01% | PASS |
| I32 | I32 | 1 | 80000000 | 240000000 | 97.470 ms | 4.67% | 98.468 ms | 6.83% | 997.823 us | 1.02% | PASS |
| I32 | I32 | 1 | 100000000 | 240000000 | 101.820 ms | 0.92% | 104.596 ms | 0.27% | 2.776 ms | 2.73% | FAIL |

inner_join_64bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 1 | 40000000 | 50000000 | 27.807 ms | 1.04% | 28.668 ms | 1.09% | 861.187 us | 3.10% | FAIL |
| I64 | I64 | 1 | 50000000 | 50000000 | 30.459 ms | 17.15% | 31.381 ms | 0.26% | 922.373 us | 3.03% | FAIL |
| I64 | I64 | 1 | 40000000 | 120000000 | 49.082 ms | 0.41% | 50.550 ms | 0.44% | 1.468 ms | 2.99% | FAIL |
| I64 | I64 | 1 | 50000000 | 120000000 | 52.173 ms | 0.50% | 53.706 ms | 0.50% | 1.533 ms | 2.94% | FAIL |

left_join_32bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 0 | 100000 | 100000 | 568.085 us | 14.01% | 562.395 us | 10.83% | -5.690 us | -1.00% | PASS |
| I32 | I32 | 0 | 100000 | 400000 | 1.012 ms | 6.12% | 1.006 ms | 7.81% | -5.551 us | -0.55% | PASS |
| I32 | I32 | 0 | 10000000 | 10000000 | 16.080 ms | 1.09% | 15.663 ms | 0.93% | -417.682 us | -2.60% | FAIL |
| I32 | I32 | 0 | 10000000 | 40000000 | 42.125 ms | 0.50% | 44.239 ms | 30.20% | 2.113 ms | 5.02% | FAIL |
| I32 | I32 | 0 | 10000000 | 100000000 | 96.275 ms | 0.42% | 97.245 ms | 0.04% | 970.038 us | 1.01% | FAIL |
| I32 | I32 | 0 | 80000000 | 100000000 | 140.800 ms | 0.10% | 139.483 ms | 0.19% | -1316.986 us | -0.94% | FAIL |
| I32 | I32 | 0 | 100000000 | 100000000 | 154.213 ms | 0.06% | 152.440 ms | 0.28% | -1772.779 us | -1.15% | FAIL |
| I32 | I32 | 0 | 10000000 | 240000000 | 225.130 ms | 0.17% | 228.952 ms | 0.15% | 3.822 ms | 1.70% | FAIL |
| I32 | I32 | 0 | 80000000 | 240000000 | 261.704 ms | 0.08% | 261.985 ms | 0.43% | 281.059 us | 0.11% | FAIL |
| I32 | I32 | 0 | 100000000 | 240000000 | 275.129 ms | 0.06% | 274.692 ms | 0.15% | -437.603 us | -0.16% | FAIL |

left_join_64bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 0 | 40000000 | 50000000 | 72.193 ms | 1.28% | 70.651 ms | 1.73% | -1541.946 us | -2.14% | FAIL |
| I64 | I64 | 0 | 50000000 | 50000000 | 79.152 ms | 0.09% | 76.920 ms | 0.43% | -2231.237 us | -2.82% | FAIL |
| I64 | I64 | 0 | 40000000 | 120000000 | 132.821 ms | 0.05% | 130.747 ms | 0.17% | -2073.968 us | -1.56% | FAIL |
| I64 | I64 | 0 | 50000000 | 120000000 | 140.069 ms | 0.41% | 137.250 ms | 0.21% | -2819.150 us | -2.01% | FAIL |

left_join_32bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 1 | 100000 | 100000 | 590.385 us | 10.14% | 593.121 us | 12.14% | 2.735 us | 0.46% | PASS |
| I32 | I32 | 1 | 100000 | 400000 | 987.878 us | 9.05% | 1.004 ms | 6.94% | 16.478 us | 1.67% | PASS |
| I32 | I32 | 1 | 10000000 | 10000000 | 7.812 ms | 1.62% | 7.667 ms | 2.25% | -144.954 us | -1.86% | FAIL |
| I32 | I32 | 1 | 10000000 | 40000000 | 17.380 ms | 13.17% | 17.400 ms | 14.95% | 19.956 us | 0.11% | PASS |
| I32 | I32 | 1 | 10000000 | 100000000 | 42.405 ms | 19.50% | 42.329 ms | 20.21% | -75.611 us | -0.18% | PASS |
| I32 | I32 | 1 | 80000000 | 100000000 | 58.753 ms | 0.64% | 57.478 ms | 2.74% | -1274.106 us | -2.17% | FAIL |
| I32 | I32 | 1 | 100000000 | 100000000 | 65.009 ms | 0.13% | 63.412 ms | 0.50% | -1597.927 us | -2.46% | FAIL |
| I32 | I32 | 1 | 10000000 | 240000000 | 95.731 ms | 9.96% | 96.333 ms | 9.28% | 601.214 us | 0.63% | PASS |
| I32 | I32 | 1 | 80000000 | 240000000 | 112.574 ms | 7.89% | 112.793 ms | 5.87% | 219.106 us | 0.19% | PASS |
| I32 | I32 | 1 | 100000000 | 240000000 | 117.236 ms | 7.88% | 116.327 ms | 3.74% | -908.963 us | -0.78% | PASS |

left_join_64bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 1 | 40000000 | 50000000 | 30.629 ms | 1.11% | 29.789 ms | 1.11% | -839.707 us | -2.74% | FAIL |
| I64 | I64 | 1 | 50000000 | 50000000 | 33.898 ms | 0.35% | 32.953 ms | 0.93% | -944.659 us | -2.79% | FAIL |
| I64 | I64 | 1 | 40000000 | 120000000 | 55.075 ms | 28.23% | 54.284 ms | 11.17% | -791.124 us | -1.44% | PASS |
| I64 | I64 | 1 | 50000000 | 120000000 | 58.418 ms | 2.69% | 58.167 ms | 13.62% | -251.058 us | -0.43% | PASS |

full_join_32bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 0 | 100000 | 100000 | 1.234 ms | 14.60% | 1.072 ms | 14.33% | -161.793 us | -13.11% | PASS |
| I32 | I32 | 0 | 100000 | 400000 | 3.176 ms | 11.43% | 2.772 ms | 6.31% | -404.880 us | -12.75% | FAIL |
| I32 | I32 | 0 | 10000000 | 10000000 | 21.773 ms | 2.25% | 21.550 ms | 48.05% | -223.275 us | -1.03% | PASS |
| I32 | I32 | 0 | 10000000 | 40000000 | 57.261 ms | 0.70% | 56.178 ms | 0.50% | -1082.649 us | -1.89% | FAIL |
| I32 | I32 | 0 | 10000000 | 100000000 | 130.521 ms | 0.35% | 129.071 ms | 0.27% | -1449.892 us | -1.11% | FAIL |
| I32 | I32 | 0 | 80000000 | 100000000 | 178.917 ms | 0.19% | 175.327 ms | 0.04% | -3590.888 us | -2.01% | FAIL |
| I32 | I32 | 0 | 100000000 | 100000000 | 193.614 ms | 0.23% | 189.606 ms | 0.12% | -4008.029 us | -2.07% | FAIL |
| I32 | I32 | 0 | 10000000 | 240000000 | 305.694 ms | 0.29% | 304.483 ms | 0.08% | -1211.204 us | -0.40% | FAIL |
| I32 | I32 | 0 | 80000000 | 240000000 | 347.551 ms | 0.50% | 341.301 ms | 0.42% | -6250.216 us | -1.80% | FAIL |
| I32 | I32 | 0 | 100000000 | 240000000 | 369.531 ms | 4.50% | 364.988 ms | 10.22% | -4542.958 us | -1.23% | PASS |

full_join_64bit

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 0 | 40000000 | 50000000 | 99.998 ms | 2.64% | 92.127 ms | 21.14% | -7871.118 us | -7.87% | FAIL |
| I64 | I64 | 0 | 50000000 | 50000000 | 102.274 ms | 4.30% | 97.881 ms | 0.60% | -4393.338 us | -4.30% | FAIL |
| I64 | I64 | 0 | 40000000 | 120000000 | 182.186 ms | 12.38% | 174.636 ms | 0.19% | -7549.630 us | -4.14% | FAIL |
| I64 | I64 | 0 | 50000000 | 120000000 | 186.192 ms | 2.67% | 181.895 ms | 0.49% | -4297.112 us | -2.31% | FAIL |

full_join_32bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 1 | 100000 | 100000 | 1.079 ms | 8.50% | 1.193 ms | 10.73% | 113.657 us | 10.53% | FAIL |
| I32 | I32 | 1 | 100000 | 400000 | 2.764 ms | 6.32% | 2.776 ms | 7.11% | 11.746 us | 0.42% | PASS |
| I32 | I32 | 1 | 10000000 | 10000000 | 12.638 ms | 70.86% | 12.646 ms | 27.37% | 8.341 us | 0.07% | PASS |
| I32 | I32 | 1 | 10000000 | 40000000 | 40.158 ms | 33.27% | 40.097 ms | 33.76% | -60.952 us | -0.15% | PASS |
| I32 | I32 | 1 | 10000000 | 100000000 | 94.469 ms | 38.02% | 94.048 ms | 30.80% | -420.948 us | -0.45% | PASS |
| I32 | I32 | 1 | 80000000 | 100000000 | 121.447 ms | 29.27% | 121.604 ms | 30.61% | 156.622 us | 0.13% | PASS |
| I32 | I32 | 1 | 100000000 | 100000000 | 129.554 ms | 33.72% | 129.316 ms | 38.61% | -237.795 us | -0.18% | PASS |
| I32 | I32 | 1 | 10000000 | 240000000 | 218.891 ms | 43.10% | 218.816 ms | 4.10% | -74.498 us | -0.03% | PASS |
| I32 | I32 | 1 | 80000000 | 240000000 | 245.701 ms | 47.03% | 246.578 ms | 38.45% | 876.893 us | 0.36% | PASS |
| I32 | I32 | 1 | 100000000 | 240000000 | 258.138 ms | 55.77% | 253.085 ms | 46.94% | -5052.850 us | -1.96% | PASS |

full_join_64bit_nulls

[0] Tesla V100-SXM2-32GB

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I64 | I64 | 1 | 40000000 | 50000000 | 60.060 ms | 16.40% | 60.716 ms | 27.98% | 656.930 us | 1.09% | PASS |
| I64 | I64 | 1 | 50000000 | 50000000 | 65.049 ms | 22.26% | 64.840 ms | 26.60% | -209.297 us | -0.32% | PASS |
| I64 | I64 | 1 | 40000000 | 120000000 | 123.631 ms | 18.71% | 124.044 ms | 37.14% | 412.905 us | 0.33% | PASS |
| I64 | I64 | 1 | 50000000 | 120000000 | 128.418 ms | 26.05% | 128.256 ms | 20.83% | -161.136 us | -0.13% | PASS |
```
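
A note on reading the benchmark tables: `Diff` is `Cmp Time - Ref Time`, and `%Diff` is that difference relative to `Ref Time`. Every row is also consistent with `Status` being FAIL whenever the absolute `%Diff` exceeds the smaller of the two noise values, so FAIL marks a change larger than the measurement noise, whether slower or faster (the `left_join_64bit` rows are all speedups marked FAIL). That rule is inferred from the numbers in this thread, not taken from the comparison tool's documentation. A minimal sketch, with `Row`, `pct_diff`, and `passes` as hypothetical names:

```cpp
// One benchmark row; times share a unit, noise values are percentages.
struct Row {
  double ref_time;
  double ref_noise_pct;
  double cmp_time;
  double cmp_noise_pct;
};

constexpr double abs_val(double x) { return x < 0 ? -x : x; }

// Diff column: comparison time minus reference time.
constexpr double diff(Row const& r) { return r.cmp_time - r.ref_time; }

// %Diff column: the difference relative to the reference time.
constexpr double pct_diff(Row const& r) { return 100.0 * diff(r) / r.ref_time; }

// Assumed status rule: PASS when |%Diff| is within the smaller noise value.
constexpr bool passes(Row const& r)
{
  double const min_noise = r.ref_noise_pct < r.cmp_noise_pct ? r.ref_noise_pct : r.cmp_noise_pct;
  return abs_val(pct_diff(r)) <= min_noise;
}

// First inner_join_32bit row: 601.928 us vs 605.550 us, reported as PASS.
static_assert(passes(Row{601.928, 34.41, 605.550, 25.15}));
// inner_join_32bit, 80M x 100M: 133.651 ms vs 135.866 ms, reported as FAIL.
static_assert(!passes(Row{133.651, 0.06, 135.866, 0.45}));
```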

Comment on lines 509 to 511
auto preprocessed_probe =
cudf::experimental::row::equality::preprocessed_table::create(probe_table, stream);
auto join_indices = cudf::detail::probe_join_hash_table(
Contributor:

const.

Member Author:

`join_indices` cannot be `const` because it is passed to `detail::concatenate_vector_pairs`, which does not accept a const argument.
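
To illustrate the constraint in isolation, here is a minimal sketch. It assumes the helper takes its arguments by non-const lvalue reference; the real signature of `detail::concatenate_vector_pairs` is not shown in this thread, so `concatenate_pairs` and `IndexPair` below are hypothetical stand-ins on host vectors.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pair of join index vectors.
using IndexPair = std::pair<std::vector<int>, std::vector<int>>;

// Hypothetical helper: appends `b` onto `a` and empties `b`.
// Both parameters are non-const references because both are modified.
inline void concatenate_pairs(IndexPair& a, IndexPair& b)
{
  assert(a.first.size() == a.second.size());
  assert(b.first.size() == b.second.size());
  a.first.insert(a.first.end(), b.first.begin(), b.first.end());
  a.second.insert(a.second.end(), b.second.begin(), b.second.end());
  b.first.clear();
  b.second.clear();
}

// A mutable pair binds to the parameters...
static_assert(std::is_invocable_v<decltype(&concatenate_pairs), IndexPair&, IndexPair&>);
// ...but a const pair does not, so declaring the result `const` would not compile.
static_assert(!std::is_invocable_v<decltype(&concatenate_pairs), IndexPair const&, IndexPair&>);
```

Under that assumption, marking the local `const` would only be possible if the helper's signature changed as well.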

@divyegala requested a review from ttnghia March 28, 2023 20:11
bdice (Contributor) commented Mar 29, 2023

I had a couple of comments, which we discussed on Slack. @ttnghia, since you've been the more active reviewer, I'll let you be the second C++ approval on this PR.

@divyegala changed the base branch from branch-23.04 to branch-23.06 March 30, 2023 18:29
@harrism removed their request for review April 4, 2023 00:07
divyegala (Member, Author) commented

/merge

Labels: feature request, libcudf, non-breaking