[VL] Add sort merge join metrics #3920

Merged
merged 3 commits into from
Dec 5, 2023

ulysses-you
Contributor

What changes were proposed in this pull request?

This PR adds metrics for sort merge join. Since a join always has a post-projection, it also unifies the post-projection metrics for hash join and sort merge join.
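A minimal sketch of the unification idea, written in plain Scala instead of Spark's `SQLMetrics` so it runs standalone. `MetricSpec`, `JoinMetricsSketch`, the `"build"` side name for hash join, and the wall-time descriptions are hypothetical; only `numOutputRows`, `numOutputVectors`, and the `bufferPreProjectionCpuCount`/`bufferPreProjectionWallNanos` naming pattern come from this discussion.

```scala
// Hypothetical stand-in for a Spark SQLMetric definition: a key plus a UI label.
final case class MetricSpec(key: String, description: String)

object JoinMetricsSketch {
  // Shared post-projection metrics. Every join ends with a post-projection,
  // so hash join and sort merge join can both reuse this single definition.
  val postProjectionMetrics: Seq[MetricSpec] = Seq(
    MetricSpec("numOutputRows", "number of output rows"),
    MetricSpec("numOutputVectors", "number of output vectors")
  )

  // Pre-projection metrics for one join side, following the
  // "<side>PreProjectionCpuCount" / "<side>PreProjectionWallNanos" pattern.
  def preProjectionMetrics(side: String): Seq[MetricSpec] = Seq(
    MetricSpec(s"${side}PreProjectionCpuCount", s"$side preProject cpu wall time count"),
    MetricSpec(s"${side}PreProjectionWallNanos", s"time of $side preProjection")
  )

  // Hash join: stream/build sides (hypothetical names) plus the shared post-projection metrics.
  val hashJoinMetrics: Seq[MetricSpec] =
    preProjectionMetrics("stream") ++ preProjectionMetrics("build") ++ postProjectionMetrics

  // Sort merge join: stream/buffer sides plus the same shared post-projection metrics.
  val sortMergeJoinMetrics: Seq[MetricSpec] =
    preProjectionMetrics("stream") ++ preProjectionMetrics("buffer") ++ postProjectionMetrics
}
```

Defining the post-projection metrics once means a rename (such as replacing an unused key) only has to happen in one place for both join types.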

How was this patch tested?

Added a test and tested manually.

github-actions bot commented Dec 5, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
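The requested title format can be checked mechanically. This is a hypothetical sketch, not the bot's actual check: it assumes the issue id is numeric, the component tag is alphanumeric (for example `VL`), and the type is exactly `feat` or `fix`. The issue id `1234` in the example is made up.

```scala
import scala.util.matching.Regex

object TitleFormatSketch {
  // Hypothetical pattern for "[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}".
  val Pattern: Regex = """^\[GLUTEN-(\d+)\]\[([A-Za-z0-9_]+)\](feat|fix): (.+)$""".r

  // Regex extractors in a pattern match require the whole string to match.
  def isValid(title: String): Boolean = title match {
    case Pattern(_, _, _, _) => true
    case _                   => false
  }

  // Extracts the numeric issue id when the title is well-formed.
  def issueId(title: String): Option[Long] = title match {
    case Pattern(id, _, _, _) => Some(id.toLong)
    case _                    => None
  }
}
```

For example, `"[GLUTEN-1234][VL]feat: Add sort merge join metrics"` passes, while the original title `"[VL] Add sort merge join metrics #3920"` does not.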

See also:

github-actions bot commented Dec 5, 2023

Run Gluten Clickhouse CI

"postProjectionOutputVectors" -> SQLMetrics.createMetric(
sparkContext,
"number of postProjection output vectors"),
"finalOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of final output rows"),
Contributor Author

It was never used before, and this PR replaces it with numOutputRows.

@ulysses-you
Contributor Author

cc @PHILO-HE @JkSelf, thank you!

github-actions bot commented Dec 5, 2023

Run Gluten Clickhouse CI

Contributor

JkSelf left a comment

@ulysses-you Thanks for your great work. Just two small questions.

"bufferPreProjectionCpuCount" -> SQLMetrics.createMetric(
sparkContext,
"buffer preProject cpu wall time count"),
"bufferPreProjectionWallNanos" -> SQLMetrics.createNanoTimingMetric(
Contributor

@ulysses-you What are the differences between stream and buffer? How can they be mapped to the left and right in a sort merge join?

Contributor Author

Merge join only supports inner and left outer joins, so the stream side is always the left child and the buffer side is always the right child. These metrics report the pre-projection for the stream and buffer sides respectively.
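The stream/buffer mapping described above can be sketched as a small function. This is an illustration under the stated assumption that only inner and left outer joins are offloaded to the native merge join; `SmjSidesSketch`, `Sides`, and the join-type objects are hypothetical names, not Gluten or Spark classes.

```scala
object SmjSidesSketch {
  // Hypothetical subset of join types, modeled loosely on Spark's.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  // Which child feeds which side of the merge join.
  final case class Sides(stream: String, buffer: String)

  // Only inner and left outer are supported, so the stream side is always
  // the left child and the buffer side is always the right child.
  def sidesFor(joinType: JoinType): Option[Sides] = joinType match {
    case Inner | LeftOuter => Some(Sides(stream = "left", buffer = "right"))
    case _                 => None // not offloaded to the native merge join (assumption)
  }
}
```

Because the mapping is fixed, `streamPreProjection*` metrics always describe the left child's pre-projection and `bufferPreProjection*` metrics the right child's.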

super.afterAll()
}

test("test sort merge join metrics") {
Contributor

@ulysses-you Do we need to add a test with a post-projection? And should we also add hash join tests covering both the pre-projection and the post-projection?

Contributor Author

A join always has a post-projection, so the numOutputRows/numOutputVectors metrics come from the post-projection.
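A hypothetical sketch of why a separate post-projection test is not needed: if the join pipeline always ends with a post-projection stage, the operator-level output metrics are simply that stage's counts. `StageStats`, the stage names, and the numbers in the example are invented for illustration and do not reflect Velox's real plan-node statistics.

```scala
object PostProjectionSketch {
  // Hypothetical per-stage statistics for one join operator.
  final case class StageStats(name: String, outputRows: Long, outputVectors: Long)

  // The join pipeline always ends with a post-projection, so the operator's
  // numOutputRows / numOutputVectors are read from that last stage.
  def operatorOutput(stages: Seq[StageStats]): (Long, Long) = {
    require(stages.nonEmpty, "join pipeline must not be empty")
    val post = stages.last
    require(post.name == "postProjection", "join pipeline must end with a post-projection")
    (post.outputRows, post.outputVectors)
  }
}
```

With stages `streamPreProjection`, `bufferPreProjection`, `mergeJoin`, and `postProjection`, the reported output comes from `postProjection` alone, so any test that checks `numOutputRows` already exercises it.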

github-actions bot commented Dec 5, 2023

Run Gluten Clickhouse CI

Contributor

PHILO-HE left a comment

Looks good! Thanks!

Contributor

JkSelf left a comment

LGTM. Thanks.

ulysses-you merged commit f31cc82 into apache:main on Dec 5, 2023
16 of 17 checks passed
ulysses-you deleted the smj branch on December 5, 2023 11:28
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3920_time.csv log/native_master_12_04_2023_94c91c55d_time.csv difference (master - PR) percentage (master / PR)
q1 35.55 34.65 -0.903 97.46%
q2 24.82 24.91 0.081 100.33%
q3 38.17 36.38 -1.789 95.31%
q4 37.15 37.37 0.221 100.59%
q5 71.94 72.63 0.694 100.96%
q6 7.05 6.87 -0.178 97.47%
q7 85.56 85.44 -0.128 99.85%
q8 87.79 87.86 0.076 100.09%
q9 120.84 124.52 3.677 103.04%
q10 45.94 46.04 0.099 100.22%
q11 21.56 20.12 -1.432 93.36%
q12 25.30 26.71 1.408 105.56%
q13 45.68 46.45 0.775 101.70%
q14 18.15 14.55 -3.602 80.15%
q15 29.16 28.16 -1.005 96.55%
q16 15.23 15.75 0.520 103.41%
q17 102.40 103.10 0.706 100.69%
q18 149.75 150.63 0.885 100.59%
q19 12.81 12.90 0.089 100.70%
q20 27.56 27.73 0.169 100.61%
q21 224.67 222.80 -1.875 99.17%
q22 13.55 13.06 -0.494 96.35%
total 1240.64 1238.63 -2.008 99.84%
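The last two columns can be reproduced from the first two. The formulas below are inferred from the rows themselves (for example q1: 34.65 - 35.55 ≈ -0.90 and 34.65 / 35.55 ≈ 97.47%), not documented by the report; small mismatches in the last digit come from the rounded times shown. `PerfDiffSketch` is a hypothetical helper.

```scala
object PerfDiffSketch {
  // difference = baseline (master) time - candidate (PR #3920) time
  def difference(candidate: Double, baseline: Double): Double = baseline - candidate

  // percentage = baseline time / candidate time * 100
  def percentage(candidate: Double, baseline: Double): Double = baseline / candidate * 100.0
}
```

Under this reading, a percentage below 100% means the PR run took longer than the master baseline for that query. For example, q14 shows 80.15% (18.15 for the PR vs 14.55 for master), and the total of 99.84% indicates the two runs are within about 0.2% of each other overall.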
