Deduplicate calls to aggregateSparkMetricsBySql #1464

Merged: 1 commit, Dec 16, 2024

amahussein
Collaborator

@amahussein amahussein commented Dec 13, 2024

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

Contributes to #1461

AppSparkMetricsAnalyzer was calling aggregateSparkMetricsBySql twice. This code change eliminates that redundancy to save CPU time and memory allocations.

aggregateSparkMetricsBySql was responsible for more than 53% of total CPU time. This code change caches the returned value and then passes it to the second method instead of recomputing it.
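
The compute-once-and-pass-along pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual change: `TaskMetric`, `SqlAggregate`, `MetricsAnalyzerSketch`, `maxCpuTimePerSql`, and `collectMetricsSketch` are hypothetical names, and the zero-argument signature given to `aggregateSparkMetricsBySql` here is an assumption made for the example.

```scala
// Hypothetical, simplified stand-ins; these are NOT the real rapids-tools types.
final case class TaskMetric(sqlId: Long, cpuTimeMs: Long)
final case class SqlAggregate(sqlId: Long, totalCpuTimeMs: Long)

class MetricsAnalyzerSketch(tasks: Seq[TaskMetric]) {
  // Counts invocations so that the deduplication is observable.
  var bySqlCalls: Int = 0

  // Stand-in for the expensive aggregation (signature assumed for this sketch).
  def aggregateSparkMetricsBySql(): Seq[SqlAggregate] = {
    bySqlCalls += 1
    tasks.groupBy(_.sqlId).toSeq.sortBy(_._1).map { case (id, ts) =>
      SqlAggregate(id, ts.map(_.cpuTimeMs).sum)
    }
  }

  // Hypothetical second consumer: it accepts the precomputed aggregates
  // as a parameter instead of calling aggregateSparkMetricsBySql() again.
  def maxCpuTimePerSql(cached: Seq[SqlAggregate]): Long =
    if (cached.isEmpty) 0L else cached.map(_.totalCpuTimeMs).max

  // Compute the aggregates once, keep them in a local value, and reuse them.
  def collectMetricsSketch(): (Seq[SqlAggregate], Long) = {
    val sqlAggs = aggregateSparkMetricsBySql()
    (sqlAggs, maxCpuTimePerSql(sqlAggs))
  }
}

object DedupSketch {
  def main(args: Array[String]): Unit = {
    val analyzer = new MetricsAnalyzerSketch(
      Seq(TaskMetric(1L, 10L), TaskMetric(1L, 5L), TaskMetric(2L, 7L)))
    val (aggs, maxCpu) = analyzer.collectMetricsSketch()
    println(s"aggregates=$aggs maxCpu=$maxCpu calls=${analyzer.bySqlCalls}")
  }
}
```

Passing the cached value as a parameter keeps the second method free of hidden recomputation, so the cost of the aggregation is paid once per analysis run.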

Running the same eventlog from the issue description, the profile shows the following results:

total CPU: 1,290,033 ms -> improved by 24%
total time: 9,096,942 ms -> improved by 23.6%
total allocation: 4.28 TB -> improved by 30.2%

  • getAggregateRawMetrics: CPU time 954,620 ms (improved by 31%) | 3.93 TB ()
    • aggregateSparkMetricsBySql: 496,760 ms; 35% of total, 48% of parent | 1.86 TB (44% of all; 47% of parent)
    • aggregateSparkMetricsByJob: 496,560 ms; 38% of total, 52% of parent | 2.06 TB (48% of all; 53% of parent)

ProfileMain_2024_12_13_194959.zip

@amahussein amahussein added the bug (Something isn't working) and core_tools (Scope the core module (scala)) labels Dec 13, 2024
@amahussein amahussein self-assigned this Dec 13, 2024
Collaborator

@parthosa parthosa left a comment

Thanks @amahussein for this. Interesting that we did not catch this before.

@amahussein
Collaborator Author

> Thanks @amahussein for this. Interesting that we did not catch this before.

Yeah! Amazing how a single line costs almost 25% of our runtime.

Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Thanks @amahussein for finding this out!

@amahussein amahussein merged commit a1f866f into NVIDIA:dev Dec 16, 2024
16 checks passed
@amahussein amahussein deleted the rapids-tools-1461-part01 branch December 16, 2024 15:20