Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stage level metrics output csv file #1251

Merged
merged 2 commits into from
Aug 2, 2024
Merged

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Aug 2, 2024

This fixes #1250.
This bug is because of grouping the metrics only on accumId. This PR fixes the bug by grouping it both on accumId and stageId such that the result is metrics per stage.
Changed the unit test such that the same accumId is updated in 2 stages and the output will show it in 2 different rows.

Also tested it on custom eventlog where the accumulator updates the value in 2 stages.

Before this PR:

appIndex,stageId,accumulatorId,name,min,median,max,total
1,"0",0,"My Accumulator",1,2,4,300
1,"0",1,"internal.metrics.executorDeserializeTime",626,629,631,40238

With this PR:

appIndex,stageId,accumulatorId,name,min,median,max,total
1,0,0,"My Accumulator",1,2,2,100
1,0,1,"internal.metrics.executorDeserializeTime",626,629,631,40238
1,1,0,"My Accumulator",2,4,4,200

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 self-assigned this Aug 2, 2024
@nartal1 nartal1 requested a review from amahussein August 2, 2024 01:13
@nartal1 nartal1 added bug Something isn't working core_tools Scope the core module (scala) labels Aug 2, 2024
@nartal1 nartal1 changed the title Fix stage level metrics report Fix stage level metrics output csv file Aug 2, 2024
@amahussein
Copy link
Collaborator

rapids-tools-pr-1251.patch
Thanks @nartal1 for working on that quick fix
This is a patch to restore the column order.
Since it is stage level report, then it is expected to see the stages coming in order. I also, fixed the type of stageId to be integer because converting it to string turns to be unnecessary overhead from the memory consumption point of view.

The output now looks like this:

appIndex,stageId,accumulatorId,name,min,median,max,total
1,0,0,"My Accumulator",1,2,2,100
1,0,1,"internal.metrics.executorDeserializeTime",626,629,631,40238
1,1,0,"My Accumulator",2,4,4,200

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Aug 2, 2024

rapids-tools-pr-1251.patch Thanks @nartal1 for working on that quick fix This is a patch to restore the column order. Since it is stage level report, then it is expected to see the stages coming in order. I also, fixed the type of stageId to be integer because converting it to string turns to be unnecessary overhead from the memory consumption point of view.

The output now looks like this:

appIndex,stageId,accumulatorId,name,min,median,max,total
1,0,0,"My Accumulator",1,2,2,100
1,0,1,"internal.metrics.executorDeserializeTime",626,629,631,40238
1,1,0,"My Accumulator",2,4,4,200

Thanks @amahussein! The output looks cleaner now. Have applied the patch in this PR.

Copy link
Collaborator

@parthosa parthosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nartal1 for this fix.

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nartal1 for the fix.

@nartal1 nartal1 merged commit 4e12f99 into NVIDIA:dev Aug 2, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Stage level metrics output result is incorrect
3 participants