Fix calculation of unsupported operators stage duration percentage #1006

Merged
parthosa merged 12 commits into NVIDIA:dev from spark-rapids-tools-1003
May 22, 2024

Conversation

parthosa
Collaborator

@parthosa parthosa commented May 9, 2024

Fixes #1003. This PR fixes the unsupported operators stage duration percentage by using SQL Stage Durations Sum as the denominator instead of App Duration.

Additionally, it handles the case where event logs do not have an App Name.

Changes:

  1. Add the column SQL Stage Durations Sum to rapids_4_spark_qualification_output.csv.
  2. Group apps by app ID and SQL Stage Durations Sum, and calculate the unsupported operators stage duration for each group.
  3. Merge this dataframe with all_apps and calculate the percentage as unsupported stage duration / SQL Stage Durations Sum.
  4. Unit test changes: add the new column to the unit tests' expected files.
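
The grouping, merge, and percentage steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the code in this PR: the sample data, the function name, and the column names `Stage Duration`, `Unsupported Operators Stage Duration`, and `Unsupported Operators Stage Duration Percent` are hypothetical. Only `SQL Stage Durations Sum`, `App Duration`, and `all_apps` come from the description.

```python
import pandas as pd

# Hypothetical per-app summary, standing in for rapids_4_spark_qualification_output.csv.
all_apps = pd.DataFrame({
    "App ID": ["app-1", "app-2", "app-3"],
    "App Duration": [1000, 2000, 500],
    "SQL Stage Durations Sum": [400, 1000, 0],
})

# Hypothetical per-stage rows for stages that contain unsupported operators.
unsupported_ops = pd.DataFrame({
    "App ID": ["app-1", "app-1", "app-2"],
    "Stage ID": [1, 2, 7],
    "Stage Duration": [100, 100, 250],
})


def unsupported_stage_duration_percentage(apps, unsupported):
    """Return `apps` with the unsupported stage duration and its percentage added."""
    duration_col = "Unsupported Operators Stage Duration"
    # Sum the unsupported stage durations per app.
    per_app = (
        unsupported.groupby("App ID", as_index=False)["Stage Duration"]
        .sum()
        .rename(columns={"Stage Duration": duration_col})
    )
    # Left merge keeps apps that have no unsupported operators at all.
    merged = apps.merge(per_app, on="App ID", how="left")
    merged[duration_col] = merged[duration_col].fillna(0)
    # Divide by the SQL stage durations sum (not App Duration); a zero
    # denominator is masked to NaN and the resulting percentage set to 0.
    denominator = merged["SQL Stage Durations Sum"]
    percent = merged[duration_col] * 100.0 / denominator.where(denominator > 0)
    merged["Unsupported Operators Stage Duration Percent"] = percent.fillna(0.0)
    return merged


result = unsupported_stage_duration_percentage(all_apps, unsupported_ops)
print(result[["App ID", "Unsupported Operators Stage Duration Percent"]])
```

In this sample, app-1 has 200 of unsupported stage duration against a SQL Stage Durations Sum of 400, so its percentage is computed against 400 rather than against its App Duration of 1000.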

Design

  1. We added SQL Stage Durations Sum to the main rapids_4_spark_qualification_output.csv instead of rapids_4_spark_qualification_output_unsupportedOperators.csv, since the second file might not have entries for all apps (e.g. apps without any unsupported operators). Handling that would require extra fillna() calls, guards against division by NaN, etc.
  2. This is a cleaner approach that avoids the above handling.
  3. The downside is that we had to update the unit tests' expected files with the new column.
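
To illustrate the trade-off in the first design point, the sketch below shows what could happen if the denominator lived only in the unsupported operators file. It is a hypothetical example with made-up data and an assumed `Unsupported Operators Stage Duration` column, not code from this PR: an app with no row in that file ends up with NaN after the merge, so the percentage is NaN unless it is filled explicitly.

```python
import pandas as pd

# Hypothetical list of all apps from the main qualification output.
all_apps = pd.DataFrame({"App ID": ["app-1", "app-2"]})

# Hypothetical alternative design: the denominator is stored only in the
# unsupported operators file, which has no row for app-2.
unsupported_only = pd.DataFrame({
    "App ID": ["app-1"],
    "SQL Stage Durations Sum": [400],
    "Unsupported Operators Stage Duration": [200],
})

merged = all_apps.merge(unsupported_only, on="App ID", how="left")

# app-2 has no matching row, so both columns are NaN after the left merge.
missing_denominator = merged["SQL Stage Durations Sum"].isna()

# Without extra fillna() handling, the percentage for app-2 is NaN.
percent = (
    merged["Unsupported Operators Stage Duration"] * 100.0
    / merged["SQL Stage Durations Sum"]
)
print(merged.assign(Percent=percent))
```

Keeping the denominator in the main output file means every app has a value for it, which is the handling the design aims to avoid.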

Test

Command (a dev JAR must be specified):

spark_rapids qualification <eventlogs>  --tools_jar <tools_jar> 
  • Tested on multiple event logs with various cases:
    • App with no unsupported operator entries
    • App with unsupported entries that have no impact (and are therefore filtered out before the calculation)
    • Two event logs with the same app name, to test TCO behaviour.

Output

  • Showing the result along with Spill Heuristics:

image

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels May 9, 2024
@parthosa parthosa self-assigned this May 9, 2024
@parthosa parthosa marked this pull request as draft May 15, 2024 21:19
@parthosa parthosa changed the title Cap unsupported ops stage duration percentage Fix calculation of unsupported operators stage duration percentage May 16, 2024
@parthosa parthosa marked this pull request as ready for review May 16, 2024 22:51
@parthosa parthosa added the core_tools Scope the core module (scala) label May 16, 2024
# Conflicts:
#	user_tools/src/spark_rapids_pytools/resources/qualification-conf.yaml
Collaborator

@amahussein amahussein left a comment

Thanks @parthosa
I was not expecting a change in the Scala side.
Let's sync offline to better understand the recent changes.

amahussein
amahussein previously approved these changes May 17, 2024
cindyyuanjiang
cindyyuanjiang previously approved these changes May 18, 2024
@amahussein
Collaborator

@parthosa Just double checking.
I don't see the column title updated in the last 4 days. Did you commit your latest changes?

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
# Conflicts:
#	core/src/test/resources/QualificationExpectations/jdbc_expectation.csv
#	core/src/test/resources/QualificationExpectations/read_dsv1_expectation.csv
#	core/src/test/resources/QualificationExpectations/read_dsv2_expectation.csv
#	core/src/test/resources/QualificationExpectations/write_format_expectation.csv
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa
Collaborator Author

I don't see the column title updated in the last 4 days. Did you commit your latest changes?

@amahussein Renamed the column to SQL Stage Durations Sum

amahussein
amahussein previously approved these changes May 21, 2024
Collaborator

@amahussein amahussein left a comment

I guess it turned out to be more painful than originally thought :)
Thanks @parthosa !
LGTME.

cindyyuanjiang
cindyyuanjiang previously approved these changes May 22, 2024
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa dismissed stale reviews from cindyyuanjiang and amahussein via 22dd2ce May 22, 2024 00:34
@parthosa parthosa merged commit 289cfb2 into NVIDIA:dev May 22, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1003 branch May 22, 2024 00:42
Labels
bug Something isn't working core_tools Scope the core module (scala) user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Development

Successfully merging this pull request may close these issues.

[BUG] Unsupported operators stage duration percentage should be capped
5 participants