Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Catch Profiler error when app info is empty #994

Merged
merged 1 commit into from
May 7, 2024

Conversation

cindyyuanjiang
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang commented May 4, 2024

Fixes #947

The Profiler could fail when: it successfully creates an ApplicationInfo in createApp, when processing this app in processApps, the Profiler found that it is unable to collect information from this app in getAppInfo. This leads to an empty appInfo in

which fails
logInfo(s"Took ${endTime - startTime}ms to Process [${appInfo.head.appId}]")

This PR catches the error early and prints a more helpful message.

Signed-off-by: cindyyuanjiang <cindyj@nvidia.com>
parthosa
parthosa previously approved these changes May 4, 2024
@cindyyuanjiang cindyyuanjiang self-assigned this May 4, 2024
@cindyyuanjiang cindyyuanjiang added bug Something isn't working core_tools Scope the core module (scala) labels May 4, 2024
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cindyyuanjiang !
QQ: since we are trying to reduce the gap betwen Q/P tools, How feasible do you think it is to handle PRofiling tool the same way we do in Q tool? For example, generate a file like rapids_4_spark_qualification_output_status.csv reporting the status of processing of each eventlog ?
If you find it an easy change, then it will be nice to handle it in this PR.

@parthosa do you recall if we have an exiting issue to generate status report for Profiling tool?

@parthosa
Copy link
Collaborator

parthosa commented May 6, 2024

@parthosa do you recall if we have an exiting issue to generate status report for Profiling tool?

We do not have any existing issue to generate status report for Profiling Tool.

For generating status report in Profiling tool, #464 could be used as reference.

@parthosa parthosa self-requested a review May 6, 2024 17:18
@parthosa parthosa dismissed their stale review May 6, 2024 17:18

Revisiting changes

@cindyyuanjiang
Copy link
Collaborator Author

QQ: since we are trying to reduce the gap betwen Q/P tools, How feasible do you think it is to handle PRofiling tool the same way we do in Q tool? For example, generate a file like rapids_4_spark_qualification_output_status.csv reporting the status of processing of each eventlog ? If you find it an easy change, then it will be nice to handle it in this PR.

Thanks @amahussein!
I think it is feasible to generate a status report for Profiling tool. I am taking a look into that.

@cindyyuanjiang
Copy link
Collaborator Author

Filed issue to track adding status report for profiling tool: #998, will add feature in a new PR.

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: @cindyyuanjiang Did you test with multiple eventlog that throwing RuntimeExcpetion won't cause the entire Profiler app to Exit?
We want to make sure that runtimeExcprion won't crash the execution and cause the processing of other apps to be terminated.

@cindyyuanjiang
Copy link
Collaborator Author

cindyyuanjiang commented May 7, 2024

QQ: @cindyyuanjiang Did you test with multiple eventlog that throwing RuntimeExcpetion won't cause the entire Profiler app to Exit? We want to make sure that runtimeExcprion won't crash the execution and cause the processing of other apps to be terminated.

Thanks @amahussein! I tested with multiple event logs and confirm that change in this PR would not crash the Profiler. The Profiler would continue to run until all apps are processed.

The eventlog which throws RuntimeException is marked as failed as in the progress bar:
Profile Tool Progress 100% [==========================================] (1 succeeded + 1 failed + 0 skipped + 0 N/A) / 2

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amahussein amahussein merged commit fa5fbea into NVIDIA:dev May 7, 2024
16 checks passed
@cindyyuanjiang cindyyuanjiang deleted the spark-rapids-tools-947 branch May 7, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Profiler throws NoSuchElementException
3 participants