
Update QualX to return default speedups and fix App Duration for incomplete apps #1089

Merged 3 commits on Jun 7, 2024


@parthosa parthosa commented Jun 7, 2024

Fixes #1058.

Issues

Issue 1:

In QualX, we fall back to legacy speedups if the metrics are unavailable (file not found or empty after preprocessing). This PR updates the prediction code to return a speedup of 1 for such apps and to log the reason for the missing metrics.

We also introduce a wasPredicted column in per_app.csv as a marker for apps that could not be predicted (the column is False for such apps).

Affects:

predict()
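
The fallback described in Issue 1 can be sketched roughly as follows. This is a minimal illustration and not the QualX implementation: the function name `apply_default_speedup`, the dict-based inputs, and the logger name are assumptions made for the example. Only the `speedup` and `wasPredicted` column names and the wording of the warning come from this PR.

```python
import logging

logger = logging.getLogger("qualx_sketch")  # hypothetical logger name

DEFAULT_SPEEDUP = 1.0


def apply_default_speedup(app_ids, predictions, reasons):
    """Return one row per app, falling back to a speedup of 1.0.

    app_ids:     every application ID in the dataset
    predictions: dict of appId -> predicted speedup (apps with usable metrics)
    reasons:     dict of appId -> why metrics were missing (hypothetical input)
    """
    rows = []
    for app_id in app_ids:
        if app_id in predictions:
            rows.append(
                {"appId": app_id, "speedup": predictions[app_id], "wasPredicted": True}
            )
            continue
        # No usable metrics: log the reason and emit the default speedup.
        reason = reasons.get(app_id, "Missing features after preprocessing")
        logger.warning(
            "Predicted speedup will be %s for %s. Reason: %s.",
            DEFAULT_SPEEDUP, app_id, reason,
        )
        rows.append(
            {"appId": app_id, "speedup": DEFAULT_SPEEDUP, "wasPredicted": False}
        )
    return rows


rows = apply_default_speedup(
    ["app_a", "app_b"],
    {"app_a": 4.8},
    {"app_b": "No fully supported stages found"},
)
```

Keeping the unpredicted apps in the output, rather than dropping them, is what makes the `wasPredicted` marker useful: consumers of per_app.csv can filter on it instead of guessing why a speedup is exactly 1.0.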

Issue 2:

In QualX, the CSV metrics from the profiling tool do not have an app duration for incomplete applications. The Qualification tool provides an estimated app duration for these.

This PR updates QualX to replace the incorrect app duration in the CSV metrics with the estimated duration from the Qualification tool output.

Affects:

train(), compare() and predict()
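
A rough sketch of the duration fix in Issue 2 is shown below. The function name `fix_app_durations`, the row layout, and the assumption that a missing duration shows up as `None` or `0` are all hypothetical; the actual representation in the profiler CSVs is not stated in this PR.

```python
def fix_app_durations(profiler_rows, qual_durations):
    """Replace missing profiler durations with the Qualification tool estimate.

    profiler_rows:  list of dicts with "appId" and "appDuration"
                    (assumed to be None or 0 for incomplete apps)
    qual_durations: dict of appId -> estimated app duration
    """
    fixed = []
    for row in profiler_rows:
        updated = dict(row)  # copy so the input rows are left untouched
        estimate = qual_durations.get(row["appId"])
        if not row.get("appDuration") and estimate is not None:
            updated["appDuration"] = estimate
        fixed.append(updated)
    return fixed


fixed_rows = fix_app_durations(
    [
        {"appId": "app_a", "appDuration": 100},   # complete app, kept as-is
        {"appId": "app_b", "appDuration": None},  # incomplete app
    ],
    {"app_a": 999, "app_b": 250},
)
```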

Output:


CASE 1: No supported stages for any app in the dataset (in this case, a single eventlog)

WARNING spark_rapids_tools.tools.qualx.preprocess: Predicted speedup will be 1.0 for application_171615xxxx. Reason: No fully supported stages found.
WARNING spark_rapids_tools.tools.qualx.qualx_main: Predicted speedup will be 1.0 for dataset: qual_20240607xxxx. Check logs for details.

CASE 2: Metrics unavailable for all apps in the dataset (in this case, a single eventlog)

WARNING spark_rapids_tools.tools.qualx.preprocess: Predicted speedup will be 1.0 for application_1715312822xxx. Reason: Empty feature tables found after preprocessing: application_information, sql_plan_metrics_for_application, job_+_stage_level_aggregated_task_metrics.
WARNING spark_rapids_tools.tools.qualx.qualx_main: Predicted speedup will be 1.0 for dataset: qual_202406071648xxx. Check logs for details.

CASE 3: Metrics unavailable for some apps in the dataset (the exact reason cannot be determined, so a broad reason is shown):

WARNING spark_rapids_tools.tools.qualx.preprocess: Predicted speedup will be 1.0 for application_1715312822xxx, application_1715312822xxx. Reason: Missing features after preprocessing.

Predicted CSV File:

per_app.csv

|------------------------------|----------------------------|-------------|----------|---------------|--------------------|--------------------|------------------|-------------------|--------------|
| appName                      | appId                      | appDuration | Duration | Duration_pred | Duration_supported | fraction_supported | appDuration_pred | speedup           | wasPredicted |
|------------------------------|----------------------------|-------------|----------|---------------|--------------------|--------------------|------------------|-------------------|--------------|
| qual_20240607155643_e91fB6D3 | application_1686676198xxxx |      887621 |   820739 |        116175 |             820739 | 0.9246502730331977 |           183057 | 4.848855613929893 | True         |
|------------------------------|----------------------------|-------------|----------|---------------|--------------------|--------------------|------------------|-------------------|--------------|
| NDS - Power Run              | application_1715312822xxxx |       46911 |        0 |             0 |                  0 |                0.0 |            46911 |               1.0 | False        |
|------------------------------|----------------------------|-------------|----------|---------------|--------------------|--------------------|------------------|-------------------|--------------|
| NDS - Power Run              | application_1715312822xxxx |       30507 |        0 |             0 |                  0 |                0.0 |            30507 |               1.0 | False        |
|------------------------------|----------------------------|-------------|----------|---------------|--------------------|--------------------|------------------|-------------------|--------------|
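
The columns in the table above appear to be related by simple arithmetic. The relationships below are inferred from the displayed values only and are not confirmed by this PR; the helper name `predicted_app_duration` is made up for the example.

```python
def predicted_app_duration(app_duration, duration_supported, duration_pred):
    # Assumed relationship: swap the supported portion of the run
    # for its predicted duration and keep the rest unchanged.
    return app_duration - duration_supported + duration_pred


# First row of the table (the app that was predicted).
row = {
    "appDuration": 887621,
    "Duration_pred": 116175,
    "Duration_supported": 820739,
}

app_duration_pred = predicted_app_duration(
    row["appDuration"], row["Duration_supported"], row["Duration_pred"]
)  # 183057, matching the appDuration_pred column

fraction_supported = row["Duration_supported"] / row["appDuration"]
speedup = row["appDuration"] / app_duration_pred

# Fallback rows: nothing supported and nothing predicted, so the
# predicted duration equals the original and the speedup is 1.0.
fallback_pred = predicted_app_duration(46911, 0, 0)
fallback_speedup = 46911 / fallback_pred
```

With these assumptions the computed speedup agrees with the reported value to about four decimal places; the small residual is presumably because the duration columns are shown as integers.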

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Jun 7, 2024
@parthosa parthosa self-assigned this Jun 7, 2024
@parthosa parthosa marked this pull request as ready for review June 7, 2024 17:35
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
leewyang
leewyang previously approved these changes Jun 7, 2024

@leewyang leewyang left a comment

LGTM

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa changed the title Update QualX to return default speedup of 1 with reason Update QualX to return default speedup of 1 and fix App Duration for incomplete apps Jun 7, 2024
@parthosa parthosa changed the title Update QualX to return default speedup of 1 and fix App Duration for incomplete apps Update QualX to return default speedups and fix App Duration for incomplete apps Jun 7, 2024

@amahussein amahussein left a comment

Thanks @parthosa
LGTME!

@amahussein amahussein merged commit 2858a3a into NVIDIA:dev Jun 7, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1058 branch October 9, 2024 17:11
Successfully merging this pull request may close these issues.

[BUG] Prediction mode should return speedup 1.0 instead of FallingBack to legacy Speedups