Handle event logs with wildcards in status report generation #1237

Merged
merged 4 commits into from
Jul 31, 2024

Conversation

parthosa
Collaborator

@parthosa parthosa commented Jul 27, 2024

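Fixes #1236. This PR fixes status report generation in Scala when an event log path is a wildcard that matches nothing (e.g. /invalid/event/log/*). Such paths are now recorded in the status report as FAILURE entries instead of being silently dropped.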

Output

Case 1: Only invalid wildcard event log

  • Event log: /invalid/event/log/*
java -cp $TOOLS_JAR:$SPARK_HOME/jars/* com.nvidia.spark.rapids.tool.qualification.QualificationMain --platform <platform> /invalid/event/log/*

Previously

  • Status Report is not generated
  • Console Output:
WARN QualificationMain: No event logs to process after checking paths, exiting!

After

  • File: rapids_4_spark_qualification_output_status.csv
Event Log,Status,AppID,Description
"test-file/*","FAILURE","N/A","No event logs found in test-file/*"
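The new behavior in Case 1 can be illustrated with a minimal Python sketch. This is not the tool's Scala implementation: the function names `status_rows_for_unmatched` and `render_status_csv` are hypothetical, and local `glob` expansion stands in for whatever path resolution the tool actually performs. It only shows the idea that a wildcard matching nothing yields a FAILURE row in the same column layout as the status file above.

```python
import csv
import glob
import io

# Column layout copied from the status report shown above.
STATUS_HEADER = ["Event Log", "Status", "AppID", "Description"]


def status_rows_for_unmatched(paths):
    """Return one FAILURE row per path whose wildcard matches no files.

    Hypothetical helper: local glob expansion is an assumption made for
    illustration, not the tool's actual resolution logic.
    """
    rows = []
    for path in paths:
        if not glob.glob(path):
            rows.append([path, "FAILURE", "N/A", f"No event logs found in {path}"])
    return rows


def render_status_csv(rows):
    """Render rows as CSV text with an unquoted header and quoted fields."""
    buf = io.StringIO()
    buf.write(",".join(STATUS_HEADER) + "\n")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_status_csv(status_rows_for_unmatched(["/invalid/event/log/*"])), end="")
```

Before this PR, the equivalent of `status_rows_for_unmatched` produced nothing for such a path, so no status file was written when every input was an unmatched wildcard.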

Case 2: Mixed event log paths

  • Event log: /test/path/*dsasa*,/Users/psarthi/Work/event-logs/databricks-aws-cpu,/test/path/*dsaassaa**

Previously

rapids_4_spark_qualification_output_status.csv
|--------------------------------------------------------|---------|-------------------------|------------------------|
| Event Log                                              | Status  | AppID                   | Description            |
|--------------------------------------------------------|---------|-------------------------|------------------------|
| file:/Users/psarthi/Work/event-logs/databricks-aws-cpu | SUCCESS | app-20231212214826-0000 | Took 2730ms to process |
|--------------------------------------------------------|---------|-------------------------|------------------------|

After

rapids_4_spark_qualification_output_status.csv
|--------------------------------------------------------|---------|-------------------------|-----------------------------------------------|
| Event Log                                              | Status  | AppID                   | Description                                   |
|--------------------------------------------------------|---------|-------------------------|-----------------------------------------------|
| /test/path/*dsasa*                                     | FAILURE | N/A                     | No event logs found in /test/path/*dsasa*     |
|--------------------------------------------------------|---------|-------------------------|-----------------------------------------------|
| file:/Users/psarthi/Work/event-logs/databricks-aws-cpu | SUCCESS | app-20231212214826-0000 | Took 2882ms to process                        |
|--------------------------------------------------------|---------|-------------------------|-----------------------------------------------|
| /test/path/*dsaassaa**                                 | FAILURE | N/A                     | No event logs found in /test/path/*dsaassaa** |
|--------------------------------------------------------|---------|-------------------------|-----------------------------------------------|
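Because unmatched wildcard paths now appear in the status file, a caller can detect them by scanning for FAILURE rows. The following Python sketch assumes the file uses the CSV layout shown in Case 1 (the tables above are a rendered view of it); the helper name `failed_event_logs` and the inline sample text are illustrative only.

```python
import csv
import io


def failed_event_logs(status_csv_text):
    """Return (event log, description) pairs for every FAILURE row."""
    reader = csv.DictReader(io.StringIO(status_csv_text))
    return [
        (row["Event Log"], row["Description"])
        for row in reader
        if row["Status"] == "FAILURE"
    ]


# Sample content mirroring the "After" table for Case 2.
sample = '''Event Log,Status,AppID,Description
"/test/path/*dsasa*","FAILURE","N/A","No event logs found in /test/path/*dsasa*"
"file:/Users/psarthi/Work/event-logs/databricks-aws-cpu","SUCCESS","app-20231212214826-0000","Took 2882ms to process"
"/test/path/*dsaassaa**","FAILURE","N/A","No event logs found in /test/path/*dsaassaa**"
'''

if __name__ == "__main__":
    for path, reason in failed_event_logs(sample):
        print(f"{path}: {reason}")
```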

Testing

  • Tested manually (see the cases above)
  • Added unit tests
  • Tested with invalid CSP wildcard paths (e.g. gs://invalid/bucket/*)

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added bug Something isn't working core_tools Scope the core module (scala) labels Jul 27, 2024
@parthosa parthosa self-assigned this Jul 27, 2024
@parthosa parthosa marked this pull request as ready for review July 27, 2024 00:47
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
This reverts commit 45fd8bd186ac3962fb1bc34be025492554e8cbac.
@parthosa parthosa requested a review from tgravescs July 30, 2024 19:38
Collaborator

@amahussein amahussein left a comment

LGTM
Thanks @parthosa

@parthosa parthosa merged commit 0b65f8b into NVIDIA:dev Jul 31, 2024
14 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1236 branch July 31, 2024 17:17
Development

Successfully merging this pull request may close these issues.

[BUG] Handle event logs with wildcards in status report generation
2 participants