
Enable recursive search for event logs by default and optional --no-recursion flag #1297

Merged: parthosa merged 8 commits into NVIDIA:dev from the spark-rapids-tools-1177 branch on Aug 20, 2024


parthosa (Collaborator)

Fixes #1177. This PR enables recursive search for event logs by default. This is useful when a user wants to process event logs from multiple clusters at once.

Example

test_cluster_logs/
├── 0325-072137-br2ztcsl
│   ├── driver
│   ├── eventlog
│   │   └── 0325-072137-br2ztcsl_100_64_30_156
│   │       └── 5801510396587970993
│   │           ├── eventlog
│   │           ├── eventlog-2024-03-25--07-30.gz
│   │           ├── eventlog-2024-03-25--07-40.gz
│   │           ├── eventlog-2024-03-25--07-50.gz
│   │           ├── eventlog-2024-03-25--08-00.gz
│   │           ├── eventlog-2024-03-25--08-10.gz
│   │           ├── eventlog-2024-03-25--08-20.gz
│   │           ├── eventlog-2024-03-25--08-30.gz
│   │           ├── eventlog-2024-03-25--08-40.gz
│   │           └── eventlog-2024-03-25--08-50.gz
│   ├── executor
│   └── init_scripts
└── 0325-092658-o0t9uwq4
    ├── driver
    ├── eventlog
    │   └── 0325-085253-u3t0qaxu_100_64_30_156
    │       └── 5160740748072535114
    │           ├── eventlog
    │           ├── eventlog-2024-03-25--09-00.gz
    │           ├── eventlog-2024-03-25--09-10.gz
    │           └── eventlog-2024-03-25--09-20.gz
    ├── executor
    └── init_scripts

Before

Currently, to process the above two sets of Databricks event logs, users need to specify each directory level explicitly:

CMD:

spark_rapids qualification --eventlogs /path/to/test_cluster_logs/*/eventlog/*/*

After

After the changes in this PR, a user can pass the root directory, and the tool recursively looks for valid event logs (files or rolling Databricks folders).

CMD:

spark_rapids qualification --eventlogs /path/to/test_cluster_logs
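To make the Before/After difference concrete, here is a small Python sketch. It is not the tools' implementation (that lives in the Scala `EventLogPathProcessor`); it recreates a trimmed copy of the example tree in a temporary directory and checks that expanding the explicit pattern `*/eventlog/*/*` finds the same Databricks rolling-log folders as walking down from the root. Treating a folder as a rolling log when it holds a file named exactly `eventlog` is an assumption made for this illustration, and the helper names `make_tree` and `rolling_dirs_by_walk` are hypothetical.

```python
import glob
import os
import tempfile


def make_tree(root, files):
    """Create empty files at the given relative paths under root."""
    for rel in files:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def rolling_dirs_by_walk(root):
    """Hypothetical 'After' behaviour: every folder under root holding a file named 'eventlog'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "eventlog" in filenames:  # a directory named 'eventlog' is in _dirnames, not here
            found.append(dirpath)
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    make_tree(root, [
        "0325-072137-br2ztcsl/eventlog/0325-072137-br2ztcsl_100_64_30_156/5801510396587970993/eventlog",
        "0325-072137-br2ztcsl/eventlog/0325-072137-br2ztcsl_100_64_30_156/5801510396587970993/eventlog-2024-03-25--07-30.gz",
        "0325-092658-o0t9uwq4/eventlog/0325-085253-u3t0qaxu_100_64_30_156/5160740748072535114/eventlog",
        "0325-092658-o0t9uwq4/eventlog/0325-085253-u3t0qaxu_100_64_30_156/5160740748072535114/eventlog-2024-03-25--09-00.gz",
    ])
    # 'Before': every directory level is spelled out with a wildcard.
    before = sorted(glob.glob(os.path.join(root, "*", "eventlog", "*", "*")))
    # 'After': only the root directory is given.
    after = rolling_dirs_by_walk(root)
    assert before == after
    print(len(after))  # 2
```

The real tool also accepts plain event log files anywhere in the tree; the sketch only looks for rolling folders to keep the comparison focused.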

Changes

  • Enabled recursive search by default.
  • Added a --no-recursion argument, accepted by both the Python and Scala tools, to disable recursion if needed.
    • This may be useful when using CSP storage.
  • Wildcard handling:
    • First, wildcards are expanded to list all matching paths.
    • Then, a recursive search is performed on each listed path.
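The ordering in the wildcard-handling bullets (expand wildcards first, then search under each match) can be sketched in Python as below. This is an illustration under assumptions, not the tools' code: `resolve_event_logs` and its `recursive` parameter are hypothetical stand-ins for the default behaviour and `--no-recursion`, every regular file is treated as a candidate event log, and Databricks rolling folders are not special-cased.

```python
import glob
import os
import tempfile


def resolve_event_logs(patterns, recursive=True):
    """Expand wildcards first, then search under each matched path.

    recursive=False mimics --no-recursion: only the immediate children of a
    matched directory are considered.
    """
    results = set()
    for pattern in patterns:
        # Step 1: expand the wildcard (a plain path matches itself if it exists).
        for match in glob.glob(pattern):
            if os.path.isfile(match):
                results.add(match)
                continue
            if not os.path.isdir(match):
                continue  # e.g. a broken symlink
            if recursive:
                # Step 2a: default behaviour, walk the whole subtree.
                for dirpath, _dirs, files in os.walk(match):
                    results.update(os.path.join(dirpath, f) for f in files)
            else:
                # Step 2b: --no-recursion, look one level down only.
                for name in os.listdir(match):
                    child = os.path.join(match, name)
                    if os.path.isfile(child):
                        results.add(child)
    return sorted(results)


with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, "dir_with_eventlogs")
    os.makedirs(os.path.join(base, "nested"))
    # 'nested/app-extra' is a made-up file used only to show the difference.
    for rel in ["application_1692643187882_0001", "query71-eventlog", "nested/app-extra"]:
        open(os.path.join(base, rel), "w").close()
    pattern = os.path.join(root, "dir_*")
    deep = resolve_event_logs([pattern])
    shallow = resolve_event_logs([pattern], recursive=False)
    print(len(deep), len(shallow))  # 3 2
```

Expanding wildcards before recursing means a pattern such as `dir_*` only decides where the search starts; what is found underneath each match is then governed by the recursion setting alone.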

Code Changes

  • Used @tgravescs's branch as a starting reference.
  • Simplified the EventLogPathProcessor#getEventLogInfo() method to avoid redundant checks and initialization.
  • Introduced a method getEventLogInfoInternal() to recursively search for valid event logs. It traverses directories with a queue instead of recursive calls, to avoid stack overflow errors.
  • Introduced EXCLUDED_EVENTLOG_NAME_KEYWORDS to filter event logs based on their names.
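The queue-based search described in the Code Changes bullets can be approximated in Python as follows. It is only a sketch of the idea behind `getEventLogInfoInternal()`, not a port of it: directories are visited breadth-first with `collections.deque`, so a deep tree never grows the call stack. The keyword tuple `EXCLUDED_KEYWORDS` is hypothetical (the PR does not list the real `EXCLUDED_EVENTLOG_NAME_KEYWORDS` values), and so is the rule that a folder holding a file named `eventlog` is one Databricks rolling log.

```python
import os
import tempfile
from collections import deque

# Hypothetical keywords; the real EXCLUDED_EVENTLOG_NAME_KEYWORDS may differ.
EXCLUDED_KEYWORDS = ("stdout", "stderr", "log4j")


def find_event_logs(root):
    """Breadth-first search for event logs without recursive calls.

    Returns plain files plus directories treated as Databricks rolling logs
    (assumed here: a directory that contains a *file* named 'eventlog').
    """
    if os.path.isfile(root):
        return [root]
    found = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        entries = sorted(os.listdir(current))
        if "eventlog" in entries and os.path.isfile(os.path.join(current, "eventlog")):
            found.append(current)  # whole folder = one application; do not descend
            continue
        for name in entries:
            path = os.path.join(current, name)
            if os.path.isdir(path):
                queue.append(path)  # visit later instead of recursing now
            elif not any(k in name for k in EXCLUDED_KEYWORDS):
                found.append(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    for rel in [
        "databricks_logs/eventlog",
        "databricks_logs/eventlog-2024-04-22--07-30.gz",
        "dir_with_eventlogs/application_1692643187882_0001",
        "dir_with_eventlogs/query71-eventlog",
        "single_eventlog_file",
        "databricks_cluster_logs/0325-072137-br2ztcsl/driver/stdout",
        "databricks_cluster_logs/0325-072137-br2ztcsl/eventlog/"
        "0325-072137-br2ztcsl_100_64_30_156/5801510396587970993/eventlog",
    ]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    logs = find_event_logs(root)
    print(len(logs))  # 5
```

Note that the cluster folder's own `eventlog` entry is a directory, so the sketch checks that the name refers to a file before treating a folder as a rolling log; the `driver/stdout` file is skipped by the keyword filter.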

Testing

  • Tested the changes manually using a mixed set of paths.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
parthosa added the feature request (New feature or request) and core_tools (Scope the core module (scala)) labels on Aug 17, 2024
parthosa self-assigned this on Aug 17, 2024
tgravescs (Collaborator)

Does this properly handle all the different types of event logs: directories, Databricks directories, normal event log files, and any mix of those?

parthosa (Collaborator, Author) commented on Aug 19, 2024

Does this properly handle all the different types of event logs: directories, Databricks directories, normal event log files, and any mix of those?

Yes. I tested it on a directory containing a mixed set of event logs: a Databricks logs directory, a directory containing 2 event logs, a single event log file, and a cluster logs directory containing 3 event logs.

Example:

/path/mixed-logs
├── databricks_logs
│   ├── eventlog
│   └── eventlog-2024-04-22--07-30.gz
├── dir_with_eventlogs
│   ├── application_1692643187882_0001
│   └── query71-eventlog
├── single_eventlog_file
└── databricks_cluster_logs
    ├── 0325-072137-br2ztcsl
    │   ├── driver
    │   ├── eventlog
    │   ├── executor
    │   └── init_scripts
    ├── 0325-085253-u3t0qaxu
    │   ├── driver
    │   ├── eventlog
    │   ├── executor
    │   └── init_scripts
    └── 0325-092658-o0t9uwq4
        ├── driver
        ├── eventlog
        ├── executor
        └── init_scripts

Case 1: Recursive search in a mixed logs dir

spark_rapids qualification --eventlogs /path/mixed-logs --tools_jar $TOOLS_JAR

Output

Report Summary:
----------------------  -
Total applications      7
Processed applications  6
Top candidates          2
----------------------  -

Note:

  • One of the event logs is a GPU event log and hence was not processed.

Case 2: Recursive search in a mixed logs dir with a wildcard pattern

spark_rapids qualification --eventlogs "/path/mixed-logs/da*" --tools_jar $TOOLS_JAR

Output

Report Summary:
----------------------  -
Total applications      4
Processed applications  3
Top candidates          1
----------------------  -
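The totals in Case 1 and Case 2 follow from the per-entry counts given in the reply above (1 Databricks logs directory, 2 event logs in `dir_with_eventlogs`, 1 single file, 3 cluster log directories). The Python check below only does that bookkeeping; it does not run the tool. It shows that the wildcard `da*` selects just the two `databricks*` entries. The dictionary and the `total_apps` helper are illustrative, not part of the tools.

```python
from fnmatch import fnmatch

# Applications per top-level entry of /path/mixed-logs, as described in the reply above.
apps_per_entry = {
    "databricks_logs": 1,
    "dir_with_eventlogs": 2,
    "single_eventlog_file": 1,
    "databricks_cluster_logs": 3,
}


def total_apps(pattern):
    """Sum the applications under the top-level entries that match a glob pattern."""
    return sum(n for name, n in apps_per_entry.items() if fnmatch(name, pattern))


print(total_apps("*"))    # 7  (Case 1: the whole directory)
print(total_apps("da*"))  # 4  (Case 2: databricks_logs + databricks_cluster_logs)
```

The "Processed applications" rows are lower by one in both cases because, per the note under Case 1, the GPU event log is counted but not processed.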

Case 3: Input directory contains multiple individual event logs

spark_rapids qualification --eventlogs "/path/mixed-logs/dir_with_eventlogs" --tools_jar $TOOLS_JAR --no-recursion

Output

Report Summary:
----------------------  -
Total applications      2
Processed applications  2
Top candidates          0
----------------------  -

Case 4: Single event log file

spark_rapids qualification --eventlogs "/path/mixed-logs/single_eventlog_file" --tools_jar $TOOLS_JAR 

Output

Report Summary:
----------------------  -
Total applications      1
Processed applications  1
Top candidates          1
----------------------  -

cindyyuanjiang (Collaborator) previously approved these changes on Aug 20, 2024 and left a comment:

Thanks @parthosa! LGTM.
Minor note: we need to update the documentation as well.

tgravescs previously approved these changes on Aug 20, 2024
# Conflicts:
#	core/src/main/scala/com/nvidia/spark/rapids/tool/EventLogPathProcessor.scala
#	core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/QualificationMain.scala
parthosa dismissed stale reviews from tgravescs and cindyyuanjiang via 3bb719c on August 20, 2024 at 18:03
parthosa merged commit 055f088 into NVIDIA:dev on Aug 20, 2024
14 checks passed
parthosa deleted the spark-rapids-tools-1177 branch on August 20, 2024 at 19:03
Labels: core_tools (Scope the core module (scala)), feature request (New feature or request)
Linked issue (may be closed by merging this pull request): [FEA] Enable recursive lookup for event log paths by default
3 participants