Enable recursive search for event logs by default and optional --no-recursion flag #1297
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Does this properly handle all the different types of event logs: directories, Databricks directories, normal file event logs, and a mix of any of those?
core/src/main/scala/com/nvidia/spark/rapids/tool/EventLogPathProcessor.scala
Yes, tested on a directory containing a mixed set of event logs (a Databricks logs dir, a dir containing 2 event logs, a single event log file, and a cluster logs dir containing 3 event logs). Example:
Case 1: Recursive search in a mixed logs dir
Output
Note:
Case 2: Recursive search in a mixed logs dir with regex
Output
Case 3: Input directory contains multiple individual event logs
Output
Case 4: Single event log file
Output
core/src/main/scala/com/nvidia/spark/rapids/tool/EventLogPathProcessor.scala
…odifier Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Thanks @parthosa! LGTM.
Minor note: we need to update documentation as well.
# Conflicts:
#   core/src/main/scala/com/nvidia/spark/rapids/tool/EventLogPathProcessor.scala
#   core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/QualificationMain.scala
3bb719c
Fixes #1177. This PR enables recursive search for event logs by default. This is useful when a user wants to process multiple cluster event logs at once.
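The Code Changes notes in this description mention a queue-based search and name-based filtering. As a rough, language-agnostic illustration of that idea (not the Scala implementation in `EventLogPathProcessor`), here is a minimal Python sketch. The function names, the keyword list, and the rule for recognizing a rolling log directory are all assumptions made for this example.

```python
from collections import deque
from pathlib import Path

# Hypothetical keywords; the actual EXCLUDED_EVENTLOG_NAME_KEYWORDS values are not shown in this PR text.
EXCLUDED_NAME_KEYWORDS = ("stdout", "stderr")


def is_rolling_log_dir(path: Path) -> bool:
    """Assumed heuristic: a directory that directly holds a file named 'eventlog*'."""
    return any(c.is_file() and c.name.startswith("eventlog") for c in path.iterdir())


def find_event_logs(root, recursive=True):
    """Breadth-first search for event logs using an explicit queue instead of recursion."""
    root = Path(root)
    found = []
    pending = deque([root])
    while pending:
        current = pending.popleft()
        if current.is_file():
            found.append(current)  # a plain event log file
            continue
        if not current.is_dir():
            continue
        if is_rolling_log_dir(current):
            found.append(current)  # treat the whole folder as one event log
            continue
        if current != root and not recursive:
            continue  # with recursion off, only the top level is scanned
        for child in sorted(current.iterdir()):
            # Skip entries whose names contain an excluded keyword.
            if not any(k in child.name for k in EXCLUDED_NAME_KEYWORDS):
                pending.append(child)
    return found
```

Because the pending directories live in a queue rather than on the call stack, deeply nested log trees do not risk a stack overflow. Passing `recursive=False` mimics the intent of `--no-recursion` under the assumption that only direct children of the input path are considered.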
Example
Before
Currently, to process the above two sets of Databricks event logs, users need to specify each directory level explicitly:
CMD:
After
After the changes in this PR, a user can pass the root directory and the tool will recursively look for valid event logs (files or rolling Databricks folders).
CMD:
Changes
--no-recursion argument that both Python and Scala tools can provide to disable recursion if needed.
Code Changes
EventLogPathProcessor#getEventLogInfo() method to avoid redundant checks and initialization.
getEventLogInfoInternal() to recursively search for valid event logs. This uses a queue to avoid stack overflow errors.
EXCLUDED_EVENTLOG_NAME_KEYWORDS to filter event logs based on their names.
Testing