-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qualification tool: Error handling while processing large event logs #3714
Merged
Merged
Changes from 2 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
a7a8d32
catch OOM error while processing large files
nartal1 45fbe99
addressed review comments
nartal1 c5deee6
moved error handling to thread level
nartal1 8e54a42
Merge branch 'branch-21.12' of github.com:NVIDIA/spark-rapids into sy…
nartal1 334616b
addressed review comments
nartal1 318d10b
function for error handling
nartal1 b88f87e
addressed nits
nartal1 a5f5d34
Merge branch 'branch-21.12' of github.com:NVIDIA/spark-rapids into sy…
nartal1 4796f1e
Merge branch 'branch-21.12' of github.com:NVIDIA/spark-rapids into sy…
nartal1 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the root cause of OOM is likely the fact that we materialize the whole file on Heap. Remove
toList
to keep it a line iterator.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gerashegalov for taking a look. I was still seeing OOM error even after removing
toList
as we are reading one event log per thread. I am wrapping the checks at thread level now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are multiple things that can cause OOM, Gera is just saying this is potentially one of those, it depends on the file sizes. we should file a separate followup if we want to optimize it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you post the stack trace, it might give us a hint what to look for?
And I'd get a heapdump with
-XX:+HeapDumpOnOutOfMemoryError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is the stack trace. It looks like while reading the file from inputStream.
I got the heapdump by including the argument you had specified. I opened it with jhat. It's too much info in there. Could you please let me pointers about where I should be looking at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you still have toList in there.
VisualVM (part of JDK) and Eclipse MAT are more user friendly for analyzing the heap dump. You want to look for objects with large "retained" heap size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Gera. Filed follow on issue to improve memory consumption: https://github.com/NVIDIA/spark-rapids/issues/3727
This PR is mostly to identify OOM and throw a meaningful error so that users can increase the heap size.