Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Qualification tool for better performance #2822

Merged
merged 59 commits into from
Jun 29, 2021

Conversation

tgravescs
Copy link
Collaborator

This PR rewrites parts of the qualification tool. I kept the overall structure very similar so that code could be reused with the Profiling tool.

overall this is what this does:

  • removes requirement to run inside of Spark (ie no more Spark SQL Dataframe operations for aggregations), now you can just run via java. It still does require the SPARK jars to be present though.
  • Changes output to output both a text summary report with only 4 columns and a CSV file with the raw data that has more info
  • reorganizes the parsing code to aggregate things as we read the events whenever possible.
  • New QualAppInfo class that is specific to Qualification, removed pieces from ApplicationInfo that were Qualification specific.
  • Only keep the summary of each app in memory, individual app data is thrown away so we don't require a lot of memory
  • refactors event processor and ApplicationInfo to have base classes that could be shared between profiling and qualification tools
  • added in another columns for sql ids that had failed jobs in them.
  • added field SQL Duration with Potential Problems that is the total duration for any SQL queries that we found problematic - ie UDFS right now

This takes significantly less time and memory now to process event logs. The time it takes is really close to just the time it takes Spark to read and parse the events.

@tgravescs
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator Author

build

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert on the Qualifying tool but the changes look good to me. The only thing that might be interesting to explore is putting in a thread pool, like the Spark history server uses. That way if there are more than one application to look at for the qualification tool, it can all be done in parallel (up to a set thread pool size).

@tgravescs
Copy link
Collaborator Author

test failure because not upmerged

@tgravescs
Copy link
Collaborator Author

thanks Bobby, definitely a good idea. I'll do that in a followup

@tgravescs
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator Author

@revans2 are you ok with this being merged (if so can you approve)?

revans2
revans2 previously approved these changes Jun 28, 2021
@tgravescs
Copy link
Collaborator Author

build

gerashegalov
gerashegalov previously approved these changes Jun 28, 2021
tools/pom.xml Outdated
@@ -41,6 +41,7 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<scope>compile,runtime</scope>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the compile scope is a superset of the runtime scope. As written, this is equivalent to just compile

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

event match {
case _: SparkListenerResourceProfileAdded =>
doSparkListenerResourceProfileAdded(app,
event.asInstanceOf[SparkListenerResourceProfileAdded])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can avoid asInstanceOf by using a variable in the pattern instead of _

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@tgravescs
Copy link
Collaborator Author

build

@tgravescs tgravescs merged commit c6cad4d into NVIDIA:branch-21.08 Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants