Rewrite Qualification tool for better performance #2822

tgravescs · 2021-06-25T19:53:50Z

This PR rewrites parts of the qualification tool. I kept the overall structure very similar so that code could be reused with the Profiling tool.

Overall, this PR does the following:

- Removes the requirement to run inside Spark (i.e. no more Spark SQL DataFrame operations for aggregations); the tool can now be run directly with java. It still requires the Spark jars to be present.
- Changes the output to produce both a text summary report with only 4 columns and a CSV file with the raw data, which has more information.
- Reorganizes the parsing code to aggregate as the events are read whenever possible.
- Adds a new QualAppInfo class that is specific to Qualification, and removes the Qualification-specific pieces from ApplicationInfo.
- Keeps only the summary of each app in memory; individual app data is thrown away, so the tool does not require a lot of memory.
- Refactors the event processor and ApplicationInfo to have base classes that can be shared between the Profiling and Qualification tools.
- Adds another column for SQL IDs that had failed jobs in them.
- Adds the field "SQL Duration with Potential Problems", which is the total duration of any SQL queries that we found problematic (i.e. UDFs right now).

Processing event logs now takes significantly less time and memory. The total time is close to the time Spark itself needs to read and parse the events.
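
To illustrate the "aggregate as we read, keep only the summary" approach described above, here is a minimal sketch. The `Event` types, `AppSummary`, and `SummaryBuilder` are hypothetical names made up for this example; they are not the classes in this PR (which works on Spark listener events and QualAppInfo), and the fields are only a guess at the kind of data being tracked.

```scala
object StreamingSummary {
  // Hypothetical, simplified event types (assumption: not Spark's listener events).
  sealed trait Event
  final case class AppStart(appId: String, time: Long) extends Event
  final case class AppEnd(time: Long) extends Event
  final case class SqlEnd(sqlId: Long, durationMs: Long, failed: Boolean) extends Event

  // The only per-application data that is kept once processing finishes.
  final case class AppSummary(
      appId: String,
      appDurationMs: Long,
      sqlDurationMs: Long,
      failedSqlIds: Seq[Long])

  // Accumulator updated once per event; the events themselves are not retained.
  final class SummaryBuilder {
    private var appId = ""
    private var startTime = 0L
    private var endTime = 0L
    private var sqlDuration = 0L
    private val failedIds = scala.collection.mutable.ArrayBuffer.empty[Long]

    def process(event: Event): Unit = event match {
      case AppStart(id, time) =>
        appId = id
        startTime = time
      case AppEnd(time) =>
        endTime = time
      case SqlEnd(id, duration, isFailed) =>
        sqlDuration += duration
        if (isFailed) failedIds += id
    }

    // Duration is clamped at zero in case the end event was never seen.
    def result(): AppSummary =
      AppSummary(appId, math.max(0L, endTime - startTime), sqlDuration, failedIds.toSeq)
  }

  // Single pass over the events; memory use does not grow with the event count,
  // apart from the list of failed SQL IDs.
  def summarize(events: Iterator[Event]): AppSummary = {
    val builder = new SummaryBuilder
    events.foreach(builder.process)
    builder.result()
  }
}
```

Because `summarize` takes an `Iterator`, a caller can feed it events as they are parsed from the log rather than loading the whole log first.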

Signed-off-by: Thomas Graves <tgraves@apache.org>

tgravescs · 2021-06-25T19:54:32Z

build

tgravescs · 2021-06-25T19:56:14Z

build

revans2

I am not an expert on the Qualification tool, but the changes look good to me. The only thing that might be interesting to explore is adding a thread pool, like the one the Spark history server uses. That way, if there is more than one application for the qualification tool to look at, they can all be processed in parallel (up to a set thread pool size).
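
A rough sketch of what the thread pool suggestion could look like, using the standard `java.util.concurrent` executor API. `processLog` is a hypothetical placeholder for whatever parses one event log and returns its summary, and the pool size is an assumed parameter; nothing here is taken from this PR or from the history server code.

```scala
import java.util.concurrent.{Callable, Executors}

object ParallelQualification {
  // Hypothetical placeholder for parsing one event log into a summary.
  def processLog(path: String): String = s"summary of $path"

  // Processes every log on a fixed-size pool and returns the results
  // in the same order as the input paths.
  def processAll(paths: Seq[String], numThreads: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(numThreads)
    try {
      // toList forces strict evaluation, so every task is submitted
      // before the first result is waited on.
      val futures = paths.toList.map { path =>
        pool.submit(new Callable[String] {
          override def call(): String = processLog(path)
        })
      }
      // get() blocks until the corresponding task has finished.
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

Since only a summary per application is kept, the results collected from the futures should stay small even when many logs are processed.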

tgravescs · 2021-06-25T21:40:03Z

The test failure is because the branch was not upmerged.

tgravescs · 2021-06-25T21:43:46Z

Thanks Bobby, definitely a good idea. I'll do that in a follow-up.

tgravescs · 2021-06-25T21:45:03Z

build

tgravescs · 2021-06-28T14:52:01Z

build

tgravescs · 2021-06-28T18:35:10Z

@revans2 are you ok with this being merged (if so can you approve)?

tgravescs · 2021-06-28T20:22:25Z

build

gerashegalov · 2021-06-28T20:34:37Z

tools/pom.xml

@@ -41,6 +41,7 @@
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
+            <scope>compile,runtime</scope>


I think the compile scope is a superset of the runtime scope. As written, this is equivalent to just compile.

gerashegalov · 2021-06-28T20:41:55Z

tools/src/main/scala/org/apache/spark/sql/rapids/tool/EventProcessorBase.scala

+        event match {
+          case _: SparkListenerResourceProfileAdded =>
+            doSparkListenerResourceProfileAdded(app,
+              event.asInstanceOf[SparkListenerResourceProfileAdded])


nit: you can avoid asInstanceOf by binding a variable in the pattern instead of using _
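
For reference, a small self-contained sketch of the typed-pattern form being suggested. The `Event` hierarchy and `handleProfile` are hypothetical stand-ins for the Spark listener events and the `doSparkListenerResourceProfileAdded` handler in the diff, so this only shows the shape of the change.

```scala
object PatternBinding {
  // Hypothetical event hierarchy standing in for SparkListenerEvent subclasses.
  sealed trait Event
  final case class ProfileAdded(id: Int) extends Event
  final case class Other(name: String) extends Event

  // Hypothetical handler that needs the specific subtype.
  def handleProfile(e: ProfileAdded): String = s"profile ${e.id}"

  def process(event: Event): String = event match {
    // `e` is already typed as ProfileAdded here, so no asInstanceOf is needed.
    case e: ProfileAdded => handleProfile(e)
    case _ => "ignored"
  }
}
```

The typed pattern `case e: ProfileAdded` both tests the runtime type and binds the narrowed value, which is why the explicit cast becomes unnecessary.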

tgravescs · 2021-06-28T21:20:26Z

build

tgravescs and others added 30 commits June 23, 2021 15:55

checkpoint qual redesign

d75d162

checkpoint

0fd5958

fix

cecf3d6

Signed-off-by: Thomas Graves <tgraves@apache.org>

fix longs

8ff5d60

fixes

8e41e72

update qualification suite

48c44a4

fix main

997e7d2

fixes

4897b0a

add newlines

54e53fc

fixes

3fa509b

debug sql duration

d7a8053

fix duration

0f22c06

use empty quotes

9ace7c7

fix sort

ff1454e

debug

bcd5f72

float to double

aabd455

same schema

b88ae88

fixes

9ef3bb5

fix division by 0

4b1f8ac

debug test

7317fc9

handle failures

fe42139

update output header

33363b9

update result

43303f0

handle failed for cpu time

e2435b1

track failed jobs

29fb032

track unfinished sqls

7a32758

don't count cpu time for unfinished

03a4a75

update tests

f9e88e4

fixing scope

ead0dac

write report

0a37d16

tgravescs requested review from GaryShen2008 and jlowe as code owners June 25, 2021 19:53

tgravescs requested review from NvTimLiu and revans2 as code owners June 25, 2021 19:53

update app name

e4fa1f2

revans2 reviewed Jun 25, 2021

Merge remote-tracking branch 'origin/branch-21.08' into qualRedesign

fe32c39

resolve upmerge

b17fc00

Merge remote-tracking branch 'origin/branch-21.08' into qualRedesign

c9175e1

revans2 previously approved these changes Jun 28, 2021

Merge remote-tracking branch 'origin/branch-21.08' into qualRedesign

04d4553

tgravescs dismissed revans2’s stale review via 04d4553 June 28, 2021 20:22

gerashegalov previously approved these changes Jun 28, 2021

review comments

44cb42f

tgravescs dismissed gerashegalov’s stale review via 44cb42f June 28, 2021 21:19

gerashegalov approved these changes Jun 29, 2021

tgravescs merged commit c6cad4d into NVIDIA:branch-21.08 Jun 29, 2021
