From c7c5e56a69539d5ba1b2fa45a7e94f554f6c10e6 Mon Sep 17 00:00:00 2001
From: Hao Zhu
Date: Fri, 4 Jun 2021 21:23:54 -0700
Subject: [PATCH 1/2] Add the doc for -g option of the profiling tool.

Signed-off-by: Hao Zhu
---
 rapids-4-spark-tools/README.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/rapids-4-spark-tools/README.md b/rapids-4-spark-tools/README.md
index c2078c08e8d..c158e0c1a56 100644
--- a/rapids-4-spark-tools/README.md
+++ b/rapids-4-spark-tools/README.md
@@ -50,8 +50,8 @@ Below is an example input:
 
 If any input is a S3 file path or directory path, here 2 extra steps to access S3 in Spark:
 1. Download the matched jars based on the Hadoop version:
-   - hadoop-aws-<version>.jar
-   - aws-java-sdk-<version>.jar
+   - `hadoop-aws-<version>.jar`
+   - `aws-java-sdk-<version>.jar`
 Take Hadoop 2.7.4 for example, we can download and include below jars in the '--jars' option to spark-shell or spark-submit:
 [hadoop-aws-2.7.4.jar](https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar) and
@@ -75,7 +75,7 @@ Take Hadoop 2.7.4 for example, we can download and include below jars in the '--
 
 ## Qualification Tool
 ### Use from spark-shell
-1. Include rapids-4-spark-tools_2.12-<version>.jar in the '--jars' option to spark-shell or spark-submit
+1. Include `rapids-4-spark-tools_2.12-<version>.jar` in the '--jars' option to spark-shell or spark-submit
 2. After starting spark-shell:
 
 For multiple event logs comparison and analysis:
@@ -231,6 +231,7 @@ Run `--help` for more information.
 - Print Rapids related parameters
 - Print Rapids Accelerator Jar and cuDF Jar
 - Print SQL Plan Metrics
+- Generate Dot graph for each SQL
 
 For example, GPU run vs CPU run performance comparison or different runs with different parameters.
 
@@ -304,6 +305,19 @@ SQL Plan Metrics for Application:
 |0    |1     |GpuColumnarExchange |116   |shuffle write time |666666666666 |nsTiming |
 ```
+
+- Generate Dot graph for each SQL (-g option)
+```
+Generated DOT graphs for app app-20210507103057-0000 to /path/. in 17 second(s)
+```
+Once the dot file is generated, you can install [graphviz](http://www.graphviz.org) to convert the dot file
+as a graph in pdf format using below command:
+```bash
+dot -Tpdf ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.pdf
+```
+The pdf file has the SQL plan graph with metrics.
+
+
 #### B. Analysis
 - Job + Stage level aggregated task metrics
 - SQL level aggregated task metrics

From e17249ff57b3e398777fa68a07df79858855b3c2 Mon Sep 17 00:00:00 2001
From: Hao Zhu <9665750+viadea@users.noreply.github.com>
Date: Fri, 4 Jun 2021 21:29:20 -0700
Subject: [PATCH 2/2] Update README.md

---
 rapids-4-spark-tools/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rapids-4-spark-tools/README.md b/rapids-4-spark-tools/README.md
index c158e0c1a56..c934d40bcf8 100644
--- a/rapids-4-spark-tools/README.md
+++ b/rapids-4-spark-tools/README.md
@@ -231,7 +231,7 @@ Run `--help` for more information.
 - Print Rapids related parameters
 - Print Rapids Accelerator Jar and cuDF Jar
 - Print SQL Plan Metrics
-- Generate Dot graph for each SQL
+- Generate DOT graph for each SQL
 
 For example, GPU run vs CPU run performance comparison or different runs with different parameters.
 
@@ -306,11 +306,11 @@ SQL Plan Metrics for Application:
 ```
 
-- Generate Dot graph for each SQL (-g option)
+- Generate DOT graph for each SQL (-g option)
 ```
 Generated DOT graphs for app app-20210507103057-0000 to /path/. in 17 second(s)
 ```
-Once the dot file is generated, you can install [graphviz](http://www.graphviz.org) to convert the dot file
+Once the DOT file is generated, you can install [graphviz](http://www.graphviz.org) to convert the DOT file
 as a graph in pdf format using below command:
 ```bash
 dot -Tpdf ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.pdf
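
The patch documents converting a single generated `.dot` file with graphviz. As a hedged sketch (an editor's illustration, not part of the patch), the same `dot -Tpdf` conversion can be batched over every query a profiled app produced, assuming the `<app-id>-query-<n>/<n>.dot` layout shown in the added README lines; the directory and file names below are made up for the demo:

```shell
#!/usr/bin/env bash
# Sketch: batch-convert the profiler's DOT output to PDF.
# Assumes the hypothetical layout <outdir>/<app-id>-query-<n>/<n>.dot
# from the README excerpt; names here are invented for illustration.
set -eu

outdir="$(mktemp -d)"   # stand-in for the tool's real output directory
mkdir -p "$outdir/app-20210507103057-0000-query-0"
printf 'digraph q0 { stage0 -> stage1 }\n' \
  > "$outdir/app-20210507103057-0000-query-0/0.dot"

for f in "$outdir"/*/*.dot; do
  [ -e "$f" ] || continue              # no matches: glob stayed literal
  if command -v dot >/dev/null 2>&1; then
    dot -Tpdf "$f" > "${f%.dot}.pdf"   # e.g. 0.dot -> 0.pdf alongside it
  else
    echo "graphviz not installed; would convert: $f"
  fi
done
```

Writing each PDF next to its source `.dot` keeps the per-query grouping the tool already created, so plans from different SQL queries do not overwrite each other.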