Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec #2823

andygrove · 2021-06-25T21:48:56Z

Signed-off-by: Andy Grove <andygrove@nvidia.com>

This PR changes the metrics reported for GpuRowToColumnarExec to include semaphoreWaitTime and gpuOpTime (previously reported as totalTime) as well as totalTime.
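As a rough illustration of how the three metrics could relate, here is a minimal sketch. It assumes totalTime wraps the whole conversion, including the semaphore wait and the GPU work; `NanoMetric`, `timed`, and `convertBatch` are hypothetical stand-ins and not the plugin's classes.

```scala
object MetricNestingSketch {
  // Hypothetical nanosecond accumulator standing in for a GpuMetric.
  final class NanoMetric {
    private var ns = 0L
    def add(delta: Long): Unit = ns += delta
    def value: Long = ns
  }

  // Runs body and adds its elapsed wall-clock time to the given metric.
  def timed[A](metric: NanoMetric)(body: => A): A = {
    val start = System.nanoTime()
    try body finally metric.add(System.nanoTime() - start)
  }

  // totalTime spans everything; the other two cover only their own phase.
  def convertBatch(total: NanoMetric, semWait: NanoMetric, gpuOp: NanoMetric): Int =
    timed(total) {
      val rows = Seq(1, 2, 3)               // stand-in for fetching rows from the child
      timed(semWait) { () }                 // stand-in for acquiring the GPU semaphore
      timed(gpuOp) { rows.map(_ * 2).sum }  // stand-in for the GPU-side work
    }
}
```

Under this assumption, totalTime is always at least the sum of semaphoreWaitTime and gpuOpTime, because both intervals are nested inside it.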

Before

After

There are two code paths for row-to-column conversion: GeneratedUnsafeRowToCudfRowIterator and GpuRowToColumnConverter.


jlowe

I was going to say we need to update the tuning guide to discuss these new metrics, but #2720 wasn't merged?

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala


revans2

Overall it looks good. Just a question and a small nit.

revans2 · 2021-06-28T13:46:01Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala

@@ -806,6 +817,8 @@ case class GpuRowToColumnarExec(child: SparkPlan, goal: CoalesceSizeGoal)
  }

  override lazy val additionalMetrics: Map[String, GpuMetric] = Map(
+    SEMAPHORE_WAIT_TIME -> createNanoTimingMetric(DEBUG_LEVEL, DESCRIPTION_SEMAPHORE_WAIT_TIME),
+    GPU_OP_TIME -> createNanoTimingMetric(MODERATE_LEVEL, DESCRIPTION_GPU_OP_TIME),
    TOTAL_TIME -> createNanoTimingMetric(MODERATE_LEVEL, DESCRIPTION_TOTAL_TIME),


I am tempted to drop TOTAL_TIME to DEBUG_LEVEL, but I don't know how you want to use it with benchmarks/etc so I don't know if that is a good idea or not.

What I would really like is OP_TIME. I should be able to calculate that as TOTAL_TIME - SEMAPHORE_WAIT_TIME - GPU_OP_TIME. I'll look at that next.

Wouldn't that just be measuring mostly the time spent in earlier child execs within the stage? TOTAL_TIME includes time spent fetching inputs from child iterators. It wouldn't be very op-specific, and thus OP_TIME would be an odd name for it.

Yes, that's a good point. I would need to subtract the cost of the fetches too, but that might be expensive because this is a row-based iterator. I'll take a look and see what the options are.
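The arithmetic in this exchange can be sketched with made-up numbers. The case class, its field names, and the values in the usage note are hypothetical; the point is only that the proposed subtraction still contains the upstream fetch time unless that is measured and removed as well.

```scala
// Hypothetical per-task timings in nanoseconds (illustrative, not measured).
final case class RowToColumnarTimings(
    totalTimeNs: Long,
    semaphoreWaitTimeNs: Long,
    gpuOpTimeNs: Long,
    childFetchTimeNs: Long)

object OpTimeSketch {
  // The subtraction proposed in the thread:
  // TOTAL_TIME - SEMAPHORE_WAIT_TIME - GPU_OP_TIME.
  def naiveRemainder(t: RowToColumnarTimings): Long =
    t.totalTimeNs - t.semaphoreWaitTimeNs - t.gpuOpTimeNs

  // The same remainder with the time spent pulling rows from the child removed.
  def cpuOnlyRemainder(t: RowToColumnarTimings): Long =
    naiveRemainder(t) - t.childFetchTimeNs
}
```

For example, with a total of 1000 ns, a semaphore wait of 100 ns, GPU work of 200 ns, and 650 ns spent fetching from the child, the naive remainder is 700 ns while only 50 ns is attributable to this exec's own CPU work.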

This is converting rows on the CPU to columns on the GPU. There is close to no processing on the CPU beyond fetching data from upstream and putting it into a buffer. If we try to measure the amount of time it takes to convert from UnsafeRow to CudfUnsafeRow, or to just put it into the Arrow format in a buffer (depending on the code path we take), we are likely going to spend more time measuring than actually doing the conversion. Unless knowing that number is critically important, I would propose that we just lump it all together.
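To make the measurement-overhead concern concrete, here is a minimal sketch of what per-row fetch timing might look like. `FetchTimingIterator` is a hypothetical wrapper and not part of the plugin; it adds two `System.nanoTime()` calls to every `hasNext` and every `next`, which is the per-row cost being weighed against a conversion step that does very little work per row.

```scala
// Hypothetical wrapper: times every call into the upstream (child) iterator.
final class FetchTimingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var fetchNs = 0L

  // Accumulated time spent inside the upstream iterator, in nanoseconds.
  def fetchTimeNs: Long = fetchNs

  override def hasNext: Boolean = {
    val start = System.nanoTime()
    try underlying.hasNext finally fetchNs += System.nanoTime() - start
  }

  override def next(): T = {
    val start = System.nanoTime()
    try underlying.next() finally fetchNs += System.nanoTime() - start
  }
}
```

Timing once per batch instead of once per row would avoid most of this overhead, at the price of lumping fetch and conversion time together, which is the trade-off suggested in the comment above.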

Ok, I went ahead and changed the level of TOTAL_TIME to DEBUG_LEVEL.

revans2 · 2021-06-28T13:48:05Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSemaphore.scala

-    val nvtxRange = new NvtxRange("Acquire GPU", NvtxColor.RED)
+  def acquireIfNecessary(context: TaskContext, waitMetric: Option[GpuMetric] = None): Unit = {
+    val nvtxRange = waitMetric match {
+      case Some(m) => new NvtxWithMetrics("Acquire GPU", NvtxColor.RED, m)


We have done this in a few places. Would it be better to have NvtxWithMetrics provide a constructor that takes an Option[GpuMetric] and hide the match internally?
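One way the suggestion could look is a factory method on the companion object that does the `Option` match once. The sketch below uses hypothetical stand-in classes (`SketchMetric`, `SketchRange`, `SketchRangeWithMetric`) so that it runs without cuDF; it is not the code that was merged.

```scala
// Stand-in for GpuMetric: accumulates nanoseconds and counts updates.
class SketchMetric {
  var valueNs: Long = 0L
  var updates: Int = 0
  def add(ns: Long): Unit = { valueNs += ns; updates += 1 }
}

// Stand-in for a plain NVTX range that records nothing.
class SketchRange(val name: String) extends AutoCloseable {
  override def close(): Unit = ()
}

// Stand-in for a range that also records its elapsed time into a metric on close.
class SketchRangeWithMetric(name: String, metric: SketchMetric) extends SketchRange(name) {
  private val start = System.nanoTime()
  override def close(): Unit = {
    metric.add(System.nanoTime() - start)
    super.close()
  }
}

object SketchRangeWithMetric {
  // Callers pass the optional metric straight through; the match lives here only.
  def apply(name: String, metric: Option[SketchMetric]): SketchRange = metric match {
    case Some(m) => new SketchRangeWithMetric(name, m)
    case None => new SketchRange(name)
  }
}
```

A caller such as `acquireIfNecessary` could then write `SketchRangeWithMetric("Acquire GPU", waitMetric)` without repeating the match at each call site.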

revans2 · 2021-06-28T13:56:34Z

On a side note, I noticed that Databricks Photon has a "cumulative time" metric that fills the same role we have been using "total time" for, but the name feels a bit more accurate. Just curious whether it is worth refactoring/renaming to match?


andygrove · 2021-06-28T15:24:35Z

On a side note, I noticed that Databricks Photon has a "cumulative time" metric that fills the same role we have been using "total time" for, but the name feels a bit more accurate. Just curious whether it is worth refactoring/renaming to match?

I like the cumulative name better. I'll create a separate PR to propose renaming this metric.


andygrove · 2021-06-28T16:45:54Z

build

sql-plugin/src/main/java/com/nvidia/spark/rapids/UnsafeRowToColumnarBatchIterator.java

sql-plugin/src/main/scala/com/nvidia/spark/rapids/NvtxWithMetrics.scala


jlowe · 2021-06-28T18:06:46Z

build

andygrove · 2021-06-28T18:06:56Z

build

andygrove added 2 commits June 25, 2021 15:16

Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec

2af1918

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Rename variable

27a0dfa

andygrove marked this pull request as ready for review June 25, 2021 21:54

jlowe reviewed Jun 25, 2021

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala (outdated)

Declare constants for SEMAPHORE_WAIT_TIME

2f3b108

Signed-off-by: Andy Grove <andygrove@nvidia.com>

sameerz added this to the June 21 - July 2 milestone Jun 28, 2021

sameerz added the bug label Jun 28, 2021

revans2 previously approved these changes Jun 28, 2021


andygrove added 2 commits June 28, 2021 09:17

Merge branch 'branch-21.08' into improve-row-to-col-metrics

804cbb8

Add gpuOpTime and semaphoreWaitTime to tuning guide

f265631

Signed-off-by: Andy Grove <andygrove@nvidia.com>

andygrove dismissed revans2’s stale review via f265631 June 28, 2021 15:19

andygrove added 2 commits June 28, 2021 10:38

Add NvtxWithMetrics factory method that takes optional metric

df35c20

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Change TOTAL_TIME level to DEBUG_LEVEL

4ad7dd7

Signed-off-by: Andy Grove <andygrove@nvidia.com>

revans2 previously approved these changes Jun 28, 2021


jlowe reviewed Jun 28, 2021

sql-plugin/src/main/java/com/nvidia/spark/rapids/UnsafeRowToColumnarBatchIterator.java

sql-plugin/src/main/scala/com/nvidia/spark/rapids/NvtxWithMetrics.scala (outdated)

Remove default argument from NvtxWithMetrics.apply

92ee6e0

Signed-off-by: Andy Grove <andygrove@nvidia.com>

andygrove dismissed revans2’s stale review via 92ee6e0 June 28, 2021 17:41

jlowe approved these changes Jun 28, 2021


revans2 approved these changes Jun 28, 2021


andygrove merged commit e04e34e into NVIDIA:branch-21.08 Jun 28, 2021

andygrove deleted the improve-row-to-col-metrics branch June 28, 2021 23:50
