Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec #2823
Merged: andygrove merged 8 commits into NVIDIA:branch-21.08 from andygrove:improve-row-to-col-metrics on Jun 28, 2021 (+67 −23)
Changes from 3 commits
Commits (8):
2af1918 Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec (andygrove)
27a0dfa Rename variable (andygrove)
2f3b108 Declare constants for SEMAPHORE_WAIT_TIME (andygrove)
804cbb8 Merge branch 'branch-21.08' into improve-row-to-col-metrics (andygrove)
f265631 Add gpuOpTime and semaphoreWaitTime to tuning guide (andygrove)
df35c20 Add NvtxWithMetrics factory method that takes optional metric (andygrove)
4ad7dd7 Change TOTAL_TIME level to DEBUG_LEVEL (andygrove)
92ee6e0 Remove default argument from NvtxWithMetrics.apply (andygrove)
@@ -73,7 +73,20 @@ object GpuSemaphore {
    */
   def acquireIfNecessary(context: TaskContext): Unit = {
     if (enabled && context != null) {
-      getInstance.acquireIfNecessary(context)
+      getInstance.acquireIfNecessary(context, None)
     }
   }

+  /**
+   * Tasks must call this when they begin to use the GPU.
+   * If the task has not already acquired the GPU semaphore then it is acquired,
+   * blocking if necessary.
+   * NOTE: A task completion listener will automatically be installed to ensure
+   * the semaphore is always released by the time the task completes.
+   */
+  def acquireIfNecessary(context: TaskContext, waitMetric: GpuMetric): Unit = {
+    if (enabled && context != null) {
+      getInstance.acquireIfNecessary(context, Some(waitMetric))
+    }
+  }
+
@@ -103,8 +116,11 @@ private final class GpuSemaphore(tasksPerGpu: Int) extends Logging {
   // Map to track which tasks have acquired the semaphore.
   private val activeTasks = new ConcurrentHashMap[Long, MutableInt]

-  def acquireIfNecessary(context: TaskContext): Unit = {
-    val nvtxRange = new NvtxRange("Acquire GPU", NvtxColor.RED)
+  def acquireIfNecessary(context: TaskContext, waitMetric: Option[GpuMetric] = None): Unit = {
+    val nvtxRange = waitMetric match {
+      case Some(m) => new NvtxWithMetrics("Acquire GPU", NvtxColor.RED, m)
+      case _ => new NvtxRange("Acquire GPU", NvtxColor.RED)
+    }
     try {
       val taskAttemptId = context.taskAttemptId()
       val refs = activeTasks.get(taskAttemptId)

Review comment (on the case Some(m) line): We have done this in a few places. Would it be better to just have
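The acquireIfNecessary change above wraps the semaphore acquisition in a range that records elapsed time only when the caller supplies a metric. A minimal, self-contained sketch of that pattern follows. It is an illustration under stated assumptions, not the plugin's code: SimpleMetric, TimedRange, and acquire are hypothetical stand-ins for GpuMetric, NvtxRange/NvtxWithMetrics, and the semaphore call, and java.util.concurrent.Semaphore substitutes for the GPU semaphore.

```scala
import java.util.concurrent.Semaphore

object WaitMetricSketch {
  // Hypothetical stand-in for GpuMetric: accumulates nanoseconds.
  // Not thread-safe; a real metric would need to be.
  final class SimpleMetric {
    private var total = 0L
    def add(ns: Long): Unit = total += ns
    def value: Long = total
  }

  // Hypothetical stand-in for NvtxRange / NvtxWithMetrics: on close, adds
  // the elapsed time to the metric if one was supplied, otherwise does nothing.
  final class TimedRange(metric: Option[SimpleMetric]) extends AutoCloseable {
    private val start = System.nanoTime()
    override def close(): Unit =
      metric.foreach(_.add(System.nanoTime() - start))
  }

  // Acquire a permit, recording the wait time only when a metric is passed.
  def acquire(sem: Semaphore, waitMetric: Option[SimpleMetric]): Unit = {
    val range = new TimedRange(waitMetric)
    try {
      sem.acquire() // blocks if no permit is available
    } finally {
      range.close()
    }
  }
}
```

Putting the Option handling inside the range type, as this sketch does, is one way to avoid repeating the match at every call site.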
I am tempted to drop TOTAL_TIME to DEBUG_LEVEL, but I don't know how you want to use it with benchmarks/etc so I don't know if that is a good idea or not.
What I would really like is OP_TIME. I should be able to calculate that as TOTAL_TIME - SEMAPHORE_WAIT_TIME - GPU_OPTIME. I'll look at that next.
Wouldn't that just be measuring mostly the time spent in earlier child execs within the stage? TOTAL_TIME includes time spent fetching inputs from child iterators. It wouldn't be very op-specific, and thus OP_TIME would be an odd name for it.
Yes, that's a good point. I would need to subtract the cost of the fetches too, but that might be expensive because this is a row-based iterator. I'll take a look and see what the options are.
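To make the arithmetic in this exchange concrete, here is a small sketch with made-up numbers. All values are hypothetical, and the fetch field is an assumption for illustration only; the thread does not say that a fetch-time metric exists. It shows why subtracting only the two GPU-side metrics from TOTAL_TIME can leave a remainder dominated by time spent in child iterators.

```scala
object OpTimeSketch {
  // All values in milliseconds; hypothetical numbers for illustration only.
  final case class Timings(total: Long, semaphoreWait: Long, gpuOp: Long, fetch: Long)

  // The first proposal: TOTAL_TIME - SEMAPHORE_WAIT_TIME - GPU_OPTIME.
  def naiveOpTime(t: Timings): Long = t.total - t.semaphoreWait - t.gpuOp

  // The refinement: also subtract time spent pulling rows from child iterators.
  def adjustedOpTime(t: Timings): Long = naiveOpTime(t) - t.fetch

  val example: Timings =
    Timings(total = 1000L, semaphoreWait = 150L, gpuOp = 200L, fetch = 600L)
}
```

With these numbers, naiveOpTime(example) is 650 ms, of which 600 ms is upstream fetch time, while adjustedOpTime(example) is 50 ms.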
This is converting rows on the CPU to columns on the GPU. There is close to no processing on the CPU beyond fetching data from upstream and putting it into a buffer. If we try to measure the amount of time it takes to convert from UnsafeRow to CudfUnsafeRow, or to just put it into the arrow format in a buffer (depending on the code path we take), we are likely going to spend more time measuring than actually doing the conversion. Unless knowing that number is critically important, I would propose that we just lump it all together.
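The trade-off described here can be sketched as two timing granularities. This is a hypothetical illustration, not code from the PR: perRow, lumped, and the sink callback are invented names, and rows are modeled as plain byte arrays. The per-row variant pays for two System.nanoTime() calls on every row, while the lumped variant reads the clock twice for the whole loop and folds fetch and copy time together.

```scala
object TimingGranularitySketch {
  // Times each copy individually: two clock reads per row, fetch excluded.
  def perRow(rows: Iterator[Array[Byte]], sink: Array[Byte] => Unit): Long = {
    var elapsed = 0L
    while (rows.hasNext) {
      val row = rows.next() // upstream fetch, not timed
      val start = System.nanoTime()
      sink(row)
      elapsed += System.nanoTime() - start
    }
    elapsed
  }

  // Times the whole loop once: fetch and copy are lumped together.
  def lumped(rows: Iterator[Array[Byte]], sink: Array[Byte] => Unit): Long = {
    val start = System.nanoTime()
    rows.foreach(sink)
    System.nanoTime() - start
  }
}
```

When the work done per row is very small, the clock reads in perRow can cost as much as the work being measured, which is the concern raised in the comment.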
Ok, I went ahead and changed the level of TOTAL_TIME to DEBUG_LEVEL.