Device max memory watermark tracking #2392
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Looking good.
@@ -1360,6 +1375,11 @@ class spark_resource_adaptor final : public rmm::mr::device_memory_resource {
}
transition(thread->second, thread_state::THREAD_RUNNING);
thread->second.is_cpu_alloc = false;
// num_bytes is likely not padded, which could cause slight inaccuracies
// but for now it shouldn't matter for watermark purposes
Please make sure that we document this on the metrics page when it is fully ready.
Which page are you referring to exactly?
Sorry, it is an external page: https://docs.nvidia.com/spark-rapids/user-guide/latest/tuning-guide.html#metrics
Unblocks NVIDIA/spark-rapids#11457
Adds tracking of the max memory allocated on device over the lifespan of each task, and exposes this metric so that the Spark RAPIDS plugin can hook it up to a Spark accumulator.
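For readers unfamiliar with the idea, here is a minimal sketch of per-task high-watermark tracking. It is not the code in this PR: the class `task_watermark_tracker` and its methods `on_alloc`, `on_dealloc`, `get_max`, and `task_done` are hypothetical names chosen for illustration, and the real logic lives inside `spark_resource_adaptor`. As in the diff comment above, the sketch counts the requested `num_bytes` without any alignment padding, so the reported peak may slightly undercount what the allocator actually reserves.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical illustration only; not the spark_resource_adaptor implementation.
class task_watermark_tracker {
 public:
  // Record an allocation attributed to task_id and raise the peak if needed.
  // num_bytes is the requested size (unpadded), an assumption of this sketch.
  void on_alloc(std::int64_t task_id, std::size_t num_bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& m = tasks_[task_id];
    m.current += num_bytes;
    m.max = std::max(m.max, m.current);
  }

  // Record a deallocation; the peak is never lowered.
  void on_dealloc(std::int64_t task_id, std::size_t num_bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = tasks_.find(task_id);
    if (it == tasks_.end()) { return; }
    // Clamp at zero so a mismatched free cannot underflow the counter.
    it->second.current -= std::min(it->second.current, num_bytes);
  }

  // Peak bytes seen for the task so far, or 0 if the task is unknown.
  std::size_t get_max(std::int64_t task_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = tasks_.find(task_id);
    return it == tasks_.end() ? 0 : it->second.max;
  }

  // Drop the task's bookkeeping once the caller has read its metric.
  void task_done(std::int64_t task_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.erase(task_id);
  }

 private:
  struct metrics {
    std::size_t current = 0;  // bytes currently outstanding for the task
    std::size_t max     = 0;  // highest value `current` has reached
  };
  mutable std::mutex mutex_;
  std::unordered_map<std::int64_t, metrics> tasks_;
};
```

In this sketch a caller would read `get_max(task_id)` when the task completes, forward the value to whatever metric sink it uses (a Spark accumulator in the plugin's case), and then call `task_done(task_id)` to release the entry.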