fix max bytes dealloc bug #2541

Merged Oct 29, 2024 (1 commit)


zpuller (Collaborator) commented Oct 28, 2024

This is a follow-up to #2392 that fixes how we handle the deallocation path.

I verified that the deallocation behavior now looks correct: if a job spills and reallocates, the metric never reports a watermark above the memory limit. The unit test change also covers this case.

@@ -1802,7 +1805,6 @@ class spark_resource_adaptor final : public rmm::mr::device_memory_resource {
    if (is_for_cpu == t_state.is_cpu_alloc) {
      transition(t_state, thread_state::THREAD_ALLOC_FREE);
    }
    if (!is_for_cpu) { t_state.gpu_memory_allocated_bytes -= num_bytes; }
zpuller (Collaborator, Author) commented on this line:

I didn't notice in the original change that this code path only runs on other threads, which is incorrect for the tracking behavior (see line 1787).
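To illustrate why the placement of the decrement matters, here is a minimal sketch of watermark tracking across a spill and reallocation. It is not the code in `spark_resource_adaptor`: `task_memory_tracker`, `on_alloc`, `on_dealloc`, and `simulate_spill_and_realloc` are hypothetical names introduced only for this example, and the `skip_dealloc` flag is an assumption that models a deallocation path on which the counter is never decremented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical tracker, for illustration only (not from spark_resource_adaptor).
struct task_memory_tracker {
  std::size_t allocated_bytes     = 0;  // bytes currently attributed to the task
  std::size_t max_allocated_bytes = 0;  // high watermark reported as the metric

  void on_alloc(std::size_t num_bytes)
  {
    allocated_bytes += num_bytes;
    max_allocated_bytes = std::max(max_allocated_bytes, allocated_bytes);
  }

  // Needs to run for every deallocation that belongs to the task;
  // otherwise allocated_bytes drifts upward and inflates the watermark.
  void on_dealloc(std::size_t num_bytes)
  {
    assert(num_bytes <= allocated_bytes);
    allocated_bytes -= num_bytes;
  }
};

// Allocate a chunk, free it (as a spill would), then allocate it again.
// skip_dealloc = true models a path where the decrement is missed.
std::size_t simulate_spill_and_realloc(std::size_t chunk_bytes, bool skip_dealloc)
{
  task_memory_tracker tracker;
  tracker.on_alloc(chunk_bytes);
  if (!skip_dealloc) { tracker.on_dealloc(chunk_bytes); }
  tracker.on_alloc(chunk_bytes);
  return tracker.max_allocated_bytes;
}
```

With a 100-byte chunk, the correct path reports a watermark of 100, while the path that skips the decrement reports 200, which is how a watermark could end up above a memory limit that was never actually exceeded.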

Signed-off-by: Zach Puller <zpuller@nvidia.com>
zpuller (Collaborator, Author) commented Oct 28, 2024

build

@revans2 revans2 merged commit 02a5b34 into NVIDIA:branch-24.12 Oct 29, 2024
3 checks passed
@sameerz sameerz added the bug Something isn't working label Nov 5, 2024
YanxuanLiu added a commit to YanxuanLiu/spark-rapids-jni that referenced this pull request Dec 10, 2024
These files are modified in NVIDIA#2541 and NVIDIA#2562 but did not update year of copyright

Signed-off-by: Yanxuan Liu <yanxuanl@nvidia.com>
YanxuanLiu added a commit that referenced this pull request Dec 10, 2024
These files are modified in #2541 and #2562 but did not update year of copyright

Signed-off-by: Yanxuan Liu <yanxuanl@nvidia.com>
Labels
bug Something isn't working
3 participants