HausdorffDTLoss leads to GPU memory leak. #7480

dimka11 · 2024-02-20T11:22:19Z

Describe the bug
Using this loss method with Trainer from transformers library (Pytorch) and YOLOv8 (Pytorch) leads to crash training shortly after start due to cuda out of memory.
16 gb gpu memory, batch size is 1 with 128*128 image. Training crash after ~ 100 iterations.

Environment

Kaggle Notebook, python 3.10.12, last monai version from pip.

Also reproduced this bug under Windows 11 with code from example:

%%time

import torch
import numpy as np
from monai.losses.hausdorff_loss import HausdorffDTLoss
from monai.networks.utils import one_hot

for i in range(0, 30):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True ,reduction='none', softmax=True)
    loss = self(input.to('cuda'), target.to('cuda'))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape

It ate about 5 gb memory, on the GPU consumption graph it looks like a flat line with several rises.

The text was updated successfully, but these errors were encountered:

SarthakJShetty-path · 2024-02-22T22:49:53Z

Describe the bug Using this loss method with Trainer from transformers library (Pytorch) and YOLOv8 (Pytorch) leads to crash training shortly after start due to cuda out of memory. 16 gb gpu memory, batch size is 1 with 128*128 image. Training crash after ~ 100 iterations.

Environment

Kaggle Notebook, python 3.10.12, last monai version from pip.

Also reproduced this bug under Windows 11 with code from example:
%%time

import torch
import numpy as np
from monai.losses.hausdorff_loss import HausdorffDTLoss
from monai.networks.utils import one_hot

for i in range(0, 30):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True ,reduction='none', softmax=True)
    loss = self(input.to('cuda'), target.to('cuda'))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
It ate about 5 gb memory, on the GPU consumption graph it looks like a flat line with several rises.

+1 to this. I'm also running into the same exact issue, where there appears to be a GPU memory leak in Monai's HausdorffTDLoss.

Things that I tried to alleviate this issue:

Used @torch.no_grad() in metric update function, since I'm using the loss as a metric
Used .detach() for the prediction and target tensors to make sure that they aren't part of the computational graph

KumoLiu · 2024-02-23T05:54:15Z

Thank you for bringing up this issue. I took the time to delve deeper into the situation using your provided sample code. From my findings, there doesn't appear to be a noticeable GPU memory leak.
Please refer to the GPU memory usage graph provided below:

Minor fluctuations present in the graph can be attributed to the initialization of the super loss function as seen here. Thank you again for your diligent reporting.
Best,

SarthakJShetty-path · 2024-02-23T13:50:09Z

Thank you for bringing up this issue. I took the time to delve deeper into the situation using your provided sample code. From my findings, there doesn't appear to be a noticeable GPU memory leak. Please refer to the GPU memory usage graph provided below: Minor fluctuations present in the graph can be attributed to the initialization of the super loss function as seen here. Thank you again for your diligent reporting. Best,

Thank you for your quick response! I appreciate you taking the time to get back to us.

I modified the code sample above slightly to plot the memory allocated by PyTorch during the process:

import numpy as np
import torch
import matplotlib.pyplot as plt
from monai.networks.utils import one_hot
from monai.losses.hausdorff_loss import HausdorffDTLoss

gpu_consumption = []
steps = []

for i in range(0, 100):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True, reduction="none", softmax=True)
    loss = self(input.to("cuda"), target.to("cuda"))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
    memory_consumption = torch.cuda.max_memory_allocated(device=None) / (1e9)
    gpu_consumption.append(memory_consumption)
    steps.append(i)
    print(f"GPU max memory allocated: {memory_consumption} GB")

plt.plot(steps, gpu_consumption)
plt.title("GPU consumption (in GB) vs. Steps")
plt.show()

Which generated the following graph:

It appears that the memory consumption continuously increases with the number of steps, eventually leading to a crash because the system runs out of available GPU memory.

Please let me know if the initialization should be handled different. The person above instantiates a new loss in every loop, which might be linked to what you're talking about about super instantiation taking some additional GPU memory, but I've noticed this independently with my training runs as well.

Thanks!

KumoLiu · 2024-02-23T14:47:12Z

Hi @SarthakJShetty-path, I used the same code you shared, the graph looks like:

What's your PyTorch and MONAI version?

SarthakJShetty-path · 2024-02-23T16:58:54Z

Hi @SarthakJShetty-path, I used the same code you shared, the graph looks like:

What's your PyTorch and MONAI version?

Interesting!

Here are the versions:

(venv) ┌─[pc@home] - [~/projects/]
└─[$] <git:(feature_branch*)> python 
Python 3.8.10 (default, Nov 22 2023, 10:22:35) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import monai
>>> torch.__version__
'2.1.2+cu121'
>>> monai.__version__
'1.3.0'
>>>

KumoLiu · 2024-02-26T06:19:37Z

I've conducted tests using a new 1.3.0 image and unfortunately, I've been unable to reproduce your reported issue. Could I kindly recommend attempting the same process in a fresh environment on your end?
To aid you, please use the Docker command below to pull the appropriate MONAI image:
docker pull projectmonai/monai:1.3.0

SarthakJShetty-path · 2024-02-26T15:03:03Z

I've conducted tests using a new 1.3.0 image and unfortunately, I've been unable to reproduce your reported issue. Could I kindly recommend attempting the same process in a fresh environment on your end? To aid you, please use the Docker command below to pull the appropriate MONAI image: docker pull projectmonai/monai:1.3.0

Sure @KumoLiu I can make a fresh build and recheck this issue in a bit. Just to confirm: You weren't able to replicate the issue even with the torch 2.1.2+cu121 version, or just with monai 1.3.0?

Thank you!

KumoLiu · 2024-02-26T15:19:51Z

Just to confirm: You weren't able to replicate the issue even with the torch 2.1.2+cu121 version

Yes, with the torch 2.1.2+cu121 version.

dimka11 · 2024-03-11T07:33:32Z

@SarthakJShetty-path any updates?

SarthakJShetty-path · 2024-03-11T12:51:11Z

@SarthakJShetty-path any updates?

Sorry about the delay with this. I haven't gotten around to pulling the Docker image and trying, but several members on our team are reporting this issue, with approximately the same CUDA + Torch version. I will pull the Docker image by EoD today and get back to you. Thank you, and sorry for the delay.

dimka11 · 2024-03-14T06:47:27Z

@SarthakJShetty-path I did switch to #4205 ShapeLoss, but don't sure that it brings the expected results.

SarthakJShetty-path · 2024-03-14T14:03:42Z

@SarthakJShetty-path I did switch to #4205 ShapeLoss, but don't sure that it brings the expected results.

Can you try running this piece of code and posting the results?

import numpy as np
import torch
import matplotlib.pyplot as plt
from monai.networks.utils import one_hot
from monai.losses.hausdorff_loss import HausdorffDTLoss

gpu_consumption = []
steps = []

for i in range(0, 100):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True, reduction="none", softmax=True)
    loss = self(input.to("cuda"), target.to("cuda"))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
    memory_consumption = torch.cuda.max_memory_allocated(device=None) / (1e9)
    gpu_consumption.append(memory_consumption)
    steps.append(i)
    print(f"GPU max memory allocated: {memory_consumption} GB")

plt.plot(steps, gpu_consumption)
plt.title("GPU consumption (in GB) vs. Steps")
plt.show()

It looks like @KumoLiu got a much different graph from what I received.

dimka11 · 2024-03-14T17:10:21Z

@SarthakJShetty-path

`
Kaggle Notebook:
Python 3.10.13
2.1.2
1.3.0

GPU max memory allocated: 0.575145984 GB
GPU max memory allocated: 0.856951808 GB
GPU max memory allocated: 1.058018304 GB
GPU max memory allocated: 1.546395648 GB
GPU max memory allocated: 2.002266112 GB
GPU max memory allocated: 2.002266112 GB
GPU max memory allocated: 2.002266112 GB
GPU max memory allocated: 2.404661248 GB
GPU max memory allocated: 2.731032064 GB
GPU max memory allocated: 2.731032064 GB
GPU max memory allocated: 2.731032064 GB
GPU max memory allocated: 2.775597056 GB
GPU max memory allocated: 2.775597056 GB
GPU max memory allocated: 2.775597056 GB
GPU max memory allocated: 3.17851648 GB
GPU max memory allocated: 3.546306048 GB
GPU max memory allocated: 3.546306048 GB
GPU max memory allocated: 3.546306048 GB
GPU max memory allocated: 3.587463168 GB
GPU max memory allocated: 3.587463168 GB
GPU max memory allocated: 3.587463168 GB
GPU max memory allocated: 3.990120448 GB
GPU max memory allocated: 4.409028096 GB
GPU max memory allocated: 4.409028096 GB
GPU max memory allocated: 4.409028096 GB
GPU max memory allocated: 4.4410112 GB
GPU max memory allocated: 4.718360576 GB
GPU max memory allocated: 4.718360576 GB
GPU max memory allocated: 4.718360576 GB
GPU max memory allocated: 4.81351936 GB
GPU max memory allocated: 4.81351936 GB
GPU max memory allocated: 4.81351936 GB
GPU max memory allocated: 5.252877312 GB
GPU max memory allocated: 5.533896704 GB
GPU max memory allocated: 5.533896704 GB
GPU max memory allocated: 5.533896704 GB
GPU max memory allocated: 5.67099904 GB
GPU max memory allocated: 5.67099904 GB
GPU max memory allocated: 5.67099904 GB
GPU max memory allocated: 6.073394176 GB
GPU max memory allocated: 6.399764992 GB
GPU max memory allocated: 6.399764992 GB
GPU max memory allocated: 6.399764992 GB
GPU max memory allocated: 6.444329984 GB
GPU max memory allocated: 6.444329984 GB
GPU max memory allocated: 6.444329984 GB
GPU max memory allocated: 6.847249408 GB
GPU max memory allocated: 7.215038976 GB
GPU max memory allocated: 7.215038976 GB
GPU max memory allocated: 7.215038976 GB
GPU max memory allocated: 7.256196096 GB
GPU max memory allocated: 7.256196096 GB
GPU max memory allocated: 7.256196096 GB
GPU max memory allocated: 7.658853376 GB
GPU max memory allocated: 8.077761024 GB
GPU max memory allocated: 8.077761024 GB
GPU max memory allocated: 8.077761024 GB
GPU max memory allocated: 8.109744128 GB
GPU max memory allocated: 8.387093504 GB
GPU max memory allocated: 8.387093504 GB
GPU max memory allocated: 8.387093504 GB
GPU max memory allocated: 8.482252288 GB
GPU max memory allocated: 8.482252288 GB
GPU max memory allocated: 8.482252288 GB
GPU max memory allocated: 8.92161024 GB
GPU max memory allocated: 9.202629632 GB
GPU max memory allocated: 9.202629632 GB
GPU max memory allocated: 9.202629632 GB
GPU max memory allocated: 9.339731968 GB
GPU max memory allocated: 9.339731968 GB
GPU max memory allocated: 9.339731968 GB
GPU max memory allocated: 9.742127104 GB
GPU max memory allocated: 10.06849792 GB
GPU max memory allocated: 10.06849792 GB
GPU max memory allocated: 10.06849792 GB
GPU max memory allocated: 10.113062912 GB
GPU max memory allocated: 10.113062912 GB
GPU max memory allocated: 10.113062912 GB

Google Colab notebook:
Python 3.10.12
2.2.1+cu121
1.3.0

GPU max memory allocated: 0.58143744 GB
GPU max memory allocated: 0.954994176 GB
GPU max memory allocated: 1.023153152 GB
GPU max memory allocated: 1.54456064 GB
GPU max memory allocated: 2.002003968 GB
GPU max memory allocated: 2.002003968 GB
GPU max memory allocated: 2.002003968 GB
GPU max memory allocated: 2.401777664 GB
GPU max memory allocated: 2.679651328 GB
GPU max memory allocated: 2.679651328 GB
GPU max memory allocated: 2.679651328 GB
GPU max memory allocated: 2.767994368 GB
GPU max memory allocated: 2.767994368 GB
GPU max memory allocated: 2.76904448 GB
GPU max memory allocated: 3.179825664 GB
GPU max memory allocated: 3.179825664 GB
GPU max memory allocated: 3.179825664 GB
GPU max memory allocated: 3.625475072 GB
GPU max memory allocated: 3.903086592 GB
GPU max memory allocated: 3.903086592 GB
GPU max memory allocated: 3.903086592 GB
GPU max memory allocated: 3.991429632 GB
GPU max memory allocated: 3.991429632 GB
GPU max memory allocated: 3.9922176 GB
GPU max memory allocated: 4.403260928 GB
GPU max memory allocated: 4.403260928 GB
GPU max memory allocated: 4.403260928 GB
GPU max memory allocated: 4.812209152 GB
GPU max memory allocated: 4.812209152 GB
GPU max memory allocated: 4.812209152 GB
GPU max memory allocated: 5.251828736 GB
GPU max memory allocated: 5.532848128 GB
GPU max memory allocated: 5.532848128 GB
GPU max memory allocated: 5.532848128 GB
GPU max memory allocated: 5.58396672 GB
GPU max memory allocated: 5.58396672 GB
GPU max memory allocated: 5.621192704 GB
GPU max memory allocated: 6.032498176 GB
GPU max memory allocated: 6.032498176 GB
GPU max memory allocated: 6.032498176 GB
GPU max memory allocated: 6.47788544 GB
GPU max memory allocated: 6.755759104 GB
GPU max memory allocated: 6.755759104 GB
GPU max memory allocated: 6.755759104 GB
GPU max memory allocated: 6.84515072 GB
GPU max memory allocated: 6.84515072 GB
GPU max memory allocated: 6.84515072 GB
GPU max memory allocated: 7.258817536 GB
GPU max memory allocated: 7.258817536 GB
GPU max memory allocated: 7.258817536 GB
GPU max memory allocated: 7.698174976 GB
GPU max memory allocated: 7.978145792 GB
GPU max memory allocated: 7.978145792 GB
GPU max memory allocated: 7.978145792 GB
GPU max memory allocated: 8.06701312 GB
GPU max memory allocated: 8.06701312 GB
GPU max memory allocated: 8.06701312 GB
GPU max memory allocated: 8.523933696 GB
GPU max memory allocated: 8.523933696 GB
GPU max memory allocated: 8.523933696 GB
GPU max memory allocated: 8.924755968 GB
GPU max memory allocated: 9.201581056 GB
GPU max memory allocated: 9.201581056 GB
GPU max memory allocated: 9.201581056 GB
GPU max memory allocated: 9.25217536 GB
GPU max memory allocated: 9.25217536 GB
GPU max memory allocated: 9.289401344 GB
GPU max memory allocated: 9.701231104 GB
GPU max memory allocated: 9.701231104 GB
GPU max memory allocated: 9.701231104 GB
GPU max memory allocated: 10.187250688 GB
GPU max memory allocated: 10.187250688 GB
GPU max memory allocated: 10.187250688 GB
GPU max memory allocated: 10.187250688 GB
GPU max memory allocated: 10.475086336 GB
GPU max memory allocated: 10.475086336 GB
GPU max memory allocated: 10.475086336 GB
GPU max memory allocated: 10.516767232 GB
GPU max memory allocated: 10.516767232 GB
GPU max memory allocated: 10.516767232 GB
GPU max memory allocated: 10.92440576 GB

Local (Windows 11, RTX3060)

2.1.1+cu121
1.3.0
3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
GPU max memory allocated: 0.542377984 GB
GPU max memory allocated: 0.990908416 GB
GPU max memory allocated: 0.990908416 GB
GPU max memory allocated: 1.466966016 GB
GPU max memory allocated: 1.95534336 GB
GPU max memory allocated: 2.323132928 GB
GPU max memory allocated: 2.323132928 GB
GPU max memory allocated: 2.323132928 GB
GPU max memory allocated: 2.364552192 GB
GPU max memory allocated: 2.364552192 GB
GPU max memory allocated: 2.364552192 GB
GPU max memory allocated: 2.767209472 GB
GPU max memory allocated: 3.18611712 GB
GPU max memory allocated: 3.18611712 GB
GPU max memory allocated: 3.18611712 GB
GPU max memory allocated: 3.218100224 GB
GPU max memory allocated: 3.495187456 GB
GPU max memory allocated: 3.495187456 GB
GPU max memory allocated: 3.495187456 GB
GPU max memory allocated: 3.591132672 GB
GPU max memory allocated: 3.591132672 GB
GPU max memory allocated: 3.591132672 GB
GPU max memory allocated: 4.029966336 GB
GPU max memory allocated: 4.310723584 GB
GPU max memory allocated: 4.310723584 GB
GPU max memory allocated: 4.310723584 GB
GPU max memory allocated: 4.40273664 GB
GPU max memory allocated: 4.40273664 GB
GPU max memory allocated: 4.40273664 GB
GPU max memory allocated: 4.850745344 GB
GPU max memory allocated: 5.176067584 GB
GPU max memory allocated: 5.176067584 GB
GPU max memory allocated: 5.176067584 GB
GPU max memory allocated: 5.222467584 GB
GPU max memory allocated: 5.222467584 GB
GPU max memory allocated: 5.222467584 GB
GPU max memory allocated: 5.624338432 GB
GPU max memory allocated: 5.991603712 GB
GPU max memory allocated: 5.991603712 GB
GPU max memory allocated: 5.991603712 GB
GPU max memory allocated: 5.99658496 GB
GPU max memory allocated: 5.99658496 GB
GPU max memory allocated: 5.99658496 GB
GPU max memory allocated: 6.398455808 GB
GPU max memory allocated: 6.768604672 GB
GPU max memory allocated: 6.768604672 GB
GPU max memory allocated: 6.768604672 GB
GPU max memory allocated: 6.887881728 GB
GPU max memory allocated: 7.16365824 GB
GPU max memory allocated: 7.16365824 GB
GPU max memory allocated: 7.16365824 GB
GPU max memory allocated: 7.260389888 GB
`

SarthakJShetty-path · 2024-03-14T17:14:48Z

@KumoLiu Looks like even @dimka11 has the same error. The GPU memory seems to strictly increase even on Google Colab, with values very similar to what I posted above.

johnzielke · 2024-03-20T19:37:45Z

Chiming in here, since I worked on #7008 trying to make the HausdorffLoss work with cucim.

@SarthakJShetty-path I did switch to #4205 ShapeLoss, but don't sure that it brings the expected results.

Can you try running this piece of code and posting the results?

import numpy as np
import torch
import matplotlib.pyplot as plt
from monai.networks.utils import one_hot
from monai.losses.hausdorff_loss import HausdorffDTLoss

gpu_consumption = []
steps = []

for i in range(0, 100):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True, reduction="none", softmax=True)
    loss = self(input.to("cuda"), target.to("cuda"))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
    memory_consumption = torch.cuda.max_memory_allocated(device=None) / (1e9)
    gpu_consumption.append(memory_consumption)
    steps.append(i)
    print(f"GPU max memory allocated: {memory_consumption} GB")

plt.plot(steps, gpu_consumption)
plt.title("GPU consumption (in GB) vs. Steps")
plt.show()

It looks like @KumoLiu got a much different graph from what I received.

I just tested this myself, and I also get an increase in GPU memory usage with each step when running your script. (Windows 11, WSL2, Monai 1.3.0, Pytorch 2.2.1+cuda12.1, Python 3.11.8)
If I add a gc.collect() after the assert though, the memory usage stays constant, i.e. the script would be

import gc

import numpy as np
import torch
import matplotlib.pyplot as plt
from monai.networks.utils import one_hot
from monai.losses.hausdorff_loss import HausdorffDTLoss

gpu_consumption = []
steps = []

for i in range(0, 10):
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True, reduction="none", softmax=True)
    loss = self(input.to("cuda"), target.to("cuda"))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
    gc.collect()
    memory_consumption = torch.cuda.max_memory_allocated(device=None) / (1e9)
    gpu_consumption.append(memory_consumption)
    steps.append(i)
    print(f"GPU max memory allocated: {memory_consumption} GB")

plt.plot(steps, gpu_consumption)
plt.title("GPU consumption (in GB) vs. Steps")
plt.show()

This seems to indicate some problem with recognizing unused tensors. Maybe there is some issue with cupy/cucim interoperability and weakrefs created by that?
I tried to debug this issue using the pytorch profiler, unfortunately as soon as you enable stack traces, there seems to be a bug with pytorch / kineto profiling that writes bad json traces when enabling stack traces. For reference, this is what I used to test:

import gc

import numpy as np
import matplotlib.pyplot as plt
from monai.networks.utils import one_hot
from monai.losses.hausdorff_loss import HausdorffDTLoss
from torch.profiler import profile, ProfilerActivity
import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
gpu_consumption = []
steps = []
def calculate():
    B, C, H, W = 16, 5, 512, 512
    input = torch.rand(B, C, H, W)
    target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
    target = one_hot(target_idx[:, None, ...], num_classes=C)
    self = HausdorffDTLoss(include_background=True, reduction="none", softmax=True)
    loss = self(input.to("cuda"), target.to("cuda"))
    assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    # Enabling with stack creates bad traces
    # with_stack=True,
    profile_memory=True,
    record_shapes=True,
    # on_trace_ready=torch.profiler.tensorboard_trace_handler('./logs/memleak'),
) as prof:
    for i in range(0, 50):
        prof.step()
        calculate()
        # Adding this line fixes the memory leak
        # gc.collect()
        memory_consumption = torch.cuda.max_memory_allocated(device=None) / (1e9)
        gpu_consumption.append(memory_consumption)
        steps.append(i)
        print(f"GPU max memory allocated: {memory_consumption} GB")
prof.export_chrome_trace("trace.json")
plt.plot(steps, gpu_consumption)
plt.title("GPU consumption (in GB) vs. Steps")
plt.show()

When running it without stack_traces, the memory view looks like this without the gc.collect() call:

dimka11 · 2024-03-21T05:48:46Z

Thanks! I have tried with cucim on Kaggle Notebook (installed via !pip install cucim-cu12) and there is no gpu memory leak (without gc.collect()).
Of course it's not option for native Windows.

But HausdorffLoss already use distance_transform_edt that use cucim?

johnzielke · 2024-03-21T13:42:21Z

It does use cucim based on this logic, so both cucim and cupy have to be installed:

    distance_transform_edt, has_cucim = optional_import(
        "cucim.core.operations.morphology", name="distance_transform_edt"
    )
    use_cp = has_cp and has_cucim and isinstance(img, torch.Tensor) and img.device.type == "cuda"

Sorry I'm not quite following. So before that, you ran it without cucim installed?

johnzielke · 2024-03-21T13:53:23Z

Ok, so I just checked that myself, and it seems you are right. If you have both cucim and cupy installed there does not seem to be a memory leak anymore

johnzielke · 2024-03-21T14:26:59Z

@KumoLiu I was able to reproduce the issue using the projectmonai/monai:1.3.0 container.

If you run pip uninstall cupy-cuda12x before executing the script, the memory leak will occur.

On the other hand, if you hardcode monai/transforms/utils.py:2112 (use_cp) to False with cupy installed, the memory leak does not occur.

So it seems that cupy being installed changes something somewhere in the memory deallocation or garbage collection. Although I am not sure where this is.

SarthakJShetty-path · 2024-03-21T14:49:56Z

@KumoLiu I was able to reproduce the issue using the projectmonai/monai:1.3.0 container.

If you run pip uninstall cupy-cuda12x before executing the script, the memory leak will occur.

On the other hand, if you hardcode monai/transforms/utils.py:2112 (use_cp) to False with cupy installed, the memory leak does not occur.

So it seems that cupy being installed changes something somewhere in the memory deallocation or garbage collection. Although I am not sure where this is.

Thank you for looking into this @johnzielke! So this means that installing cupy should a "fix" for now?

johnzielke · 2024-03-21T14:54:30Z

I mean test it on your setup, but I guess so. You should also get a nice ~10x performance boost in the calculation of the loss with both cupy and cucim since it will run on the GPU then. There should probably be a warning or at least some more docs that explain how to get the calculation to the GPU

KumoLiu · 2024-03-22T03:16:32Z

Hi @johnzielke, thanks for the detailed report. Your findings are insightful and indeed point to an interaction between CuPy, garbage collection, and memory deallocation which could be the root cause of the memory leak issue. I agree, looking into this could lead both to resolving the memory leak and potentially offering a substantial performance boost by enabling the loss calculation to run on the GPU.
And I tried with the 1.3.0 container and the latest container, with cupy-cuda12x installed, it works well. The issue only happens when uninstall cupy-cuda12x. So the issue might be due to the interdependencies between CUDA and the CuPy library. Specifically, when CuPy is installed, it links with particular CUDA libraries to perform GPU computations. When uninstall CuPy, some CUDA operations may not be correctly performed because they need CuPy to access the GPU.
Currently I don't have time to take a deep look at this issue.

SarthakJShetty-path · 2024-03-26T13:56:02Z

Hi @johnzielke, thanks for the detailed report. Your findings are insightful and indeed point to an interaction between CuPy, garbage collection, and memory deallocation which could be the root cause of the memory leak issue. I agree, looking into this could lead both to resolving the memory leak and potentially offering a substantial performance boost by enabling the loss calculation to run on the GPU. And I tried with the 1.3.0 container and the latest container, with cupy-cuda12x installed, it works well. The issue only happens when uninstall cupy-cuda12x. So the issue might be due to the interdependencies between CUDA and the CuPy library. Specifically, when CuPy is installed, it links with particular CUDA libraries to perform GPU computations. When uninstall CuPy, some CUDA operations may not be correctly performed because they need CuPy to access the GPU. Currently I don't have time to take a deep look at this issue.

No worries @KumoLiu thank you (and @johnzielke) for taking a look at this nonetheless. I'll try installing CuPy and double checking that that avoids this GPU leak.

johnzielke mentioned this issue Apr 30, 2024

Memory leak in monai.transforms.distance_transform_edt #7727

Open

PengchengShi1220 mentioned this issue Jul 16, 2024

Possible memory leak? PengchengShi1220/cbDice#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HausdorffDTLoss leads to GPU memory leak. #7480

HausdorffDTLoss leads to GPU memory leak. #7480

dimka11 commented Feb 20, 2024 •

edited

Loading

SarthakJShetty-path commented Feb 22, 2024 •

edited

Loading

KumoLiu commented Feb 23, 2024

SarthakJShetty-path commented Feb 23, 2024

KumoLiu commented Feb 23, 2024

SarthakJShetty-path commented Feb 23, 2024

KumoLiu commented Feb 26, 2024 •

edited

Loading

SarthakJShetty-path commented Feb 26, 2024

KumoLiu commented Feb 26, 2024

dimka11 commented Mar 11, 2024

SarthakJShetty-path commented Mar 11, 2024

dimka11 commented Mar 14, 2024

SarthakJShetty-path commented Mar 14, 2024

dimka11 commented Mar 14, 2024

SarthakJShetty-path commented Mar 14, 2024

johnzielke commented Mar 20, 2024 •

edited

Loading

dimka11 commented Mar 21, 2024 •

edited

Loading

johnzielke commented Mar 21, 2024

johnzielke commented Mar 21, 2024 •

edited

Loading

johnzielke commented Mar 21, 2024

SarthakJShetty-path commented Mar 21, 2024

johnzielke commented Mar 21, 2024

KumoLiu commented Mar 22, 2024

SarthakJShetty-path commented Mar 26, 2024

HausdorffDTLoss leads to GPU memory leak. #7480

HausdorffDTLoss leads to GPU memory leak. #7480

Comments

dimka11 commented Feb 20, 2024 • edited Loading

SarthakJShetty-path commented Feb 22, 2024 • edited Loading

KumoLiu commented Feb 23, 2024

SarthakJShetty-path commented Feb 23, 2024

KumoLiu commented Feb 23, 2024

SarthakJShetty-path commented Feb 23, 2024

KumoLiu commented Feb 26, 2024 • edited Loading

SarthakJShetty-path commented Feb 26, 2024

KumoLiu commented Feb 26, 2024

dimka11 commented Mar 11, 2024

SarthakJShetty-path commented Mar 11, 2024

dimka11 commented Mar 14, 2024

SarthakJShetty-path commented Mar 14, 2024

dimka11 commented Mar 14, 2024

SarthakJShetty-path commented Mar 14, 2024

johnzielke commented Mar 20, 2024 • edited Loading

dimka11 commented Mar 21, 2024 • edited Loading

johnzielke commented Mar 21, 2024

johnzielke commented Mar 21, 2024 • edited Loading

johnzielke commented Mar 21, 2024

SarthakJShetty-path commented Mar 21, 2024

johnzielke commented Mar 21, 2024

KumoLiu commented Mar 22, 2024

SarthakJShetty-path commented Mar 26, 2024

dimka11 commented Feb 20, 2024 •

edited

Loading

SarthakJShetty-path commented Feb 22, 2024 •

edited

Loading

KumoLiu commented Feb 26, 2024 •

edited

Loading

johnzielke commented Mar 20, 2024 •

edited

Loading

dimka11 commented Mar 21, 2024 •

edited

Loading

johnzielke commented Mar 21, 2024 •

edited

Loading