Memory leak in PyTorch interaction #26
Comments
Any progress on this?
Haven't had any luck finding the issue.
Sorry, I had missed this. I will try to look into it soon.
Would you be able to provide more details on the Python packages and versions needed to reproduce this?
@pabloferz I think only pytorch is needed for this MWE. I have seen this on all recent torch versions: 1.10, 1.11 and 1.12.
Ok, I verified and everything seems to work as it should. What you are seeing is the same as in JuliaPy/PyCall.jl#436 and JuliaPy/PyCall.jl#529. That is, there's no memory leak, but Julia has no idea that it should garbage collect the Python-side objects more frequently. What I would do here is the following:

using CUDA, PyCall, DLPack
dlpack = pyimport("torch.utils.dlpack")
torch = pyimport("torch")
pytorch_from_dlpack(x) = @pycall dlpack.from_dlpack(x)::PyObject
function memoryused()
info = CUDA.MemoryInfo()
return 1 - (info.free_bytes / info.total_bytes)
end
pymodel = torch.hub.load("pytorch/vision", "resnet18").to("cuda")
apply_model = x -> (y = pymodel(x); GC.gc(false); y) # garbage collect, but only recent "young" objects
xs = cu(randn(Float32, 224, 224, 3, 16))
xs_shared = DLPack.share(xs, PyObject, pytorch_from_dlpack)
usages = [memoryused()]
for i in 1:100
apply_model(xs_shared)
push!(usages, memoryused())
end
# Alternatively:
for i in 1:100
pymodel(xs_shared)
GC.gc(false)
push!(usages, memoryused())
end
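If the explicit GC.gc(false) calls work as intended, usages should stay roughly flat instead of climbing towards 1.0. A quick sanity check, as a sketch assuming one of the loops above has been run:

# The spread of observed GPU memory usage should stay small and bounded,
# rather than increasing by a roughly constant amount per iteration.
@show extrema(usages)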
Confirmed that this is true for
In trying to use PyTorch with PyCallChainRules.jl, I discovered a memory leak (rejuvyesh/PyCallChainRules.jl#24) that seems to be in its interaction with DLPack.jl.
I've tried to reduce the error to an MWE that leads to an Out-Of-Memory error because GPU memory isn't being freed correctly.
The following loads a model and a CuArray, uses DLPack.jl to share the CuArray with PyTorch, and then runs the array through the model repeatedly; a sketch of the loop is below. The GPU memory increases linearly until the OOM error.

[plot: GPU memory usage growing linearly with iterations until OOM]
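Roughly, the failing loop looks like this. This is a sketch that reuses the resnet18 model, batch shape, and pytorch_from_dlpack helper from the reproduction elsewhere in this thread; the iteration count is only illustrative:

using CUDA, PyCall, DLPack

dlpack = pyimport("torch.utils.dlpack")
torch = pyimport("torch")
pytorch_from_dlpack(x) = @pycall dlpack.from_dlpack(x)::PyObject

# Load a model on the GPU and share a Julia CuArray with PyTorch via DLPack.
pymodel = torch.hub.load("pytorch/vision", "resnet18").to("cuda")
xs = cu(randn(Float32, 224, 224, 3, 16))
xs_shared = DLPack.share(xs, PyObject, pytorch_from_dlpack)

# Each call allocates a new output tensor on the GPU; without triggering
# Julia's GC, the wrappers pile up and GPU memory grows until an OOM error.
for i in 1:1000
    pymodel(xs_shared)
end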
I also looked into DLPack.SHARES_POOL to see if there are references piling up there, but the number of references stays constant (2). The GPU memory stays exhausted until I run torch.cuda.empty_cache(), but this only works after calling GC.gc(), leading me to believe that some references are being held on the Julia side that prevent the memory from being cleared by PyTorch. Any help with this or pointers for places to look in the code base would be much appreciated! I'm happy to run further tests that could help diagnose the problem.
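For concreteness, here is a sketch of the manual recovery sequence described above, assuming torch = pyimport("torch") as in the MWE:

# GPU memory is only released once the Julia-side references are collected
# and PyTorch is then asked to drop its cached blocks.
GC.gc()                   # full collection of the Julia-side PyObject references
torch.cuda.empty_cache()  # only after this does the GPU memory actually free up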