
optionally close orphaned NDArrays using Java garbage collection #2273

Closed
wants to merge 37 commits

Conversation

enpasos
Contributor

@enpasos enpasos commented Dec 31, 2022

Description

This PR is a solution suggestion to #2210.

The following figure shows a striking example of the impact (to reproduce it, see #2210):

[figure]

Design considerations: DJLDiscussionInputVersion4.pdf

Opt-in: SwitchGarbageCollection.on();

The implementation here is done for PyTorch, but the solution approach is general.
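
A minimal usage sketch of the opt-in (assumptions: the SwitchGarbageCollection class added by this PR, whose package import is omitted here, plus the standard DJL NDManager/NDArray API):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class GcOptInExample {
    public static void main(String[] args) {
        // Opt in once, e.g. at application startup (class added by this PR).
        SwitchGarbageCollection.on();

        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f, 2f, 3f});
            NDArray b = a.mul(2);
            // When a and b become unreachable, the registered proxies allow their
            // native memory to be released after a later GC, even while the
            // manager is still open.
        }
    }
}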

@codecov-commenter

codecov-commenter commented Dec 31, 2022

Codecov Report

Base: 72.08% // Head: 74.14% // Increases project coverage by +2.05% 🎉

Coverage data is based on head (6cf606e) compared to base (bb5073f).
Patch coverage: 72.26% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2273      +/-   ##
============================================
+ Coverage     72.08%   74.14%   +2.05%     
- Complexity     5126     6738    +1612     
============================================
  Files           473      665     +192     
  Lines         21970    29333    +7363     
  Branches       2351     3033     +682     
============================================
+ Hits          15838    21750    +5912     
- Misses         4925     6109    +1184     
- Partials       1207     1474     +267     
Impacted Files Coverage Δ
api/src/main/java/ai/djl/modality/cv/Image.java 69.23% <ø> (-4.11%) ⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java 76.00% <ø> (ø)
...rc/main/java/ai/djl/modality/cv/output/Joints.java 71.42% <ø> (ø)
.../main/java/ai/djl/modality/cv/output/Landmark.java 100.00% <ø> (ø)
...main/java/ai/djl/modality/cv/output/Rectangle.java 72.41% <0.00%> (ø)
...i/djl/modality/cv/translator/BigGANTranslator.java 21.42% <0.00%> (-5.24%) ⬇️
.../modality/cv/translator/ImageFeatureExtractor.java 0.00% <0.00%> (ø)
.../ai/djl/modality/cv/translator/YoloTranslator.java 27.77% <0.00%> (+18.95%) ⬆️
...ain/java/ai/djl/modality/cv/util/NDImageUtils.java 67.10% <0.00%> (+7.89%) ⬆️
api/src/main/java/ai/djl/modality/nlp/Decoder.java 63.63% <ø> (ø)
... and 607 more


@lanking520
Contributor

This approach is very similar to reference counting, which is widely used in C++. Here I'll just leave a reference implementation: https://github.com/almson/almson-refcount. I will take a look after the holiday.

Thanks for your contribution! It's a great effort to explore an alternative. But I would try benchmarking in concurrent mode to see what happens in a parallel block.

@lanking520
Contributor

Another concern is the implementation of the reference queue: it seems every operation involves a synchronous checkQueue operation. What is the impact if we frequently add and remove resources? What are the chances we will be blocked by that?

@lanking520
Contributor

In the proxyMaker implementation you used random UUID generation (a synchronized call) instead of picking up the uid. The UID is also unique across resources.

@enpasos
Contributor Author

enpasos commented Dec 31, 2022

Another concern is the implementation of the reference queue: it seems every operation involves a synchronous checkQueue operation. What is the impact if we frequently add and remove resources? What are the chances we will be blocked by that?

On the ReferenceQueue there are two fast methods, enqueue and poll. enqueue is called once per reference; poll is called on every method call. poll should normally be ultra fast: if there is no head on the queue, it returns immediately. Blocking could in principle occur in a multithreaded environment for the time it takes to handle all the references that are on the queue at a particular method call. But are the NDArrays for PyTorch thread-safe? I have the following comment on #2210 in mind:
[screenshot of the referenced #2210 comment]
To be on the safe side, shouldn't there in any case be a synchronizing facade in front of the PyTorch calls? And if so, there would be no extra blocking on the reference queue.
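
To make the enqueue/poll argument concrete, here is a minimal, self-contained sketch of the pattern (an illustration, not the PR code): weak references to the user-facing proxies are registered with a ReferenceQueue, and checkQueue drains it with the non-blocking poll:

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ReferenceQueueSketch {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<Reference<?>, Runnable> cleanups = new ConcurrentHashMap<>();

    // Called once per proxy: the GC enqueues the weak reference when the proxy dies.
    void register(Object proxy, Runnable closeAction) {
        cleanups.put(new WeakReference<>(proxy, queue), closeAction);
    }

    // Called opportunistically; poll() returns null immediately when nothing was collected.
    void checkQueue() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable closeAction = cleanups.remove(ref);
            if (closeAction != null) {
                closeAction.run(); // e.g. close the orphaned native NDArray
            }
        }
    }
}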

@enpasos
Contributor Author

enpasos commented Dec 31, 2022

In the proxyMaker implementation you used random UUID generation (a synchronized call) instead of picking up the uid. The UID is also unique across resources.

I'm not sure I understood your concern correctly. It is not necessary to use a UUID here; an Integer, used correctly, would also do. This key just needs to be an object that is unique within the PtNDArrayProxyMaker. It gets garbage collected together with the proxy once the user holds no more references to the proxy, while the NDManager still holds a reference to the corresponding value PtNDArrayImpl. If your concern is about the particular UUID implementation, I could replace it with an Integer counter.

Maybe your intention is to use the uid = handle.toString() that is used inside PtNDArrayImpl: one can certainly use a copy of that String, but not the original String object, because it is referenced.

Edited 04.01.2023: I replaced the UUID with uid + "-" + counter. The uid from PtNDArrayImpl alone won't do, because the lifespans of the handle and the PtNDArrayImpl are not exactly the same.
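
A minimal sketch of such a composite key (hypothetical helper, not the PR code):

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: the concatenation always yields a fresh String, distinct
// from the uid String referenced by PtNDArrayImpl, and the counter keeps the key
// unique even if a native handle (and hence its uid) is reused later on.
final class ProxyKeySketch {
    private static final AtomicLong COUNTER = new AtomicLong();

    static String nextKey(String uid) {
        return uid + "-" + COUNTER.incrementAndGet();
    }
}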

@enpasos
Contributor Author

enpasos commented Jan 8, 2023

@frankfliu Thx a lot for looking into the suggested solution. I will try my best to solve the issues you mentioned tomorrow.

One comment on the threading issue: If I understand you right, it is necessary for the closing of an NDArray to be done by the same thread that created it. If so, I would replace the global queue with a ThreadLocal queue.

One more point for now: I must confess that in the heat of the moment during the last merges with the master branch I didn't pay attention to my memory leak test ... I'll try to fix that tomorrow as well.

@frankfliu
Contributor

ThreadLocal has its own issues. It's hard to clean up ThreadLocal memory, and we do see some customers create and destroy threads a lot.

@enpasos
Contributor Author

enpasos commented Jan 8, 2023

ThreadLocal has its own issues. It's hard to clean up ThreadLocal memory, and we do see some customers create and destroy threads a lot.

What would you suggest? Give it a try? ... I'll give it a try and keep your comment in mind.

@enpasos
Contributor Author

enpasos commented Jan 9, 2023

One more point for now: I must confess that in the heat of the moment during the last merges with the master branch I didn't pay attention to my memory leak test ... I'll try to fix that tomorrow as well.

I repaired the merge (everything merged except "A temporary solution to issue 2210 (#2304)"). The memory leak test works now. I will now try to solve the issues mentioned in the comments.

@enpasos
Contributor Author

enpasos commented Jan 9, 2023

ThreadLocal has its own issues. It's hard to clean up ThreadLocal memory, and we do see some customers create and destroy threads a lot.

What would you suggest? Give it a try? ... I'll give it a try and keep your comment in mind.

I changed the code to use ThreadLocal queues.
In PtNDArrayProxyMaker, a ThreadLocal holds the WeakHashMapWrappers for all threads.
A smoke test with 1000 threads did not reveal any problems.
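
A minimal sketch of the per-thread wiring (illustrative only; the PR's WeakHashMapWrapper is stood in for by a plain WeakHashMap to keep the example self-contained):

import java.util.Map;
import java.util.WeakHashMap;

// Each thread gets its own map, so orphaned NDArrays can be tracked and closed
// on the thread that created them, matching the threading constraint above.
final class PerThreadMapsSketch {
    private static final ThreadLocal<Map<Object, Object>> MAPS =
            ThreadLocal.withInitial(WeakHashMap::new);

    static Map<Object, Object> currentThreadMap() {
        return MAPS.get();
    }
}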

@frankfliu
Contributor

frankfliu commented Jan 11, 2023

I looked a bit more into this PR, and I don't think this solution can fly:

  1. Cleaning up an NDArray relies on calling other NDArrays' methods. If the code only creates NDArrays but doesn't call any operators, the NDArrays won't be closed.
  2. The test code doesn't really work:
    1. mapSize() doesn't work as expected: when GC happens, the size isn't updated.
    2. checkQueue() is not actually called; it is only triggered because the code manually invokes mapSize(), which won't happen in the real world.

One workaround could be to add an NDManager.gc() function and let the user call it manually when needed; it would scan the current thread's orphaned NDArrays and close them. But I personally don't like the NDManager.gc() solution. A better alternative would be to implement an NDScope class:

try (NDScope scope = new NDScope()) {
     // Create NDArrays
} // NDArrays created in this scope will be closed automatically
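
A minimal sketch of what such an NDScope could look like (an illustration of the suggestion only, not an existing DJL class at the time of this comment; the NDArray creation path would still need to call NDScope.register):

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

import ai.djl.ndarray.NDArray;

// Illustration: a thread-local stack of scopes; every NDArray registered while a
// scope is open is closed when that scope closes.
public final class NDScope implements AutoCloseable {
    private static final ThreadLocal<Deque<NDScope>> SCOPES =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final Set<NDArray> resources = Collections.newSetFromMap(new IdentityHashMap<>());

    public NDScope() {
        SCOPES.get().addFirst(this);
    }

    // Would be called from the NDArray creation code to register into the innermost scope.
    public static void register(NDArray array) {
        NDScope scope = SCOPES.get().peekFirst();
        if (scope != null) {
            scope.resources.add(array);
        }
    }

    @Override
    public void close() {
        SCOPES.get().remove(this);
        resources.forEach(NDArray::close);
    }
}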

@enpasos
Contributor Author

enpasos commented Jan 11, 2023

The test code doesn't really work:
1. mapSize() doesn't work as expected: when GC happens, the size isn't updated.
2. checkQueue() is not actually called; it is only triggered because the code manually invokes mapSize(), which won't happen in the real world.

Looks like you looked at ai.djl.pytorch.integration.gc.Main. This class was actually just a very basic smoke test for me when setting up the solution initially. I would rather delete it from this PR and set up a test suite (#2290) following the tests in #2210.

However, the ai.djl.pytorch.integration.gc.Main still works as expected for me:
When I run it I see

> Task :engines:pytorch:pytorch-engine:Main.main()
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of inter-op threads is 8
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 8
[main] INFO ai.djl.pytorch.integration.gc.Main - number of NDArrays in NDManager hierarchy 3
[main] INFO ai.djl.pytorch.integration.gc.Main - number of NDArrays in map used of gc triggered NDArray closing 3
[main] INFO ai.djl.pytorch.integration.gc.Main - reference exists ...
[main] INFO ai.djl.pytorch.integration.gc.Main - weakHashMap size: 3
[main] INFO ai.djl.pytorch.integration.gc.Main - no reference exists, but likely not yet garbage collected ...
[main] INFO ai.djl.pytorch.integration.gc.Main - weakHashMap size: 3
\--- NDManager(e8ac5d45a276, gpu(0)) resource count: 1
    \--- NDManager(e847df9e9799, gpu(0)) resource count: 1
        \--- NDManager(db5ef3d00e38, gpu(0)) resource count: 3
            \--- NDArray(2057224517712, Shape(1))
            \--- NDArray(2057224517184, Shape(1))
            \--- NDArray(2057224517632, Shape(1))
[main] INFO ai.djl.pytorch.integration.gc.Main - no reference exists, and likely garbage collected ...
[main] INFO ai.djl.pytorch.integration.gc.Main - weakHashMap size: 0
\--- NDManager(e8ac5d45a276, gpu(0)) resource count: 1
    \--- NDManager(e847df9e9799, gpu(0)) resource count: 1
        \--- NDManager(db5ef3d00e38, gpu(0)) resource count: 0
\--- NDManager(e8ac5d45a276, gpu(0)) resource count: 1
    \--- NDManager(e847df9e9799, gpu(0)) resource count: 0
\--- NDManager(e8ac5d45a276, gpu(0)) resource count: 0

This test starts by creating three NDArrays a, b, c. Because of SwitchGarbageCollection.on(), the PtNDArrayImpl instances are wrapped with dynamic proxies. The user's references actually point to the dynamic proxies, while the NDManager holds the PtNDArrayImpl. -> weakHashMap size: 3

When a, b, c are no longer referenced, the dynamic proxies and the uid keys are ready to be garbage collected. However, we need to wait for the next minor GC. Therefore still -> weakHashMap size: 3

To see a GC, we do something we would not do in an application: we encourage GC and wait for 1 s. In this case checkQueue received three messages from GC and closed the corresponding NDArrays. -> weakHashMap size: 0
I completely agree that calling checkQueue from size() in this example is something that would not happen in real life.
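
For readers unfamiliar with dynamic proxies, here is a minimal sketch of the wrapping described above (illustrative only, not the PR's PtNDArrayProxyMaker): the user receives a proxy implementing the NDArray interface, while the real implementation remains reachable through the NDManager:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

import ai.djl.ndarray.NDArray;

final class ProxySketch {
    // Returns a proxy that forwards every NDArray call to the real implementation.
    // Because user code only holds the proxy, its collection can be observed with a
    // weak reference while the implementation stays alive inside the manager.
    static NDArray wrap(NDArray impl) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(impl, args);
        return (NDArray) Proxy.newProxyInstance(
                NDArray.class.getClassLoader(), new Class<?>[] {NDArray.class}, handler);
    }
}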

If I find time today I will set up an example for the test suite.

For me, SwitchGarbageCollection.on(); makes a huge, positive difference in the application I run.

@enpasos
Contributor Author

enpasos commented Jan 11, 2023

...

1. Cleaning up an NDArray relies on calling other NDArrays' methods. If the code only creates NDArrays but doesn't call any operators, the NDArrays won't be closed.

One workaround could be to add an NDManager.gc() function and let the user call it manually when needed; it would scan the current thread's orphaned NDArrays and close them. But I personally don't like the NDManager.gc() solution. A better alternative would be to implement an NDScope class:

try (NDScope scope = new NDScope()) {
     // Create NDArrays
} // NDArrays created in this scope will be closed automatically

If SwitchGarbageCollection.on() is not called, there could be NDArrays that are not closed: a leak. If you suffer from such a situation, SwitchGarbageCollection.on() could help by getting rid of the leaked NDArrays. I agree that the suggested solution relies on a method call to an NDArray (any one, but on the same thread) to do any cleanup. As you suggested, one could complement this 'automatic process' with a manual trigger that the user may call in case the 'automatic trigger' does not apply. I could add a suggestion for your NDManager.gc().

Do you think the NDScope approach works together with the GC?
To me it looks like a different approach ... would it fit together with the existing NDManager hierarchy?

@enpasos
Contributor Author

enpasos commented Jan 11, 2023

I added a method gc() to NDManager that explicitly calls checkQueue on the WeakHashMapWrapper. Just to show this function, I added a small Main2 corresponding to Main. Both classes should be deleted. A test suite (#2290) should provide realistic test cases.

@enpasos
Contributor Author

enpasos commented Jan 11, 2023

As a starting point for a test suite (#2290), I pasted some examples from the DJL docs into

git clone https://github.com/enpasos/djlperformancetesting.git

and tested for duration and memory:

[figure: duration and memory measurements]

@enpasos
Contributor Author

enpasos commented Jan 12, 2023

I thought about your

A better alternative would be to implement an NDScope class:

try (NDScope scope = new NDScope()) {
     // Create NDArrays
} // NDArrays created in this scope will be closed automatically

I think you could have this approach in principle without GC and without the NDManager hierarchy.
You could add a mechanism to move an object from one NDScope to another enclosing or nested NDScope.
You could have full control over the lifecycle of the part of the NDArray that lives on the device.
One price you pay for this is the clash with variable scope in Java. Suppose you have a long-lived variable, and inside an NDScope you assign a new NDArray to it; the array is automatically attached to the scope while you are doing a calculation in which you think of the variable's value rather than the object. When you exit the NDScope, this new NDArray is closed; the Java object NDArray still exists, and the variable points to it. The next time you use the variable, expecting to work with its value, you get the equivalent of a NullPointerException. The compiler would not help you find such errors. Maybe it is still better than getting memory leaks in running applications.
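
A minimal illustration of this pitfall (assuming an NDScope that automatically registers newly created NDArrays, as in the sketch further up this thread):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class ScopePitfallSketch {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray longLived;
            try (NDScope scope = new NDScope()) {
                // Assigned to a variable that outlives the scope.
                longLived = manager.create(new float[] {1f, 2f, 3f}).mul(2);
            } // scope closes: the native memory behind longLived is released here

            // The Java object still exists and the variable still points to it,
            // but its native resource is gone; using the value now fails at
            // runtime, much like a NullPointerException.
            longLived.sum();
        }
    }
}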

@frankfliu
Contributor

@enpasos

  1. I don't like nondeterministic behavior; it gives developers the false impression that it works like GC.
  2. NDManager.gc() only looks better to me, but it is also a bit misleading: it only works for the current thread, and that behavior needs to be documented.
  3. NDScope can have nested NDScopes; this behavior has been implemented in JavaCPP's PointerScope and it works pretty nicely.
  4. NDScope can be combined with the NDManager hierarchy.
  5. However, NDScope is somewhat redundant with the NDManager.gc() solution; they can co-exist, but that will cause confusion.

@enpasos
Contributor Author

enpasos commented Jan 12, 2023

@frankfliu Thx a lot for looking into this option. I totally agree that deterministic behavior is very desirable.

@enpasos
Contributor Author

enpasos commented Jan 14, 2023

I am closing this PR in favour of the deterministic solution #2321.
