Specify the block that PtGradientCollector belongs to #2232
Conversation
Codecov Report: Base 72.08% // Head 71.62% // Decreases project coverage by -0.47%.
Additional details and impacted files:
@@ Coverage Diff @@
##             master    #2232      +/-   ##
============================================
- Coverage     72.08%   71.62%   -0.47%
- Complexity     5126     6373    +1247
============================================
  Files           473      631     +158
  Lines         21970    28172    +6202
  Branches       2351     2998     +647
============================================
+ Hits          15838    20179    +4341
- Misses         4925     6523    +1598
- Partials       1207     1470     +263
☔ View full report at Codecov.
engines/ml/xgboost/src/main/java/ai/djl/ml/xgboost/XgbEngine.java (outdated review thread, resolved)
try (GradientCollector gc = Engine.getInstance().newGradientCollector()) {
    NDArray b = a.mul(2);
    gc.backward(b);
try (GradientCollector gc = Engine.getInstance().newGradientCollector(block)) {
If we now support both newGradientCollector() and newGradientCollector(block), we should probably have tests for both of them.
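For illustration only, a rough sketch of what a pair of such tests could look like; the class and method names are hypothetical, and it assumes the newGradientCollector(Block) overload introduced by this PR:

import ai.djl.engine.Engine;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;
import ai.djl.ndarray.types.Shape;
import ai.djl.nn.Block;
import ai.djl.nn.core.Linear;
import ai.djl.training.GradientCollector;

public class GradientCollectorTests {

    // Global collector: no block is specified, so zeroGradients() is not called.
    public void testGlobalGradientCollector() {
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f, 2f, 3f});
            a.setRequiresGradient(true);
            try (GradientCollector gc = Engine.getInstance().newGradientCollector()) {
                gc.backward(a.mul(2));
            }
            // Expect d(2a)/da == 2 for every element of a.getGradient().
        }
    }

    // Block-scoped collector: zeroGradients() is restricted to the block's parameters.
    public void testBlockGradientCollector() {
        try (NDManager manager = NDManager.newBaseManager()) {
            Block block = Linear.builder().setUnits(1).build();
            block.initialize(manager, DataType.FLOAT32, new Shape(1, 3));
            NDArray a = manager.create(new float[] {1f, 2f, 3f});
            a.setRequiresGradient(true);
            try (GradientCollector gc = Engine.getInstance().newGradientCollector(block)) {
                gc.backward(a.mul(2));
            }
            // Expect a.getGradient() to equal 2, with only the block's parameter
            // gradients being zeroed when the collector is created.
        }
    }
}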
The test of newGradientCollector(block) is already covered in testClearGradients. Do we need to add a separate test, or should we rename this one to, e.g., testNewGradientCollectorWithClearGradients?
zeroGradients();
I think we still want the zeroGradients here. Otherwise, it won't work correctly for cases without a block.
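A minimal sketch, with a hypothetical class name and values, of the failure mode this refers to when zeroGradients() is skipped in the blockless case:

import ai.djl.engine.Engine;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.training.GradientCollector;

public class StaleGradientSketch {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f});
            a.setRequiresGradient(true);

            try (GradientCollector gc = Engine.getInstance().newGradientCollector()) {
                gc.backward(a.mul(2));  // a.getGradient() is now 2
            }
            try (GradientCollector gc = Engine.getInstance().newGradientCollector()) {
                gc.backward(a.mul(3));  // without zeroing, the gradient may accumulate
                                        // to 5 instead of 3, as reported in #2024
            }
        }
    }
}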
The reason for removing it was mainly the efficiency and memory issue #2210.
Should we add it back and then do
public GradientCollector newGradientCollector() {
    return manager.getEngine().newGradientCollector(getModel().getBlock());
}
Does this avoid the global search scope issue?
Update: This code does not seem to work inside PtEngine.java or other Engine files, presumably because manager and getModel() belong to the Trainer rather than the Engine.
 * @return a new instance of {@link GradientCollector}
 */
public abstract GradientCollector newGradientCollector(Block block);

/**
 * Returns a new instance of {@link ParameterServer}.
Can you also update trainer.newGradientCollector() to use this? You can change its definition to:
public GradientCollector newGradientCollector() {
    return manager.getEngine().newGradientCollector(getModel().getBlock());
}
The tricky part is that inside newGradientCollector(), zeroGradients() is not called, while inside newGradientCollector(Block block), it is. This is to avoid the efficiency and memory leak issue with the global search scope, as discussed last week.
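A rough sketch of how the block-scoped constructor can restrict the zeroing; it assumes the Block/Parameter APIs and the subi idiom mentioned further down, and is not the exact implementation in this PR:

import ai.djl.ndarray.NDArray;
import ai.djl.nn.Block;
import ai.djl.nn.Parameter;

public PtGradientCollector(Block block) {
    if (block != null) {
        // Zero only this block's parameter gradients instead of searching
        // every NDArray in the global scope.
        for (Parameter parameter : block.getParameters().values()) {
            NDArray array = parameter.getArray();
            if (array.hasGradient()) {
                array.getGradient().subi(array.getGradient());
            }
        }
    }
}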
Another consideration for not using
public GradientCollector newGradientCollector() {
    return manager.getEngine().newGradientCollector(getModel().getBlock());
}
is that newGradientCollector() is defined to create a global GradientCollector, so the block is not specified.
Update:
public GradientCollector newGradientCollector() {
    return manager.getEngine().newGradientCollector(getModel().getBlock());
}
This code does not seem to work inside PtEngine.java or other Engine files.
Some thoughts after looking into the code and testing:
- I follow @zachgk's argument that we need the zeroGradients() functionality whenever we get a new GradientCollector. I don't see that we can remove it.
- The time spent in zeroGradients() goes into getting all the NDArrays (see the JProfiler snapshot). So, short of reducing the number of existing NDArrays before collecting, knowing the relevant NDArrays for zeroGradients() would be an option to reduce or remove the negative performance effect. However, it does not remove the root cause.
- As @zachgk mentioned: if you go with this approach, it must also work when using a trainer.
- Minor point: since there is the native JniUtils.zeroGrad((PtNDArray) array), couldn't it replace array.getGradient().subi(array.getGradient())?
- Are you sure that you get all relevant NDArrays with gradients by getting the parameter NDArrays from the block?
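On the minor point, a sketch of what that swap could look like inside the PyTorch engine's zeroGradients loop; only JniUtils.zeroGrad and PtNDArray come from the discussion above, the surrounding method is illustrative:

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.pytorch.engine.PtNDArray;
import ai.djl.pytorch.jni.JniUtils;

private void zeroGradients(NDList arrays) {
    for (NDArray array : arrays) {
        if (array.hasGradient()) {
            // Native zeroing instead of array.getGradient().subi(array.getGradient()).
            JniUtils.zeroGrad((PtNDArray) array);
        }
    }
}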
"As @zachgk mentioned ... in case you go with the approach it must also work when using a trainer."
That would be ideal, but it has the efficiency and memory issue, so we should disable it for now.
"Are you sure that you get all relevant NDArrays with gradients by getting the parameter NDArrays from the block?"
The GradientCollector is specified to a block. By definition it searches only within that block.
The zeroGradients() call (added in #2101 to solve #2024) when constructing PtGradientCollector() compromises efficiency and memory usage, as shown in #2210. Thus the block has to be specified so that zeroGradients() is called only on its parameters and the search scope is narrowed down. Another relevant PR is #2111.
The original PtGradientCollector() is left unchanged in order to be backward compatible. A new PtGradientCollector(Block) is added so that the gradient collector is restricted to operate on a certain block; then multiple such collectors can coexist. @zachgk I added this point to the doc, please take a look. The relevant PR is #2111.
In essence, this PR disables zeroGradients() in the default case, unless a block is specified when creating a GradientCollector. This is because of the issues with the global search in zeroGradients().

To solve issue #2024 in the Dive into Deep Learning book, how the gradient is accumulated should be specified in the future, either "write" or "add". retain_graph should also be fixed: under certain conditions an error should be thrown, for example when retain_graph=false and backward is called repeatedly. This way the ambiguity in the semantics of repeated backward calls, as shown in #2024, will be clarified.
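Purely as an illustration of the proposed guard; the fields and names here are hypothetical and not existing DJL API:

// Hypothetical guard inside a gradient collector's backward().
private boolean backwardCalled;
private boolean retainGraph;

public void backward(NDArray target) {
    if (backwardCalled && !retainGraph) {
        throw new IllegalStateException(
                "backward() was already called on this graph; "
                        + "create the collector with retainGraph=true to call it again.");
    }
    // ... run the actual backward pass, writing or accumulating gradients ...
    backwardCalled = true;
}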
In PyTorch, by default, gradients are accumulated (added) across repeated backward calls when retain_graph=True.
Note: retain_graph=True should be set. This is not a commonly seen use case in practice. See also Test accumulating gradient collector #2111.
Block currently does not have a direct call feature, but requires a ParameterStore.
Relevant Stack Overflow issue:
https://stackoverflow.com/questions/70998652/djl-gradientcollector-try-with-resources-initialiser-error
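For context on the Block/ParameterStore remark above, a minimal sketch of a forward pass through a Block in DJL, assuming the standard Block.forward(ParameterStore, NDList, boolean) signature; the class name is hypothetical:

import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;
import ai.djl.ndarray.types.Shape;
import ai.djl.nn.Block;
import ai.djl.nn.core.Linear;
import ai.djl.training.ParameterStore;

public class BlockForwardExample {
    public static void main(String[] args) {
        try (NDManager manager = NDManager.newBaseManager()) {
            Block block = Linear.builder().setUnits(1).build();
            block.initialize(manager, DataType.FLOAT32, new Shape(1, 3));

            // There is no direct call like block(input); the forward pass goes
            // through a ParameterStore.
            ParameterStore parameterStore = new ParameterStore(manager, false);
            NDList output =
                    block.forward(parameterStore, new NDList(manager.ones(new Shape(1, 3))), false);
            System.out.println(output.singletonOrThrow());
        }
    }
}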