XGBoost - hist + learning_rate decay memory usage #3579
Is this problem confined to
I tried using the
I did not try the exact method.
A memory leak is plausible. Let me look at it after the 0.80 release.
I've had this issue before. I don't know exactly what is happening, but I found a workaround. While studying it I found that the learning_rates parameter in xgb.train actually installs a reset_learning_rate callback. I then tried other custom callbacks and saw the same memory leak. It looks as if calling any callback other than the print callback causes the tree updater to re-initialize at every iteration. My workaround was to add a "learning_rate_schedule" dmlc parameter and then set the new learning rate at the beginning of each iteration. It involved quite a bit of modification of the C++ code. I also saw this problem in gpu_hist, so I edited the CUDA code too. In the end my solution resets the learning rate without callbacks, and it works.
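The mechanism described above can be sketched against the old (pre-1.0) callback API, where each callback receives an env object carrying the Booster and the current iteration. The factory and schedule names below are illustrative, not xgboost's; on the affected versions, the set_param call in the callback is what re-triggered configuration every round:

```python
# Sketch of a per-iteration learning-rate callback in the style of the
# pre-1.0 xgboost callback API (env exposes .model and .iteration).
# Names here are illustrative, not part of the xgboost API.

def exponential_decay(base_lr=0.3, rate=0.995):
    """Return schedule(iteration) -> learning rate."""
    def schedule(iteration):
        return base_lr * (rate ** iteration)
    return schedule

def make_reset_lr_callback(schedule):
    """Build a callback that sets the learning rate each round.

    On xgboost <= 0.80 with tree_method="hist", the set_param call
    below re-ran Learner::Configure() and re-allocated the hist
    updater's internal buffers every iteration -- the leak discussed
    in this thread.
    """
    def callback(env):
        env.model.set_param("learning_rate", schedule(env.iteration))
    return callback
```

With the real library this would be passed as `callbacks=[make_reset_lr_callback(exponential_decay())]` to `xgb.train`, which is essentially what the `learning_rates` argument did internally.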
@hcho3 0.80 is released; did you have a chance to look at this leak? @Denisevi4 can you share the code for that?
@Denisevi4 For the CUDA gpu_hist, did you find
@dev7mt @Denisevi4 @kretes @trivialfis I think I found the cause of the memory leak. When the learning rate decay is enabled, I'll try to come up with a fix so that
Here is a snippet of diagnostic logs I injected, first with learning rate decay enabled, then with it disabled. (The log output itself was not preserved in this copy of the thread.)
**Diagnosis**

The learning rate callback function calls `XGBoosterSetParam()` to update the learning rate. `XGBoosterSetParam()` in turn calls `Learner::Configure()`, which resets and re-initializes each tree updater, calling `FastHistMaker::Init()`. `FastHistMaker::Init()` then re-allocates internal objects that were meant to be recycled across iterations, so memory usage grows over time.

**Fix**

The learning rate callback should call a new function, `XGBoosterUpdateParamInPlace()`, designed so that no object is re-allocated.
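The re-allocation chain can be modeled with a toy learner in Python. The class and method names below mirror the C++ symbols named in the diagnosis, but the implementation is purely illustrative, not xgboost's code:

```python
# Toy model of the leak: setting a parameter through the full
# Configure() path rebuilds the updater (and its working buffers),
# while an in-place update only mutates the value. Names mirror the
# C++ symbols discussed above; the code itself is illustrative.

class FastHistMakerToy:
    allocations = 0  # counts buffer allocations across all instances

    def __init__(self):
        self.hist_buffer = [0.0] * 1024  # stand-in for recycled working memory
        FastHistMakerToy.allocations += 1

class LearnerToy:
    def __init__(self):
        self.learning_rate = 0.3
        self.updater = FastHistMakerToy()

    def set_param(self, value):
        """Leaky path: Configure() resets and re-initializes the updater."""
        self.learning_rate = value
        self.updater = FastHistMakerToy()  # fresh allocation every call

    def update_param_in_place(self, value):
        """Fixed path: mutate the value, keep the updater and its buffers."""
        self.learning_rate = value
```

Calling `set_param` once per boosting round allocates a fresh updater per round, which is the growth curve reported in this issue; with `update_param_in_place` the allocation count stays at one.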
@dev7mt @Denisevi4 @kretes @trivialfis Fix is available at #3803.
@dev7mt @Denisevi4 @kretes The upcoming release (version 0.81) will not include a fix for the memory leak. The reason is that the fix is only temporary, adds a lot of maintenance burden, and will be supplanted by a future code refactor. For now, you should use
Could you be more specific about which object? I'm working on parameter updating, so I may just fix this along the way...
Closes dmlc#3579.
Hey,
I have been trying to implement an eta_decay scheme that's quite specific to my needs, but I kept running into OutOfMemory errors. After a bit of digging, I found that setting the learning rate while using the "hist" tree_method causes the same issue, which led me to believe that the callback itself is not the problem here.
I have tested this issue in multiple environments (two different Ubuntu setups, on-premise and cloud, as well as macOS), and it always produced similar errors.
The code below should reproduce the issue:
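The snippet itself did not survive extraction. A hypothetical reconstruction, assuming xgboost 0.7x/0.80 where `xgb.train` still accepted a `learning_rates` argument (which, per the discussion above, installs the reset_learning_rate callback), might look like this; the data shape and parameter values are made up for illustration:

```python
# Hypothetical reconstruction of the reproduction script (the original
# snippet was not captured in this copy of the thread). Assumes an old
# xgboost (0.7x/0.80) where xgb.train accepts `learning_rates`.

def reproduce_leak(num_boost_round=500):
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(42)
    X = rng.rand(100_000, 50)
    y = rng.rand(100_000)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"tree_method": "hist", "max_depth": 6, "objective": "reg:linear"}
    # A decaying schedule; passing it installs the reset_learning_rate
    # callback, which (pre-fix) re-initialized the updater every round.
    rates = [0.3 * (0.995 ** i) for i in range(num_boost_round)]
    return xgb.train(params, dtrain, num_boost_round=num_boost_round,
                     learning_rates=rates)
```

Watching the process's resident memory while this runs should show steady growth; dropping the `learning_rates` argument keeps it flat.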
Attached is a plot from my run of the code above.
I did no digging into the underlying C++ code, but a memory leak seems plausible.
As I understand it, this is not the desired behaviour, but maybe this method genuinely requires such amounts of memory.
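One way to distinguish a leak from legitimately high (but stable) memory use is to sample the process's peak RSS after each boosting round using only the standard library. This helper is a generic sketch, not from the thread; note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource

def peak_rss_kb():
    """Peak resident set size of this process, as reported by getrusage.

    On Linux ru_maxrss is in kilobytes; on macOS it is in bytes.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def sample_after_each_round(train_one_round, n_rounds):
    """Run `train_one_round` n_rounds times, recording peak RSS after each.

    A sequence that keeps climbing round after round suggests a leak;
    one that flattens out suggests a fixed working-set size.
    """
    samples = []
    for _ in range(n_rounds):
        train_one_round()
        samples.append(peak_rss_kb())
    return samples
```

In this issue's setting, `train_one_round` would wrap a single call to the Booster's update step; with learning rate decay enabled on an affected version, the samples grow roughly linearly with the round count.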