Add init_score attr to python booster class. #4235
Conversation
Thanks @JoshuaC3 ! I've read the attached issue and the other stuff it links to, so I think I understand what this is trying to accomplish, but I'm concerned.
init_score can be an array with shape (n_observations,) or even (n_observations, n_classes) for multiclass classification (docs), so this could be a fairly large amount of data. This might substantially increase the memory footprint of the Booster object if you used init_score during training, which would have at least two negative side effects:
- increases the memory requirements for deploying a LightGBM model
- increases the amount of data that needs to be sent over the wire from Dask workers back to the client in distributed training with lightgbm.dask (https://github.com/microsoft/LightGBM/blob/d5c2c55682cdf52b9371dbead9bd551e2acbffdc/python-package/lightgbm/dask.py)
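To make the memory concern concrete, here is a quick sketch (illustrative numbers, not figures from this PR) of how large those arrays get with float64 storage:

```python
import numpy as np

# Illustrative sizes only: one float64 per observation
# (and per class, in the multiclass case).
n_obs, n_classes = 1_000_000, 10

init_binary = np.zeros(n_obs)              # shape (n_obs,)
init_multi = np.zeros((n_obs, n_classes))  # shape (n_obs, n_classes)

print(init_binary.nbytes)  # 8000000  (~8 MB)
print(init_multi.nbytes)   # 80000000 (~80 MB)
```

So for a large multiclass dataset, keeping init_score on the Booster can add tens of megabytes to every saved or transferred model.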
Is it absolutely necessary for #3905?
I'm just leaving a "Comment" review to bring this to the attention of other reviewers, for their consideration.
Thanks for reviewing @jameslamb. Yes, I was concerned about this as well. It certainly isn't essential, and is somewhat of a workaround for #4234 (which, or at least some variant of it, would be the proper way to create an "intercept"). That said, I do think it could be useful. Would a flag controlling whether init_score is stored on the Booster be an acceptable compromise?
I'm personally opposed to adding such a flag to the Booster object for this purpose. I think anyone choosing to set that flag could just as easily choose to separately save the init_score themselves. However, that's just one opinion! We should hear what @shiyu1994 and @StrikerRUS think.
Thanks @JoshuaC3 for your work. Keeping the init_score available can be useful in some cases, but I have the following concerns.
So, to bring us closer to the interpretability goal in #3905, I think we should focus on the
@shiyu1994 I think you do understand my plan, and maybe a little better than I did. Your comment has made it clear now! Your point 3 is spot on and highlights one of the reasons why I tried to access the init_score. The other reason is that often we don't wish to boost from the "average", but wish to boost from some other value instead: min, max, median, 20th quantile, etc. This is because EBMs mean-centre the feature-scores in favour of the intercept being the mean of the target, so it rarely is the mean. Is it worth copying this to an issue/feature request for further discussion and closing this PR?
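As a sketch of the "boost from some other value" idea: you can build the init_score yourself from whatever base value you want. The init_score parameter to lgb.Dataset is real LightGBM API; the lightgbm calls below are commented out and illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, size=1000)

# Boost from the 20th quantile of the target instead of its mean.
base_value = np.quantile(y, 0.2)
init_score = np.full(len(y), base_value)

# Illustrative use with LightGBM (not executed here):
# train_set = lgb.Dataset(X, label=y, init_score=init_score)
# booster = lgb.train({"objective": "regression"}, train_set)
```

The point of the PR is that base_value is not recoverable from the trained Booster afterwards, so you currently have to keep track of it yourself.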
@JoshuaC3 OK. So the final goal is to provide the EBM style interpretability.
That would require modifying the prediction logic: when prediction starts from the first tree, we should add back the average manually. A simpler workaround would be to make the average score available to the Python API, for cases where the user wants to know its value, but without removing the average from the first tree. The EBM-style interpretability can then be calculated internally, by subtracting in the C++ or Python part. With the average being stored, it won't be difficult to calculate the EBM-style interpretability. In other words, we subtract the average from the model only when calculating the EBM-style interpretability, but leave the prediction logic and model file unchanged. Which strategy do you think better suits the goal?
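The second strategy can be sketched in a few lines. The function and the stored_average value below are hypothetical, not LightGBM API; only the (n_samples, n_features + 1) contribution layout matches what predict(..., pred_contrib=True) returns:

```python
import numpy as np

def ebm_style_scores(contrib, stored_average):
    """Hypothetical sketch of the 'store the average' strategy: leave the
    model and prediction logic unchanged, and subtract the stored average
    only when computing EBM-style scores.

    contrib: (n_samples, n_features + 1) array in the layout produced by
    predict(..., pred_contrib=True); the last column is the base value.
    """
    out = np.asarray(contrib, dtype=float).copy()
    out[:, -1] -= stored_average  # move the average out of the intercept column
    return out

contrib = np.array([[0.3, -0.1, 1.2],
                    [0.0,  0.4, 1.2]])
adjusted = ebm_style_scores(contrib, stored_average=1.2)
# Per-feature columns are untouched; only the intercept column shifts.
```

Because only the intercept column changes, raw predictions (the row sums plus the average) are unaffected, which is exactly why this variant leaves the model file and prediction logic alone.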
My gut feeling is that the first proposal - modifying the prediction logic - seems more robust and more easily used by other languages, not just Python. That said, option two means fewer changes to the prediction logic. I have created some examples in this notebook on my gist. The EBM-FI calculation and plots are currently written in Python and are a little slow - I am sure they could be sped up a lot though! However, it doesn't seem to make much difference whether we subtract the mean before training, at the start of training, or after training, so this option seems OK. Should I close this PR and open an issue detailing this?
@JoshuaC3 Sure! That would be great.
Closed in favour of #4313.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Save init_score to the Python Booster class so that it is accessible after training, after saving, and as part of model interpretability: #4065
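Since the PR was closed, a workaround in the spirit of the review comments is to persist the init_score yourself, next to the saved model. This is a sketch: the in-memory buffer stands in for a real .npy file you would write alongside booster.save_model(...):

```python
import io
import numpy as np

init_score = np.full(100, 0.5)

# In practice: np.save("init_score.npy", init_score) next to the model file.
# An in-memory buffer is used here so the sketch is self-contained.
buf = io.BytesIO()
np.save(buf, init_score)

buf.seek(0)
restored = np.load(buf)
assert np.array_equal(restored, init_score)
```

This keeps the Booster object itself small, matching the reviewers' preference that users who need init_score later save it separately.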