refit in Python does not support weights #3038

Closed
jtilly opened this issue Apr 29, 2020 · 5 comments · Fixed by #4894


@jtilly
Contributor

jtilly commented Apr 29, 2020

Summary

The refit method in Python does not support weights (or, in fact, anything other than data and labels). This is because refit creates the training data set with:

train_set = Dataset(data, label, silent=True)

It would be great if refit accepted additional arguments (or kwargs) specifically for the Dataset call.

Passing in the train_set directly would also be an option. But since refit needs to predict first, we would also have to pass in the training data as a data frame, which is not so nice.

Motivation

I'm using refit in an application where weights are very important.

I'm happy to open a PR if this sounds useful.

@StrikerRUS
Collaborator

Please refer to #1629 (comment).

Actually, only the X and label are used in refit task, other fields like weight, group are not used.

@jtilly
Contributor Author

jtilly commented Apr 30, 2020

Thanks for pointing me to the comment, but weight does get used in refit (also init_score, etc).

Example: a tree fit on three observations that makes a single split. By changing the weights, I can control the leaf value for the larger (two-observation) group.

import numpy as np
import lightgbm

X = np.array([1, 2, 2]).reshape((3, 1))
label = np.array([1, 2, 3])

data = lightgbm.basic.Dataset(X, label)

booster = lightgbm.engine.train(
    {
        "min_data_in_bin": 1,
        "min_data_in_leaf": 1,
        "learning_rate": 1,
        "boost_from_average": False,
    },
    data,
    num_boost_round=2,
)

booster.predict(X)
# array([1. , 2.5, 2.5])

# let's refit (to make sure it works)
booster_refit = booster.refit(X, label, decay_rate=0.0)
booster_refit.predict(X)
# array([1. , 2.5, 2.5])

# use weights (data_set_kwargs is an argument I added locally; it is not part of refit)
booster_refit = booster.refit(
    X, label, decay_rate=0.0, data_set_kwargs={"weight": np.array([1.0, 0.0, 1.0])}
)
booster_refit.predict(X)
# array([1., 3., 3.])

booster_refit = booster.refit(
    X, label, decay_rate=0.0, data_set_kwargs={"weight": np.array([1.0, 1.0, 0.0])}
)
booster_refit.predict(X)
# array([1., 2., 2.])
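
The three predictions above are consistent with each leaf value being the weighted mean of the labels that land in that leaf, which is what one would expect for squared-error loss with learning_rate=1, decay_rate=0.0 and no regularization. The following is a rough sketch of that arithmetic in plain Python. It is not LightGBM's actual code, and weighted_leaf_value is a hypothetical helper introduced only for illustration.

```python
def weighted_leaf_value(labels, weights):
    """Weighted mean of the labels that fall into a single leaf.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("leaf has zero total weight")
    return sum(w * y for w, y in zip(weights, labels)) / total_weight


# The leaf for x == 2 holds the two observations with labels 2 and 3.
leaf_labels = [2.0, 3.0]

unweighted = weighted_leaf_value(leaf_labels, [1.0, 1.0])  # plain refit
only_last = weighted_leaf_value(leaf_labels, [0.0, 1.0])   # weight = [1, 0, 1]
only_first = weighted_leaf_value(leaf_labels, [1.0, 0.0])  # weight = [1, 1, 0]

print(unweighted, only_last, only_first)  # 2.5 3.0 2.0
```

The single-observation leaf (x == 1) keeps the value 1 in all three cases because its only label is 1 and its weight is never zero.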

@guolinke
Collaborator

guolinke commented Aug 6, 2020

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Welcome to contribute to this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.

@guolinke guolinke closed this as completed Aug 6, 2020
@TremaMiguel
Contributor

I'll work on this. Here is the plan I'll follow; any changes to it are welcome.

  • Context:
    The refit method accepts a kwargs argument that is passed to the predict method. Since Dataset accepts a weight parameter, among others, refit can be changed to accept additional arguments to pass to Dataset.

  • Solution:
    refit accepts a kwargs_for_dataset parameter to pass the weight parameter to the Dataset initialization.
    refit accepts a kwargs_for_predict parameter to pass the original params to the predict method.

  • Tests:
    Add a new refit test case for the new implementation, modeled on the existing test_refit.

  • Documentation:
    Document the parameter changes.
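
The plan above amounts to forwarding two separate keyword dictionaries: one to the predict step and one to the Dataset constructor. Below is a minimal sketch of that forwarding pattern, assuming stand-in objects rather than the real library. FakeDataset, refit_sketch, fake_predict and some_option are all hypothetical names; only kwargs_for_dataset and kwargs_for_predict come from the plan. The commit messages for #4894 suggest the merged change ended up with different parameter names, so treat this as an outline of the idea, not the final API.

```python
from typing import Any, Callable, Dict, Optional


class FakeDataset:
    """Stand-in for lightgbm.Dataset; it only records what it was given."""

    def __init__(self, data, label=None, weight=None, **other):
        self.data = data
        self.label = label
        self.weight = weight
        self.other = other


def refit_sketch(
    predict_fn: Callable[..., Any],
    data,
    label,
    kwargs_for_dataset: Optional[Dict[str, Any]] = None,
    kwargs_for_predict: Optional[Dict[str, Any]] = None,
):
    """Hypothetical outline of the proposed refit signature."""
    kwargs_for_dataset = kwargs_for_dataset or {}
    kwargs_for_predict = kwargs_for_predict or {}
    # Step 1: predict on the new data, forwarding only the predict options.
    preds = predict_fn(data, **kwargs_for_predict)
    # Step 2: build the training set, forwarding weight and other Dataset options.
    train_set = FakeDataset(data, label, **kwargs_for_dataset)
    return preds, train_set


def fake_predict(data, **kwargs):
    """Stand-in for the predict method; echoes what it received."""
    return {"n_rows": len(data), "kwargs": kwargs}


preds, train_set = refit_sketch(
    fake_predict,
    [[1], [2], [2]],
    [1, 2, 3],
    kwargs_for_dataset={"weight": [1.0, 0.0, 1.0]},
    kwargs_for_predict={"some_option": True},  # illustrative key only
)
print(train_set.weight)  # [1.0, 0.0, 1.0]
```

Keeping the two dictionaries separate avoids the ambiguity of a single kwargs bag whose entries would have to be routed to either predict or Dataset.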

@StrikerRUS StrikerRUS reopened this Dec 16, 2021
StrikerRUS pushed a commit that referenced this issue Jan 22, 2022
…t() (fixes #3038) (#4894)

* feat: refit additional kwargs for dataset and predict

* test: kwargs for refit method

* fix: __init__ got multiple values for argument

* fix: pycodestyle E302 error

* refactor: dataset_params to avoid breaking change

* refactor: expose all Dataset params in refit

* feat: dataset_params updates new_params

* fix: remove unnecessary params to test

* test: parameters input are the same

* docs: address StrikeRUS changes

* test: refit test changes in train dataset

* test: set init_score and decay_rate to zero
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2023