[R-package] Possibly-incorrect handling of duplicate parameters #4521
Comments
Now that the work for #4226 is complete, the provided example code is no longer possible in the R package.
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.train(
params = list(
objective = "regression",
bagging_fraction = 0.8
)
, data = dtrain
, nrounds = 5L
, bagging_fraction = 0.5
)
However, adding the same parameter to params twice is still possible:
model <- lgb.train(
params = list(
objective = "regression"
, bagging_fraction = 0.8
, bagging_fraction = 0.5
)
, data = dtrain
, nrounds = 5L
)
Looking into this more tonight, I think this is because of how the internal function used to convert params to a string behaves.
params <- list(
objective = "regression"
, bagging_fraction = 0.8
, bagging_fraction = 0.5
)
lightgbm:::lgb.params2str(params)
I'll propose a fix to
Description
In the CLI, if you provide a training config with bagging_fraction=0.8 and bagging_fraction=0.5, you will see a warning like this during training.
If you provide a config with those duplicate values to lgb.train() in the R package, you'll see this instead:
I need to do more investigation, but I think this means that the R package is not handling duplicates in parameters correctly.
Reproducible example
This behavior cannot be reproduced in the Python package because Python code in this project uses dictionaries for parameters at all user-facing entrypoints and Python dictionaries enforce the uniqueness of keys.
However, the R package uses R lists, which can have duplicate keys (e.g. list(a = 5, a = 6)).
With the CLI, the same duplicated parameters result in the expected warning message.
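To make the point about duplicate keys concrete, here is a minimal base-R sketch (no lightgbm required) of how a list with a repeated name behaves. The de-duplication step at the end is only one possible policy, shown as an assumption; it is not a description of what the R package does.

```r
# Base R only. The parameter names mirror the example in this issue.
params <- list(
    objective = "regression"
    , bagging_fraction = 0.8
    , bagging_fraction = 0.5
)

# Both entries are kept: the list has three elements, not two.
length(params)
names(params)

# `$` and `[[` return only the first matching element, so the
# second value is silently invisible to name-based access.
params$bagging_fraction
params[["bagging_fraction"]]

# duplicated() flags the second and later occurrences of a name,
# which is enough to detect the problem.
dup_names <- unique(names(params)[duplicated(names(params))])

# Hypothetical policy (an assumption, not LightGBM's behavior):
# keep the first occurrence of each name and drop the rest.
deduped <- params[!duplicated(names(params))]
```

Because name-based access only ever sees the first entry, code that reads params$bagging_fraction and code that serializes every element of the list can disagree about which value is in effect.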
Environment info
LightGBM version or commit hash: latest master (86ead20)
Command(s) you used to install LightGBM
Additional Comments
This is probably related to the R package's use of append() to handle parameters passed through ... (R's equivalent of **kwargs in Python). Combining two lists in R with append() doesn't merge keys.
It's possible that changing these uses to modifyList() would help with this issue.
In the Python package, in the scikit-learn interface, parameters passed through **kwargs take precedence over those passed in the params dictionary.
LightGBM/python-package/lightgbm/sklearn.py
Line 517 in 86ead20
LightGBM/python-package/lightgbm/sklearn.py
Lines 562 to 566 in 86ead20
However, since the current behavior of the R package is for parameters in params to take precedence over those from ..., I think that that behavior should be preserved in the 3.3.0 release (#4310). All uses of ... are going to be removed in LightGBM 4.0.0 (#4226).
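The difference between append() and modifyList() described above, and the effect of argument order on precedence, can be sketched in base R. This is only an illustration: dots is a hypothetical stand-in for arguments captured with list(...), and the sketch is not the change actually made in the package.

```r
# Values supplied through the params list.
params <- list(objective = "regression", bagging_fraction = 0.8)

# Hypothetical stand-in for arguments captured via list(...).
dots <- list(bagging_fraction = 0.5, learning_rate = 0.1)

# append() concatenates: bagging_fraction now appears twice.
appended <- append(params, dots)
names(appended)

# modifyList(x, val) overwrites entries of x with same-named entries
# of val, so the second argument wins. Passing params second keeps
# the precedence described above (params over ...).
params_win <- utils::modifyList(dots, params)

# Swapping the arguments gives the opposite precedence, matching the
# scikit-learn interface where **kwargs override params.
dots_win <- utils::modifyList(params, dots)
```

With this ordering, params_win has no repeated names and keeps bagging_fraction = 0.8 from params, while still picking up learning_rate from dots.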