-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] fix dangerous default for eval_at in LGBMRanker #3377
Conversation
Yeah, I've noticed that too. |
yeah I agree! Honestly even if we ignore 85% of the warnings that are style-only, it can still be worth it for some of these things that affect the correctness of the code. I'll try a PR this weekend, let's see how bad it looks |
but to be clear it will be separate from this PR |
python-package/lightgbm/sklearn.py
Outdated
feature_name='auto', categorical_feature='auto', | ||
callbacks=None, init_model=None): | ||
"""Docstring is inherited from the LGBMModel.""" | ||
# check group data | ||
eval_at = eval_at or [1, 2, 3, 4, 5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scikit-learn doesn't like when we modify fit
arguments: #3254 (comment).
I believe changing default value from list
to tuple
([1, 2, 3, 4, 5] -> (1, 2, 3, 4, 5)
) will be enough and will not break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, the behavior is identical. Just the way we achieve it is different. I followed that comment and then the PR it linked to and I'm not sure that the change I'm proposing would actually break anything.
I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list
That pylint
warning you are addressing in this PR is about dangerous default values. There should be nothing wrong with that users pass lists intentionally.
I don't really like you approach because with it default value is implicit: documentation says it is 1,2,3,4,5
but actually it is None
. It may be misleading.
Also, scikit-learn prefers everything to be explicit. It is not allowed to change init
args or even have kwargs
, for example. So I'm pretty sure it is better to not have implicit argument overwriting for avoiding any possible (future) issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. To be honest I still don't understand the scikit-learn thing but I believe you.
There should be nothing wrong with that users pass lists intentionally.
Using a tuple as the argument means that if we have any code anywhere down the stack that is expecting specifically a list and not a tuple (for example, something that does eval_at.append()
), it will break. But let me try and see what the tests say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I forgot about this one! Just updated to a tuple. Hopefully the tests will still pass. I searched around with git grep eval_at
and didn't find any list-specific code so I think it should work.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
I ran
pylint
over the Python package tonight, and found some tiny warnings that I think are worth addressing.This PR fixes the following:
Using literals of mutable data structures like
[]
or{}
is dangerous because Python initializes them and then every call gets a reference to the same object.Like this
Question for Reviewers
@StrikerRUS what do you think about running
pylint
in CI to catch things like this? I have found that it catches things like this thatpycodestyle
doesn't catch, and which can affect the correctness of the code. It's also a very lightweight dependency that runs quickly, so it shouldn't make thelint
task on Travis too much slower or fragile.If you're open to this, I'll open a separate PR to add
pylint
to thelint
task on Travis. We could also be very aggressive about ignoring style items to focus only on things that impact the correctness of the code, like this: https://github.com/jameslamb/doppel-cli/blob/84eb4c1973d96f7bbc742ee0b5231e314740ce1d/.ci/lint-py.sh#L54-L56