Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] fix dangerous default for eval_at in LGBMRanker #3377

Merged
merged 5 commits into from
Sep 29, 2020

Conversation

jameslamb
Copy link
Collaborator

I ran pylint over the Python package tonight, and found some tiny warnings that I think are worth addressing.

This PR fixes the following:

python-package/lightgbm/sklearn.py:944:4: W0102: Dangerous default value [] as argument (dangerous-default-value)

Using literals of mutable data structures like [] or {} is dangerous because Python initializes them and then every call gets a reference to the same object.

Like this

def stuff(x=[]):
     x.append("a")
     return(x)

stuff()
stuff()
stuff()

image

Question for Reviewers

@StrikerRUS what do you think about running pylint in CI to catch things like this? I have found that it catches things like this that pycodestyle doesn't catch, and which can affect the correctness of the code. It's also a very lightweight dependency that runs quickly, so it shouldn't make the lint task on Travis too much slower or fragile.

If you're open to this, I'll open a separate PR to add pylint to the lint task on Travis. We could also be very aggressive about ignoring style items to focus only on things that impact the correctness of the code, like this: https://github.com/jameslamb/doppel-cli/blob/84eb4c1973d96f7bbc742ee0b5231e314740ce1d/.ci/lint-py.sh#L54-L56

@StrikerRUS
Copy link
Collaborator

I have found that it catches things like this that pycodestyle doesn't catch, and which can affect the correctness of the code.

Yeah, I've noticed that too. pylint is quite stricter than pycodestyle... I believe we will have to ignore a lot of style warnings or have to rewrite Python wrapper completely! 😄 But I'm OK to have a test PR with pylint and see what we will get.

@jameslamb
Copy link
Collaborator Author

I believe we will have to ignore a lot of style warnings or have to rewrite Python wrapper completely

yeah I agree! Honestly even if we ignore 85% of the warnings that are style-only, it can still be worth it for some of these things that affect the correctness of the code. I'll try a PR this weekend, let's see how bad it looks

@jameslamb
Copy link
Collaborator Author

but to be clear it will be separate from this PR

feature_name='auto', categorical_feature='auto',
callbacks=None, init_model=None):
"""Docstring is inherited from the LGBMModel."""
# check group data
eval_at = eval_at or [1, 2, 3, 4, 5]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scikit-learn doesn't like when we modify fit arguments: #3254 (comment).

I believe changing default value from list to tuple ([1, 2, 3, 4, 5] -> (1, 2, 3, 4, 5)) will be enough and will not break anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, the behavior is identical. Just the way we achieve it is different. I followed that comment and then the PR it linked to and I'm not sure that the change I'm proposing would actually break anything.

I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list

That pylint warning you are addressing in this PR is about dangerous default values. There should be nothing wrong with that users pass lists intentionally.

I don't really like you approach because with it default value is implicit: documentation says it is 1,2,3,4,5 but actually it is None. It may be misleading.
Also, scikit-learn prefers everything to be explicit. It is not allowed to change init args or even have kwargs, for example. So I'm pretty sure it is better to not have implicit argument overwriting for avoiding any possible (future) issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. To be honest I still don't understand the scikit-learn thing but I believe you.

There should be nothing wrong with that users pass lists intentionally.

Using a tuple as the argument means that if we have any code anywhere down the stack that is expecting specifically a list and not a tuple (for example, something that does eval_at.append()), it will break. But let me try and see what the tests say.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I forgot about this one! Just updated to a tuple. Hopefully the tests will still pass. I searched around with git grep eval_at and didn't find any list-specific code so I think it should work.

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@StrikerRUS StrikerRUS merged commit ecbb0e9 into microsoft:master Sep 29, 2020
@jameslamb jameslamb deleted the fix/dangerous-default branch October 11, 2020 04:36
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants