[python] fix dangerous default for eval_at in LGBMRanker #3377

jameslamb · 2020-09-11T03:47:44Z

I ran pylint over the Python package tonight, and found some tiny warnings that I think are worth addressing.

This PR fixes the following:

python-package/lightgbm/sklearn.py:944:4: W0102: Dangerous default value [] as argument (dangerous-default-value)

Using literals of mutable data structures like [] or {} is dangerous because Python initializes them and then every call gets a reference to the same object.

Like this

def stuff(x=[]):
     x.append("a")
     return(x)

stuff()
stuff()
stuff()

Question for Reviewers

@StrikerRUS what do you think about running pylint in CI to catch things like this? I have found that it catches things like this that pycodestyle doesn't catch, and which can affect the correctness of the code. It's also a very lightweight dependency that runs quickly, so it shouldn't make the lint task on Travis too much slower or fragile.

If you're open to this, I'll open a separate PR to add pylint to the lint task on Travis. We could also be very aggressive about ignoring style items to focus only on things that impact the correctness of the code, like this: https://github.com/jameslamb/doppel-cli/blob/84eb4c1973d96f7bbc742ee0b5231e314740ce1d/.ci/lint-py.sh#L54-L56

StrikerRUS · 2020-09-11T19:56:58Z

I have found that it catches things like this that pycodestyle doesn't catch, and which can affect the correctness of the code.

Yeah, I've noticed that too. pylint is quite stricter than pycodestyle... I believe we will have to ignore a lot of style warnings or have to rewrite Python wrapper completely! 😄 But I'm OK to have a test PR with pylint and see what we will get.

jameslamb · 2020-09-11T19:59:24Z

I believe we will have to ignore a lot of style warnings or have to rewrite Python wrapper completely

yeah I agree! Honestly even if we ignore 85% of the warnings that are style-only, it can still be worth it for some of these things that affect the correctness of the code. I'll try a PR this weekend, let's see how bad it looks

jameslamb · 2020-09-11T19:59:36Z

but to be clear it will be separate from this PR

StrikerRUS · 2020-09-11T20:02:39Z

python-package/lightgbm/sklearn.py

            feature_name='auto', categorical_feature='auto',
            callbacks=None, init_model=None):
        """Docstring is inherited from the LGBMModel."""
        # check group data
+        eval_at = eval_at or [1, 2, 3, 4, 5]


scikit-learn doesn't like when we modify fit arguments: #3254 (comment).

I believe changing default value from list to tuple ([1, 2, 3, 4, 5] -> (1, 2, 3, 4, 5)) will be enough and will not break anything.

in this case, the behavior is identical. Just the way we achieve it is different. I followed that comment and then the PR it linked to and I'm not sure that the change I'm proposing would actually break anything.

I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list

I'm nervous that changing data types will break something, because now you have the default as a tuple and users probably passing in values as a list

That pylint warning you are addressing in this PR is about dangerous default values. There should be nothing wrong with that users pass lists intentionally.

I don't really like you approach because with it default value is implicit: documentation says it is 1,2,3,4,5 but actually it is None. It may be misleading.
Also, scikit-learn prefers everything to be explicit. It is not allowed to change init args or even have kwargs, for example. So I'm pretty sure it is better to not have implicit argument overwriting for avoiding any possible (future) issues.

ok. To be honest I still don't understand the scikit-learn thing but I believe you.

There should be nothing wrong with that users pass lists intentionally.

Using a tuple as the argument means that if we have any code anywhere down the stack that is expecting specifically a list and not a tuple (for example, something that does eval_at.append()), it will break. But let me try and see what the tests say.

sorry I forgot about this one! Just updated to a tuple. Hopefully the tests will still pass. I searched around with git grep eval_at and didn't find any list-specific code so I think it should work.

python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

StrikerRUS

LGTM! Thank you!

github-actions · 2023-08-24T04:51:47Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

[python] fix dangerous default for eval_at in LGBMRanker

44112f5

jameslamb added the maintenance label Sep 11, 2020

jameslamb requested a review from StrikerRUS September 11, 2020 03:47

jameslamb requested review from chivee, henry0312 and wxchan as code owners September 11, 2020 03:47

StrikerRUS requested changes Sep 11, 2020

View reviewed changes

jameslamb added 3 commits September 26, 2020 21:53

Merge branch 'master' into fix/dangerous-default

2ec5e48

use a tuple

bcc615b

five

7a8c7d8

jameslamb requested a review from StrikerRUS September 27, 2020 03:22

StrikerRUS reviewed Sep 27, 2020

View reviewed changes

python-package/lightgbm/sklearn.py Outdated Show resolved Hide resolved

Update python-package/lightgbm/sklearn.py

c145f0d

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

jameslamb requested a review from StrikerRUS September 27, 2020 21:46

StrikerRUS approved these changes Sep 29, 2020

View reviewed changes

StrikerRUS merged commit ecbb0e9 into microsoft:master Sep 29, 2020

jameslamb deleted the fix/dangerous-default branch October 11, 2020 04:36

jameslamb mentioned this pull request May 20, 2021

[python] handle arbitrary length feature names in Python-package #4293

Merged

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[python] fix dangerous default for eval_at in LGBMRanker #3377

[python] fix dangerous default for eval_at in LGBMRanker #3377

jameslamb commented Sep 11, 2020

StrikerRUS commented Sep 11, 2020

jameslamb commented Sep 11, 2020

jameslamb commented Sep 11, 2020

StrikerRUS Sep 11, 2020

jameslamb Sep 11, 2020

StrikerRUS Sep 11, 2020

jameslamb Sep 11, 2020

jameslamb Sep 27, 2020

StrikerRUS left a comment

github-actions bot commented Aug 24, 2023

[python] fix dangerous default for eval_at in LGBMRanker #3377

[python] fix dangerous default for eval_at in LGBMRanker #3377

Conversation

jameslamb commented Sep 11, 2020

Question for Reviewers

StrikerRUS commented Sep 11, 2020

jameslamb commented Sep 11, 2020

jameslamb commented Sep 11, 2020

StrikerRUS Sep 11, 2020

Choose a reason for hiding this comment

jameslamb Sep 11, 2020

Choose a reason for hiding this comment

StrikerRUS Sep 11, 2020

Choose a reason for hiding this comment

jameslamb Sep 11, 2020

Choose a reason for hiding this comment

jameslamb Sep 27, 2020

Choose a reason for hiding this comment

StrikerRUS left a comment

Choose a reason for hiding this comment

github-actions bot commented Aug 24, 2023