-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX ImportError for _DictWithDeprecationKeys #188
Conversation
In sklearn 1.2.0, _DictWithDeprecatinKeys will be removed. Currently, skops will fail to import it and raise an error. In the future, we try to import it and if that fails, not add it as a dispatch type. Implementation The bugfix should work. There is, however, trouble with testing it, because we need to patch top level imports. This does not work properly because the order of patching vs importing is wrong, resulting in the patch being applied too late. I have added a test to demonstrate it. If it is run in isolation, it passes: $ python -m pytest skops/io/tests/test_deprecation.py If run in conjunction with the other tests, it fails: $ python -m pytest -m skops E assert not True E + where True = any(<generator object test_dictwithdeprecatedkeys_cannot_be_imported.<locals>.<genexpr> at 0x7fa727c2e3b0>) This is because the other tests will trigger the import of _DictWithDeprecatinKeys _before_ the patch is applied. This is bad, ideally we want our tests not to affect each other. I see these solutions: 1. remove the test and just assume that the bugfix works 2. CI matrix extended to run with sklearn<1.2 and with sklearn>=1.2 3. change imports such that the tests don't influence each other I think it goes without saying why 1 and 2 are not very desirable. I haven't actually tested 3. but the idea would be to make all skops imports in the tests local instead importing on the root level. So for example if we want to test skops.io.load, instead of having: from skops.io import load def test_it(): load(...) we should have: @pytest.fixture def load(): from skops.io import load return load def test_it(load): load(...) If I'm not mistaken, this should resolve this issue and most similar issues in the future (it's not a cure all solution though). As a practical example, in skorch we handle it exactly like this. Below is an example of testing the skorch to_tensor function. See how the function is only imported locally, not on a module level: https://github.com/skorch-dev/skorch/blob/cfe568b58f150730e7269b5a2a1b2bac2228dbd2/skorch/tests/test_utils.py#L15-L34 Hopefully, there is a better way than any of the suggestions, but I couldn't come up with any.
RFC @skops-dev/maintainers |
The question is, how do we want to handle different sklearn versions. Handling adhoc might make it too messy. I'm thinking maybe we should have a section for each version and do the imports and dispatch registration accordingly. I think it would also make sense to have a CI for each supported scikit-learn version, only for persistence tests maybe? |
That would be cleaner, I agree. Maybe overkill for just this one issue? But we should keep it in mind.
We currently have 3 supported versions (not counting patch releases), 0.24, 1.0 and 1.1. This list will probably grow. Factoring in 3 OS's and 4 Python versions, that's 36 runs. From the non-network tests, the persistence tests are the most costly, so I guess we might as well run them all (especially given how we still haven't found out to run workflows based on commit message). In the end, that should be doable (if we manage to exclude the inference tests), but probably it's at the limit. Adding another multiplier would be too much. |
We could maybe somewhat bind the sklearn version to the python version, i.e. test each sklearn version only once for each OS |
I'd be fine with the PR more or less as is, we just need to structure it well and document the blocks with sklearn version requirements. |
Could you please elaborate what needs to be done for this?
Okay, I can do that. Regarding the testing issue, the solution would be not to have the test at all and rely on CI with different sklearn versions to discover this type of error, right? |
yes.
Nothing regarding the code, more like:
And make sure such code is not distributed around the files and is more or less in one place kinda thing. |
- Add comments to clarify change and TODO about when it can be removed - Remove test as discussed, will be covered by CI
@adrinjalali I addressed your comments, please review again.
That wasn't really possible, as we want to have the imports at the top of the file but the affected list has to be at the bottom. If you are okay with moving the import for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
# longer needed for GraphicalLassoCV, see #187. | ||
if _DictWithDeprecatedKeys is not None: | ||
GET_STATE_DISPATCH_FUNCTIONS.append( | ||
(_DictWithDeprecatedKeys, _DictWithDeprecatedKeys_get_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also add the same note to _DictWithDeprecatedKeys_get_state
and _DictWithDeprecatedKeys_get_instance
Do you think we should add the CI for different sklearn versions here or in a separate PR? I would say it would make sense here, but also happy to have it separate. |
I'd rather do it in another PR, just in case it gets messy again like the inference thing. The change in this PR only has an effect if sklearn 1.2 is released, so it's not super urgent. |
Ok, then let's merge this and do the CI separately. |
Relates to #187
Description
In sklearn 1.2.0,
_DictWithDeprecatinKeys
will be removed. Currently, skops will fail to import it and raise an error. In the future, we try to import it and if that fails, not add it as a dispatch type.Implementation
The bugfix should work. There is, however, trouble with testing it, because we need to patch top level imports. This does not work properly because the order of patching vs importing is wrong, resulting in the patch being applied too late.
I have added a test to demonstrate it. If it is run in isolation, it passes:
$ python -m pytest skops/io/tests/test_deprecation.py
If run in conjunction with the other tests, it fails (see failing CI):
This is because the other tests will trigger the import of
_DictWithDeprecatinKeys
before the patch is applied. This is bad, ideally we want our tests not to affect each other.I see these solutions:
sklearn<1.2
and withsklearn>=1.2
I think it goes without saying why 1 and 2 are not very desirable.
I haven't actually tested 3. but the idea would be to make all skops imports in the tests local instead importing on the root level. So for example if we want to test
skops.io.load
, instead of having:we should have:
If I'm not mistaken, this should resolve this issue and most similar issues in the future (it's not a cure all solution though).
As a practical example, in skorch we handle it exactly like this. Below is an example of testing the skorch
to_tensor
function. See how the function is only imported locally, not on a module level:https://github.com/skorch-dev/skorch/blob/cfe568b58f150730e7269b5a2a1b2bac2228dbd2/skorch/tests/test_utils.py#L15-L34
This is strictly done for all tests, despite adding some verbosity.
Hopefully, there is a better way than any of the suggestions, but I couldn't come up with any.