Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX ImportError for _DictWithDeprecationKeys #188

Merged
merged 3 commits into from
Oct 17, 2022

Conversation

BenjaminBossan
Copy link
Collaborator

@BenjaminBossan BenjaminBossan commented Oct 14, 2022

Relates to #187

Description

In sklearn 1.2.0, _DictWithDeprecatinKeys will be removed. Currently, skops will fail to import it and raise an error. In the future, we try to import it and if that fails, not add it as a dispatch type.

Implementation

The bugfix should work. There is, however, trouble with testing it, because we need to patch top level imports. This does not work properly because the order of patching vs importing is wrong, resulting in the patch being applied too late.

I have added a test to demonstrate it. If it is run in isolation, it passes:

$ python -m pytest skops/io/tests/test_deprecation.py

If run in conjunction with the other tests, it fails (see failing CI):

$ python -m pytest -m skops
E           assert not True
E            +  where True = any(<generator object test_dictwithdeprecatedkeys_cannot_be_imported.<locals>.<genexpr> at 0x7fa727c2e3b0>)

This is because the other tests will trigger the import of _DictWithDeprecatinKeys before the patch is applied. This is bad, ideally we want our tests not to affect each other.

I see these solutions:

  1. remove the test and just assume that the bugfix works
  2. CI matrix extended to run with sklearn<1.2 and with sklearn>=1.2
  3. change imports such that the tests don't influence each other

I think it goes without saying why 1 and 2 are not very desirable.

I haven't actually tested 3. but the idea would be to make all skops imports in the tests local instead importing on the root level. So for example if we want to test skops.io.load, instead of having:

from skops.io import load

def test_it():
    load(...)

we should have:

@pytest.fixture
def load():
    from skops.io import load
    return load

def test_it(load):
    load(...)

If I'm not mistaken, this should resolve this issue and most similar issues in the future (it's not a cure all solution though).

As a practical example, in skorch we handle it exactly like this. Below is an example of testing the skorch to_tensor function. See how the function is only imported locally, not on a module level:

https://github.com/skorch-dev/skorch/blob/cfe568b58f150730e7269b5a2a1b2bac2228dbd2/skorch/tests/test_utils.py#L15-L34

This is strictly done for all tests, despite adding some verbosity.

Hopefully, there is a better way than any of the suggestions, but I couldn't come up with any.

In sklearn 1.2.0, _DictWithDeprecatinKeys will be removed. Currently,
skops will fail to import it and raise an error. In the future, we try
to import it and if that fails, not add it as a dispatch type.

Implementation

The bugfix should work. There is, however, trouble with testing it,
because we need to patch top level imports. This does not work properly
because the order of patching vs importing is wrong, resulting in the
patch being applied too late.

I have added a test to demonstrate it. If it is run in isolation, it
passes:

$ python -m pytest skops/io/tests/test_deprecation.py

If run in conjunction with the other tests, it fails:

$ python -m pytest -m skops
E           assert not True
E            +  where True = any(<generator object test_dictwithdeprecatedkeys_cannot_be_imported.<locals>.<genexpr> at 0x7fa727c2e3b0>)

This is because the other tests will trigger the import of
_DictWithDeprecatinKeys _before_ the patch is applied. This is bad,
ideally we want our tests not to affect each other.

I see these solutions:

1. remove the test and just assume that the bugfix works
2. CI matrix extended to run with sklearn<1.2 and with sklearn>=1.2
3. change imports such that the tests don't influence each other

I think it goes without saying why 1 and 2 are not very desirable.

I haven't actually tested 3. but the idea would be to make all skops
imports in the tests local instead importing on the root level. So for
example if we want to test skops.io.load, instead of having:

from skops.io import load

def test_it():
    load(...)

we should have:

@pytest.fixture
def load():
    from skops.io import load
    return load

def test_it(load):
    load(...)

If I'm not mistaken, this should resolve this issue and most similar
issues in the future (it's not a cure all solution though).

As a practical example, in skorch we handle it exactly like this. Below
is an example of testing the skorch to_tensor function. See how the
function is only imported locally, not on a module level:

https://github.com/skorch-dev/skorch/blob/cfe568b58f150730e7269b5a2a1b2bac2228dbd2/skorch/tests/test_utils.py#L15-L34

Hopefully, there is a better way than any of the suggestions, but I
couldn't come up with any.
@BenjaminBossan
Copy link
Collaborator Author

RFC @skops-dev/maintainers

@BenjaminBossan BenjaminBossan changed the title Bugfix: ImportError for _DictWithDeprecatinKeys FIX ImportError for _DictWithDeprecationKeys Oct 14, 2022
@BenjaminBossan BenjaminBossan added the bug Something isn't working label Oct 14, 2022
@adrinjalali
Copy link
Member

The question is, how do we want to handle different sklearn versions.

Handling adhoc might make it too messy. I'm thinking maybe we should have a section for each version and do the imports and dispatch registration accordingly.

I think it would also make sense to have a CI for each supported scikit-learn version, only for persistence tests maybe?

@BenjaminBossan
Copy link
Collaborator Author

Handling adhoc might make it too messy. I'm thinking maybe we should have a section for each version and do the imports and dispatch registration accordingly.

That would be cleaner, I agree. Maybe overkill for just this one issue? But we should keep it in mind.

I think it would also make sense to have a CI for each supported scikit-learn version, only for persistence tests maybe?

We currently have 3 supported versions (not counting patch releases), 0.24, 1.0 and 1.1. This list will probably grow. Factoring in 3 OS's and 4 Python versions, that's 36 runs. From the non-network tests, the persistence tests are the most costly, so I guess we might as well run them all (especially given how we still haven't found out to run workflows based on commit message).

In the end, that should be doable (if we manage to exclude the inference tests), but probably it's at the limit. Adding another multiplier would be too much.

@adrinjalali
Copy link
Member

We could maybe somewhat bind the sklearn version to the python version, i.e. test each sklearn version only once for each OS

@adrinjalali
Copy link
Member

I'd be fine with the PR more or less as is, we just need to structure it well and document the blocks with sklearn version requirements.

@BenjaminBossan
Copy link
Collaborator Author

I'd be fine with the PR more or less as is, we just need to structure it well

Could you please elaborate what needs to be done for this?

and document the blocks with sklearn version requirements.

Okay, I can do that.

Regarding the testing issue, the solution would be not to have the test at all and rely on CI with different sklearn versions to discover this type of error, right?

@adrinjalali
Copy link
Member

Regarding the testing issue, the solution would be not to have the test at all and rely on CI with different sklearn versions to discover this type of error, right?

yes.

Could you please elaborate what needs to be done for this?

Nothing regarding the code, more like:

# This part is for version xxx
...
# end of version xxx code

And make sure such code is not distributed around the files and is more or less in one place kinda thing.

- Add comments to clarify change and TODO about when it can be removed
- Remove test as discussed, will be covered by CI
@BenjaminBossan
Copy link
Collaborator Author

@adrinjalali I addressed your comments, please review again.

And make sure such code is not distributed around the files and is more or less in one place kinda thing.

That wasn't really possible, as we want to have the imports at the top of the file but the affected list has to be at the bottom. If you are okay with moving the import for _DictWithDeprecationKeys, I can do that.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

# longer needed for GraphicalLassoCV, see #187.
if _DictWithDeprecatedKeys is not None:
GET_STATE_DISPATCH_FUNCTIONS.append(
(_DictWithDeprecatedKeys, _DictWithDeprecatedKeys_get_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also add the same note to _DictWithDeprecatedKeys_get_state and _DictWithDeprecatedKeys_get_instance

@adrinjalali
Copy link
Member

Do you think we should add the CI for different sklearn versions here or in a separate PR? I would say it would make sense here, but also happy to have it separate.

@BenjaminBossan
Copy link
Collaborator Author

I'd rather do it in another PR, just in case it gets messy again like the inference thing. The change in this PR only has an effect if sklearn 1.2 is released, so it's not super urgent.

@adrinjalali
Copy link
Member

Ok, then let's merge this and do the CI separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants