Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend NumPy compatibility to v1.20 #1010

Merged
merged 6 commits into from
Apr 30, 2021
Merged

Extend NumPy compatibility to v1.20 #1010

merged 6 commits into from
Apr 30, 2021

Conversation

benjello
Copy link
Member

@benjello benjello commented Apr 23, 2021

Fixes #1009


Edit @maukoquiroga on 27/04/2021

Bug fix

  • Make NumPy's type-checking compatible with 1.17.0+
    • NumPy introduced their typing module since 1.20.0
    • Previous type hints relying on annotations will henceforward no longer work
    • This changes ensure that type hints are always legal for the last four minor NumPy versions
TODO
  • Fix type-checking for NumPy 1.18+
  • Bump NumPy's version lower-bound to 1.17.0
  • Run MyPy against n - 1-4 NumPy versions
  • Update styleguide
  • Update readme
  • Update pull request description
  • Bump version & changelog

@benjello
Copy link
Member Author

@MattiSG @maukoquiroga @sandcha : c'est un peu bloquant pour diverses PR en aval. Si jamais l'in d'ente vous avait le temps de regarde ce serait super.

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

Ok, sorry for the delay. I'm starting a review @benjello!

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

Thanks to this PR, the tests on Openfisca-survey-manager now pass (but the Run tests step ends with another error that seems outside the scope of this PR: HTTPError on coveralls.io).

@benjello
Copy link
Member Author

@sandcha : the HTTPError is far beyond my skill. Should I proceed and merge this PR ?
You seem to wait for something more since you didn't approve it.

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

@benjello It looks like we have something else to fix to be compatible with Numpy 1.18 (or 1.18.1) as I'm getting this error on typing with make test:

$ make test
rm -rf build dist
find . -name '*.pyc' -exec rm \{\} \;
python -m compileall -q .
flake8 `git ls-files | grep "\.py$"`
mypy openfisca_core && mypy openfisca_web_api
openfisca_core/indexed_enums/enum_array.py:67: error: Variable "openfisca_core.indexed_enums.enum_array.IndexedEnumArray" is not valid as a type
openfisca_core/indexed_enums/enum_array.py:67: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
openfisca_core/indexed_enums/enum.py:37: error: Variable "openfisca_core.indexed_enums.enum.IndexedEnumArray" is not valid as a type
openfisca_core/indexed_enums/enum.py:37: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
Found 2 errors in 2 files (checked 119 source files)
make: *** [check-types] Error 1

Here is what I did:
On this branch and in a new virtual environment I installed openfisca-core with the following command:
make install that installed Numpy 1.20.2
Then make check-types (as well as make test) was successful.

In order to try it with an older version of Numpy, the 1.20.2 was uninstalled:
pip uninstall numpy
And Numpy was reinstalled with the previous reference revision 1.18 < 1.19
pip install numpy==1.18.1 (here 1.18.1 as needed by the CASD)
Then, I ran the make test as described with the error above.

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

A quick fix (not really elegant fix) for:

openfisca_core/indexed_enums/enum_array.py:67: error: Variable "openfisca_core.indexed_enums.enum_array.IndexedEnumArray" is not valid as a type
openfisca_core/indexed_enums/enum_array.py:67: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases

would be to add the else statement (line 11):

if typing.TYPE_CHECKING:
    from openfisca_core.indexed_enums import Enum

    IndexedEnumArray = numpy.object_
else:
    IndexedEnumArray = numpy.ndarray[Enum]  # + import to check

to use the previous typing (before #990) on decode.

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

The same fix should work for the second error message (enum.py:37) as the previous type was identical: IndexedEnumArray was numpy.ndarray[Enum].

Nevertheless, those two errors are detected by make test: it could be interesting to check the test coverage on #990 impacted files to see if we missed something.

@sandcha
Copy link
Collaborator

sandcha commented Apr 26, 2021

The fixes mentioned above cause a new kind of error: the first and second fixes resolve the two errors but when we try to solve the last one which is openfisca_core/indexed_enums/enum_array.py:26:58: F821 undefined name 'Enum', the two resolved errors come back 🙃

While removing the if statement @benjello in both enum.py and enum_array.py and replacing IndexedEnumArray occurrences with numpy.object_ solves the errors even if we seem to lose some precision in the typing (numpy.ndarray[Enum] gives more information than numpy.object_).

Copy link
Collaborator

@sandcha sandcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR! To be more explicit, as described in the comments, I would approve it if make test was successful for numpy < 1.20 (aka 1.18 and 1.19). You can dismiss this review when it is fixed :)

@benjello
Copy link
Member Author

Let's wait for @maukoquiroga input. If he does not have a fix without lowering type testing quality I will adopt your suggestion.

@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 26, 2021

Thanks @benjello and @sandcha.

Hard one to crack really 😬 so here are my thoughts, let me know what do you think:

This pull request will effectively fix any runtime issue but will still fail at type-cheking, hence make test.

Assuming those are two different problems, I'd say this pull request is complete and acceptable if we aim exclusively the former, solving the problem for users, and not the latter, which is actually a problem for contributors.

Note: I assume users type-checking their own code won't have any problem as long as they are using a compiled version of the core. If that's not the case, I assume we're dealing with a core's contributor.

For the latter, I see two different solutions:

  1. Keep dependencies loose for users (pip install openfisca-core) and tight for contributors (pip install openfisca-core[dev], pip install -e .[dev], and so on).

I prefer this solution as generally we want loose dependencies for users and tight dependencies for contributors, in order to avoid as much version-lockings for users as version-incompatibilities for contributors.

There's a good overview of this issue here.

  1. Cover the case of contributors using different, incompatible, NumPy versions.

If we assume contributors will also need to be able to use these incompatible versions of NumPy, and that we can't know that beforehand —which version will they be using, then 1. won't work.

In that case, I see no other option than what @sandcha proposes, but that opens some legit questions:

a. Should we provide that compatibility as an exception (so basically doing 1. and 2. at the same time)?

b. What should we test against in CI? The most plausible scenario (all the latest acceptable versions of dependencies)? Both NumPy versions' scenarios? Something else?

  1. Drop the usage of numpy.typing numpy.object_ altogether.

In this case, we'll face other problems:

a. A regression, which is easily fixable by assuming the least modern version of NumPy is being used (what we do for Python versions already).

b. Telling the CI to always assume the least modern version of everything (we would have to script it).

c. Telling contributors to always assume the least modern version of everything (fixable per 1.).


I'd say we go for 1. + this pull request.

Thoughts?

@benjello
Copy link
Member Author

@maukoquiroga @sandcha , let's look at the constraint linked to the numpy version.
I can fairly assume that the most stringent one comes from IPP which is required to use numpy 1.18.1 fro some time at CASD.
Other user have access to the internet so using the last version of numpy is not a problem.
We also want to avoid distinguishing user and contributors t reduce complexity.
So I suggest to adopt the "ugly fix" suggested by @sandcha temporarily, set numpy version dep to numpy >= 1.18.1
until IPP lobbying allow it to use numpy 1.20.

What do you think ?

@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 26, 2021

@benjello Doing so will solve one issue, but not the one we want to address here which is about a feature of NumPy available only since 1.20.x (hence we won't be able to dismiss @sandcha's review).

I do personally think that is OK to let it loose for people installing the library and tighter for people who want to edit it, and who will care about type-checking the code.

Otherwise, I'd we more in favour of dropping the use of nunpy.typing numpy.object_, as in the current state, dependency versions do not represent reality: we actually do not support NumPy versions < 1.20.x as long as we use that feature for people who want to type-check the core.

@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 26, 2021

To be precise, by the look of it, seems that:

  1. numpy.ndarray[Enum] won't work for NumPy >= 1.20
  2. numpy.object_ for NumPy < 1.20

@benjello
Copy link
Member Author

0k I understand better your point @maukoquiroga so we would go with solution 1.
But we should definitively change mininal numpy version to 1.18.1 and push for 1.20 on CASD.

@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 26, 2021

Well, actually, it does not seem possible to type-check against different versions of NumPy at the same time, so I guess my proposed solution now would be to:

  1. Set min to 1.18.1 n - 4 (1.17.0)
  2. Check-types in CI against 1.17, 1.18, 1.19 & 1.20 (already done by make test).

I still think 1. proposed before to be the right way to go, but maybe it goes a bit beyond what's required for this particular case (requirement of @sandcha that type-checking works for 1.18+).

@bonjourmauko
Copy link
Member

Got something working here, what do you think @benjello & @sandcha?

@bonjourmauko
Copy link
Member

While removing the if statement @benjello in both enum.py and enum_array.py and replacing IndexedEnumArray occurrences with numpy.object_ solves the errors even if we seem to lose some precision in the typing (numpy.ndarray[Enum] gives more information than numpy.object_).

This is what I propose to do as well, thanks for that investigative work, here's why: as starting from 1.20.x NumPy provides its own type hints —that is basically why we cannot continue to use numpy.ndarray[Enum].

@benjello
Copy link
Member Author

I am ok with your proposal @maukoquiroga.
Do you @sandcha ?
Should we cose this PR and switch to the one you propose @maukoquiroga ?

@bonjourmauko bonjourmauko added the kind:fix Bugs are defects and failure demand. label Apr 27, 2021
@bonjourmauko bonjourmauko changed the title import numpy.typing if and only if typing.TYPE_CHECKING [WIP] Make NumPy's type-checking compatible with 1.17.0+ Apr 27, 2021
@bonjourmauko bonjourmauko requested a review from sandcha April 28, 2021 18:24
@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 28, 2021

Thanks @sandcha ! Regarding minor/major, I reproduce part of my comment on #990:

  1. When we consider supporting a new NumPy version, or in general a new version of any "general" dependency (for everyone, not just a "dev" one, like flake8), we can rightly say, as it was done here (cf Upgrade numpy to add support for M1 processors #990), that it is a minor change, given we ensure compatibility of new changes to the core with not just the new added version but at least the older one supported (in this case 1.11.0 and 1.20.0).

@MattiSG said justly (cf #990):

I would hope that dependents can have their own version of Numpy without relying on ours!

They do rely on ours, but on all the range of them, not just the latest. Upgrading NumPy does not force anyone to use that version, unless we force them to do it —which we could— like in #1012.

However, the fact that we're not currently testing that they can actually continue to use the version they're already using, makes it impossible to ourselves to know whether we're introducing a breaking-change or not.

In that specific case, I think we should not defer the responsibility of knowing so to the users, knowing how to avoid it —we're literally saying "we mark it as a breaking change because, as we can know if it is, but we haven't tested so, you will have to do it by yourself". Said otherwise, impact of an upgrade is completely foreseeable, because users can always opt-out.

Of course, we can always assume that:

  • by default all users will be willing to upgrade their dependencies when we upgrade ours (contrary to what happened here)
  • we use versioning more as a dissuasive tool than an informative one (let's just not jump on that NumPy upgrade so easily before we can test backwards compatibility)

In any of those cases it would indeed make sense to mark any NumPy bump as a major.

I'm strongly in favour of a minor here, after all we did actually test changes are compatible with NumPy 1.17.0+

(On a side note, I've tested manually these changes against 1.11.0-1.20.0 and only 1.17.0+ passed, but the situation precedes these changes, so we introduced a breaking-change at some point silently).

@bonjourmauko bonjourmauko removed their assignment Apr 29, 2021
Copy link
Member

@bonjourmauko bonjourmauko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello all! Thank you all for your great efforts and reactivity to this issue.

I'm now requesting changes as I think that what we should do in this situation, generally, would be to:

Note: I'm not blocking because of the minor/major, as there is no visible consensus yet.

I'd also strongly advice we document the way we decide how to handle the situation:

  • Semantic versioning regarding changes in our requirements
  • Dealing with problematic versions
    • Wrong versioning breaking the principle of least astonishment
    • Critical bugs introduced breaking usability of the library by users

If you think this request for changes can be dismissed, please do not hesitate to do so, clearly specifying the reasons, so we can include them in the documentation proposed.

@sandcha
Copy link
Collaborator

sandcha commented Apr 29, 2021

Rebasing on master branch to get #1014 content.

Done by a revert on 08bcaa5c revert: revert of Drop latest NumPy supported version to 1.18.x
@sandcha sandcha requested a review from bonjourmauko April 29, 2021 18:18
@bonjourmauko bonjourmauko changed the title Make NumPy's type-checking compatible with 1.17.0+ Extend NumPy compatibility to v1.20 Apr 29, 2021
@bonjourmauko bonjourmauko requested review from bonjourmauko and guillett and removed request for bonjourmauko and guillett April 29, 2021 21:03
Mauko Quiroga and others added 5 commits April 29, 2021 23:07
setup.py Show resolved Hide resolved
@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 29, 2021

Thanks again @sandcha ! 🙏 🙏 🙏

I've just done a fixup of commits and profited to correct versioning (minor => 35.4.0) 😃

Otherwise, I'm OK with all changes. As all commits are of my authorship, it would be great to have your last review & LGTM.

Copy link
Collaborator

@sandcha sandcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for this PR 🙌
Tested with openfisca-france as an example of a large code base and no error was detected by make test with numpy 1.20, 1.19 and 1.18.1 (CASD revision).

@sandcha sandcha dismissed bonjourmauko’s stale review April 30, 2021 18:01

All seems fine. Dismissing according to this comment #1010 (comment)

@sandcha sandcha merged commit 443bb44 into master Apr 30, 2021
@sandcha sandcha deleted the numpy-typing branch April 30, 2021 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:fix Bugs are defects and failure demand.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenFisca-Core seems to depend on numpy 1.20
3 participants