-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport unicode litterals from Python 3 #702
Conversation
OMG so much yes!!! Thanks!! ❤️ This has been so much pain to explain to newcomers, and such an annoyance to read. |
This PR doesn't contain much logic, and should be relatively quick to review, but has a huge potential for conflicts, so a quick review would be really appreciated 🙂. @maukoquiroga @sandcha @Anna-Livia (Or @MattiSG if you feel like diving into some really passionating Python 😉) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work ;)
FYI, black will automatically remove the u
prefixes when the unicode_literals
are used. It might make sense to adopt it and use it to automatically reformat the code? I gave it a go in openfisca/openfisca-france#1060 if you need an example of what it does.
I saw a couple of changes that seem unrelated to the backport, not sure if they should be done in another PR?
Also, I'm pretty rusty on unicode concerns, and even more in py2, so my comments may be way off ;)
CHANGELOG.md
Outdated
|
||
### 23.3.2 [#702](https://github.com/openfisca/openfisca-core/pull/702) | ||
|
||
Minor Change without impacts for reusers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo? Should it be users
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant re-users, but users is probably better.
CHANGELOG.md
Outdated
Minor Change without impacts for reusers: | ||
- Make code more Python3-like by backporting unicode litterals. | ||
- With this backport, all strings are by default unicodes. | ||
- The `u` prefix for strings should *not* be used anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a check run in the CI (or a git commit hook) that makes sure no new \bu['"]
are introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can give me a shell command that does it, I can add it to Circle 🙂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know black does it automatically... but the change is a bit more involved... as you can see in openfisca/openfisca-france#1060 (and specifically the commit openfisca/openfisca-france@9dd7580 adding it, and the consequences: openfisca/openfisca-france@6d8621b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHANGELOG.md
Outdated
- Make code more Python3-like by backporting unicode litterals. | ||
- With this backport, all strings are by default unicodes. | ||
- The `u` prefix for strings should *not* be used anymore. | ||
- Each new module must start by `from __future__ import unicode_literals` for the backport to be effective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, maybe a script that makes sure no new file is created without this import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I'd happily take a shell command that makes sure that each .py
file contains from __future__ import unicode_literals, print_function, division, absolute_import
.
I guess I could do it in Python, but I'm sure there is a quick way in shell...
README.md
Outdated
This package requires Python 2.7 | ||
OpenFisca runs on Python 3.6, or more recent versions. | ||
|
||
Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this would read better as Backward compatibility with Python 2.7 is maintained for now,...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is important to give a date to openfisca project mantainers so they can get organized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -32,6 +34,7 @@ def make_column_from_variable(variable): | |||
int: IntCol, | |||
float: FloatCol, | |||
str: StrCol, | |||
bytes: StrCol, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, yes!
In Python 2
, by importing str
from builtins
, in this module str
is now (roughly) Python 3's str
.
But if in a country package I'm using the regular Python 2 str
(which is the same than bytes
) as a value_type
, then I'll have a KeyError
in CONVERSION_MAP[variable.value_type]
.
So we need to add this to stay compatible with Python 2.
@@ -13,9 +14,9 @@ def build_tax_benefit_system(country_package_name): | |||
country_package = importlib.import_module(country_package_name) | |||
except ImportError: | |||
message = linesep.join([traceback.format_exc(), | |||
u'Could not import module `{}`.'.format(country_package_name).encode('utf-8'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, weird, how was it working previously? It was joining one line of bytes with lines of str? How was that possible? I must be missing something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess join
is flexible enough to handle both bytes
and str
:
' '.join([u'X', b'Y'])
works in Python 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in the code it's the opposite of your example, and also you need to have a mix of chars that won't auto decode properly, like:
u"\n".join(["é", u"é"])
This will fail, but I guess this specific use case never occured in real life. Wow, lucky, and yay for py3 and proper explicit unicode handling!
@@ -38,9 +39,9 @@ def build_source_url(country_package_metadata, source_file_path, start_line_numb | |||
|
|||
def build_formula(formula, country_package_metadata, source_file_path, tax_benefit_system): | |||
source_code, start_line_number = inspect.getsourcelines(formula) | |||
if source_code[0].lstrip(' ').startswith('@'): # remove decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you remove the decorator removal on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't been using these decorators for a long time (FI for this kind of off-topic cleanups I try to explicit the intention in commit titles)
Agreed, it's not related to the PR.
tests/core/test_countries.py
Outdated
@@ -159,7 +160,7 @@ def test_calculate_variable_with_wrong_definition_period(): | |||
expected_words = ['period', '2016', 'month', 'basic_income', 'ADD'] | |||
|
|||
for word in expected_words: | |||
assert word in error_message, u'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8') | |||
assert word in error_message, 'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the .encode('utf-8')
for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't adding
.encode('utf-8')
everywhere the answer to all encodings issues in Python ?
More seriously, removing it 👍 (but I'm sure there are many others)
if not os.path.isdir(yaml_tests_dir): | ||
ci_yaml_tests_dir = os.path.join(os.path.expanduser('~'), 'openfisca-core', 'tests', 'core', 'yaml_tests') | ||
yaml_tests_dir = ci_yaml_tests_dir if os.path.isdir(ci_yaml_tests_dir) else yaml_tests_dir | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem to be related to the issue, am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, another off-topic quick cleanup, sorry 😅
@@ -13,7 +14,7 @@ def assert_items_equal(x, y): | |||
# /variables | |||
|
|||
variables_response = subject.get('/variables') | |||
GITHUB_URL_REGEX = '^https://github\.com/openfisca/openfisca-country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$' | |||
GITHUB_URL_REGEX = '^https://github\.com/openfisca/country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem to be related to the issue at hand either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but tests won't pass otherwise, as the country template we use for tests has evolved recently.
openfisca_core/holders.py
Outdated
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
|
|||
from __future__ import unicode_literals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put both
__future__
imports on the same line ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -6,7 +6,9 @@ This package contains the core features of OpenFisca, which are meant to be used | |||
|
|||
## Environment | |||
|
|||
This package requires Python 2.7 | |||
OpenFisca runs on Python 3.6, or more recent versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use
3.6
, and drop 2.7 we can now usef strings
^^ https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498
f"Missing value for variable {holder.variable.name} at {period}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know! But they are not backportable to 2.7 😞 , unless we use some kind of transpilation, and I'm not sure I want to go there...
README.md
Outdated
This package requires Python 2.7 | ||
OpenFisca runs on Python 3.6, or more recent versions. | ||
|
||
Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is important to give a date to openfisca project mantainers so they can get organized.
@@ -10,6 +10,8 @@ | |||
|
|||
The aim of this script is to compare the time taken by the calculation of the values | |||
""" | |||
from __future__ import print_function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What features is print_function
bringing in this file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we write :
from __future__ import unicode_literals, print_function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, futurize
added that for me 😄 (I know, bad answer).
I don't think it really brings features per se. It just provides the module the Python 3 version of print. The only concrete advantages I can see:
- if a contributor on Python 2 uses
print
the old way, they will get the error early, instead of waiting for the CI to crash. - Running futurize again will not suggest the same change again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think to avoid confusion and to avoid inconsistency, let me just import all the future in every module, on one line.
I wish I had known about this before half-manually removing all the
Ideally, yes... I'd say that the size and complexity of the riders fall in the usual margin of tolerance, but I'm obviously biased by my self-interest. |
Very good work @fpagnoux on following up with the review 👍 There are two concerns left to discuss in other issues:
|
Minor Change without impacts for reusers:
u
prefix for strings should not be used anymore.from __future__ import unicode_literals
for the backport to be effective.Rationale
We've paid the price for compatibility with Python 3, let's start to enjoy some of its benefits 🙂.
The general idea is to:
__future__
(Python builtins backports) andfuture
(extra library) to keep compatibility with Python 2See this talk: https://www.youtube.com/watch?v=klaGx9Q_SOA