Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport unicode litterals from Python 3 #702

Merged
merged 12 commits into from
Aug 3, 2018
Merged

Backport unicode litterals from Python 3 #702

merged 12 commits into from
Aug 3, 2018

Conversation

fpagnoux
Copy link
Member

@fpagnoux fpagnoux commented Aug 1, 2018

Minor Change without impacts for reusers:

  • Make code more Python3-like by backporting unicode litterals.
  • With this backport, all strings are by default unicodes.
  • The u prefix for strings should not be used anymore.
  • Each new module must start by from __future__ import unicode_literals for the backport to be effective.

Rationale

We've paid the price for compatibility with Python 3, let's start to enjoy some of its benefits 🙂.

The general idea is to:

  • Opportunistically apply Python-3 standards to our codebase
  • Rely on __future__ (Python builtins backports) and future (extra library) to keep compatibility with Python 2
  • When we drop compatibility with Python 2, we'll mostly just have imports to remove

See this talk: https://www.youtube.com/watch?v=klaGx9Q_SOA

@MattiSG
Copy link
Member

MattiSG commented Aug 1, 2018

OMG so much yes!!! Thanks!! ❤️ This has been so much pain to explain to newcomers, and such an annoyance to read.

@fpagnoux
Copy link
Member Author

fpagnoux commented Aug 1, 2018

This PR doesn't contain much logic, and should be relatively quick to review, but has a huge potential for conflicts, so a quick review would be really appreciated 🙂.

@maukoquiroga @sandcha @Anna-Livia

(Or @MattiSG if you feel like diving into some really passionating Python 😉)

Copy link

@magopian magopian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work ;)

FYI, black will automatically remove the u prefixes when the unicode_literals are used. It might make sense to adopt it and use it to automatically reformat the code? I gave it a go in openfisca/openfisca-france#1060 if you need an example of what it does.

I saw a couple of changes that seem unrelated to the backport, not sure if they should be done in another PR?

Also, I'm pretty rusty on unicode concerns, and even more in py2, so my comments may be way off ;)

CHANGELOG.md Outdated

### 23.3.2 [#702](https://github.com/openfisca/openfisca-core/pull/702)

Minor Change without impacts for reusers:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo? Should it be users?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant re-users, but users is probably better.

CHANGELOG.md Outdated
Minor Change without impacts for reusers:
- Make code more Python3-like by backporting unicode litterals.
- With this backport, all strings are by default unicodes.
- The `u` prefix for strings should *not* be used anymore.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a check run in the CI (or a git commit hook) that makes sure no new \bu['"] are introduced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can give me a shell command that does it, I can add it to Circle 🙂.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know black does it automatically... but the change is a bit more involved... as you can see in openfisca/openfisca-france#1060 (and specifically the commit openfisca/openfisca-france@9dd7580 adding it, and the consequences: openfisca/openfisca-france@6d8621b)

Copy link
Member Author

@fpagnoux fpagnoux Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black seems very opinionated, so using it would be something to think about more thoroughly.

Or we can hope that shell wizzard @MattiSG drops by and finds a solution to this issue and the other one by the wave of a magic wand 😄

CHANGELOG.md Outdated
- Make code more Python3-like by backporting unicode litterals.
- With this backport, all strings are by default unicodes.
- The `u` prefix for strings should *not* be used anymore.
- Each new module must start by `from __future__ import unicode_literals` for the backport to be effective.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe a script that makes sure no new file is created without this import?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd happily take a shell command that makes sure that each .py file contains from __future__ import unicode_literals, print_function, division, absolute_import.

I guess I could do it in Python, but I'm sure there is a quick way in shell...

README.md Outdated
This package requires Python 2.7
OpenFisca runs on Python 3.6, or more recent versions.

Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would read better as Backward compatibility with Python 2.7 is maintained for now,...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is important to give a date to openfisca project mantainers so they can get organized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -32,6 +34,7 @@ def make_column_from_variable(variable):
int: IntCol,
float: FloatCol,
str: StrCol,
bytes: StrCol,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that related to this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes!

In Python 2, by importing str from builtins, in this module str is now (roughly) Python 3's str.

But if in a country package I'm using the regular Python 2 str (which is the same than bytes) as a value_type, then I'll have a KeyError in CONVERSION_MAP[variable.value_type].

So we need to add this to stay compatible with Python 2.

@@ -13,9 +14,9 @@ def build_tax_benefit_system(country_package_name):
country_package = importlib.import_module(country_package_name)
except ImportError:
message = linesep.join([traceback.format_exc(),
u'Could not import module `{}`.'.format(country_package_name).encode('utf-8'),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, weird, how was it working previously? It was joining one line of bytes with lines of str? How was that possible? I must be missing something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess join is flexible enough to handle both bytes and str:

' '.join([u'X', b'Y']) works in Python 2

Copy link

@magopian magopian Aug 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in the code it's the opposite of your example, and also you need to have a mix of chars that won't auto decode properly, like:

u"\n".join(["é", u"é"])

This will fail, but I guess this specific use case never occured in real life. Wow, lucky, and yay for py3 and proper explicit unicode handling!

@@ -38,9 +39,9 @@ def build_source_url(country_package_metadata, source_file_path, start_line_numb

def build_formula(formula, country_package_metadata, source_file_path, tax_benefit_system):
source_code, start_line_number = inspect.getsourcelines(formula)
if source_code[0].lstrip(' ').startswith('@'): # remove decorator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you remove the decorator removal on purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't been using these decorators for a long time (FI for this kind of off-topic cleanups I try to explicit the intention in commit titles)

Agreed, it's not related to the PR.

@@ -159,7 +160,7 @@ def test_calculate_variable_with_wrong_definition_period():
expected_words = ['period', '2016', 'month', 'basic_income', 'ADD']

for word in expected_words:
assert word in error_message, u'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8')
assert word in error_message, 'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the .encode('utf-8') for here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't adding .encode('utf-8') everywhere the answer to all encodings issues in Python ? :trollface:

More seriously, removing it 👍 (but I'm sure there are many others)

if not os.path.isdir(yaml_tests_dir):
ci_yaml_tests_dir = os.path.join(os.path.expanduser('~'), 'openfisca-core', 'tests', 'core', 'yaml_tests')
yaml_tests_dir = ci_yaml_tests_dir if os.path.isdir(ci_yaml_tests_dir) else yaml_tests_dir

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem to be related to the issue, am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, another off-topic quick cleanup, sorry 😅

@@ -13,7 +14,7 @@ def assert_items_equal(x, y):
# /variables

variables_response = subject.get('/variables')
GITHUB_URL_REGEX = '^https://github\.com/openfisca/openfisca-country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$'
GITHUB_URL_REGEX = '^https://github\.com/openfisca/country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem to be related to the issue at hand either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but tests won't pass otherwise, as the country template we use for tests has evolved recently.

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-


from __future__ import unicode_literals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put both __future__ imports on the same line ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -6,7 +6,9 @@ This package contains the core features of OpenFisca, which are meant to be used

## Environment

This package requires Python 2.7
OpenFisca runs on Python 3.6, or more recent versions.
Copy link
Contributor

@Anna-Livia Anna-Livia Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use 3.6, and drop 2.7 we can now use f strings ^^ https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498

f"Missing value for variable {holder.variable.name} at {period}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know! But they are not backportable to 2.7 😞 , unless we use some kind of transpilation, and I'm not sure I want to go there...

README.md Outdated
This package requires Python 2.7
OpenFisca runs on Python 3.6, or more recent versions.

Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is important to give a date to openfisca project mantainers so they can get organized.

@@ -10,6 +10,8 @@

The aim of this script is to compare the time taken by the calculation of the values
"""
from __future__ import print_function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What features is print_functionbringing in this file ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we write :

from __future__  import unicode_literals, print_function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, futurize added that for me 😄 (I know, bad answer).

I don't think it really brings features per se. It just provides the module the Python 3 version of print. The only concrete advantages I can see:

  • if a contributor on Python 2 uses print the old way, they will get the error early, instead of waiting for the CI to crash.
  • Running futurize again will not suggest the same change again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think to avoid confusion and to avoid inconsistency, let me just import all the future in every module, on one line.

@fpagnoux fpagnoux mentioned this pull request Aug 2, 2018
8 tasks
@fpagnoux fpagnoux requested a review from Anna-Livia August 2, 2018 17:38
@fpagnoux
Copy link
Member Author

fpagnoux commented Aug 2, 2018

FYI, black will automatically remove the u prefixes when the unicode_literals are used. It might make sense to adopt it and use it to automatically reformat the code? I gave it a go in openfisca/openfisca-france#1060 if you need an example of what it does.

I wish I had known about this before half-manually removing all the u 😞. Thanks for the tips!

I saw a couple of changes that seem unrelated to the backport, not sure if they should be done in another PR?

Ideally, yes... I'd say that the size and complexity of the riders fall in the usual margin of tolerance, but I'm obviously biased by my self-interest.

@magopian
Copy link

magopian commented Aug 3, 2018

Very good work @fpagnoux on following up with the review 👍
IMHO this is ready to land.

There are two concerns left to discuss in other issues:

@fpagnoux fpagnoux merged commit a044a30 into master Aug 3, 2018
@fpagnoux fpagnoux deleted the unicode-py3 branch August 3, 2018 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants