Backport unicode literals from Python 3 #702

fpagnoux · 2018-08-01T20:22:18Z

Minor change without impact for reusers:

Make code more Python 3-like by backporting unicode literals.
With this backport, all string literals are unicode by default.
The u prefix for strings should not be used anymore.
Each new module must start with from __future__ import unicode_literals for the backport to take effect.
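
As an illustration of the points above, here is a minimal sketch of what the import changes. The variable names are only examples and are not taken from the codebase.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# With the import above, a bare literal is a text (unicode) string on both
# Python 2 and Python 3, so no `u` prefix is needed.
greeting = 'Bonjour, é'

# `type('')` is `unicode` on Python 2 and `str` on Python 3.
text_type = type('')
assert isinstance(greeting, text_type)

# Byte strings now have to be requested explicitly with the `b` prefix.
raw = b'bytes'
assert not isinstance(raw, text_type)
```

On Python 3 the import is a no-op, which is why it can simply be deleted once Python 2 support is dropped.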

Rationale

We've paid the price for compatibility with Python 3; let's start enjoying some of its benefits 🙂.

The general idea is to:

Opportunistically apply Python-3 standards to our codebase
Rely on __future__ (Python builtins backports) and future (extra library) to keep compatibility with Python 2
When we drop compatibility with Python 2, we'll mostly just have imports to remove

See this talk: https://www.youtube.com/watch?v=klaGx9Q_SOA

MattiSG · 2018-08-01T20:36:55Z

OMG so much yes!!! Thanks!! ❤️ This has been so much pain to explain to newcomers, and such an annoyance to read.

fpagnoux · 2018-08-01T21:11:22Z

This PR doesn't contain much logic, and should be relatively quick to review, but has a huge potential for conflicts, so a quick review would be really appreciated 🙂.

@maukoquiroga @sandcha @Anna-Livia

(Or @MattiSG if you feel like diving into some really exciting Python 😉)

magopian

Nice work ;)

FYI, black will automatically remove the u prefixes when the unicode_literals are used. It might make sense to adopt it and use it to automatically reformat the code? I gave it a go in openfisca/openfisca-france#1060 if you need an example of what it does.

I saw a couple of changes that seem unrelated to the backport, not sure if they should be done in another PR?

Also, I'm pretty rusty on unicode concerns, and even more in py2, so my comments may be way off ;)

magopian · 2018-08-02T08:21:33Z

CHANGELOG.md

+
+### 23.3.2 [#702](https://github.com/openfisca/openfisca-core/pull/702)
+
+Minor Change without impacts for reusers:


Is this a typo? Should it be users?

I meant re-users, but users is probably better.

magopian · 2018-08-02T08:22:38Z

CHANGELOG.md

+Minor Change without impacts for reusers:
+  - Make code more Python3-like by backporting unicode litterals.
+  - With this backport, all strings are by default unicodes.
+  - The `u` prefix for strings should *not* be used anymore.


Should there be a check run in the CI (or a git commit hook) that makes sure no new \bu['"] are introduced?

If you can give me a shell command that does it, I can add it to Circle 🙂.

I know black does it automatically... but the change is a bit more involved... as you can see in openfisca/openfisca-france#1060 (and specifically the commit openfisca/openfisca-france@9dd7580 adding it, and the consequences: openfisca/openfisca-france@6d8621b)

black seems very opinionated, so using it would be something to think about more thoroughly.

Or we can hope that shell wizard @MattiSG drops by and finds a solution to this issue and the other one with a wave of a magic wand 😄
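
One possible shape for such a check, sketched in Python rather than shell. It uses the standard tokenize module instead of the \bu['"] regex suggested above, so that a u followed by a quote inside a comment is not flagged. The function name find_u_prefixed_literals is hypothetical, and the sketch assumes it is run with Python 3.

```python
import io
import tokenize


def find_u_prefixed_literals(source):
    """Return (line number, literal) pairs for string literals written with a `u` prefix."""
    hits = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        # STRING tokens keep their prefix, e.g. "u'x'".
        if token.type == tokenize.STRING and token.string[:1] in ('u', 'U'):
            hits.append((token.start[0], token.string))
    return hits


sample = "a = u'x'\nb = 'y'  # u'in a comment'\n"
hits = find_u_prefixed_literals(sample)
```

A CI step could read each .py file, call the function on its content, and exit with a non-zero status when any hit is returned.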

magopian · 2018-08-02T08:23:04Z

CHANGELOG.md

+  - Make code more Python3-like by backporting unicode litterals.
+  - With this backport, all strings are by default unicodes.
+  - The `u` prefix for strings should *not* be used anymore.
+  - Each new module must start by `from __future__ import unicode_literals` for the backport to be effective.


Same here, maybe a script that makes sure no new file is created without this import?

Same, I'd happily take a shell command that makes sure that each .py file contains from __future__ import unicode_literals, print_function, division, absolute_import.

I guess I could do it in Python, but I'm sure there is a quick way in shell...
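
A rough Python version of that check could look like the sketch below. The function name missing_future_imports and the REQUIRED constant are hypothetical; the sketch assumes the files parse under the Python version running the check, and it would also flag empty files such as a bare __init__.py unless those are skipped.

```python
import ast

# The four features named in the comment above.
REQUIRED = {'unicode_literals', 'print_function', 'division', 'absolute_import'}


def missing_future_imports(source, required=REQUIRED):
    """Return the required __future__ features that `source` does not import."""
    imported = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom) and node.module == '__future__':
            imported.update(alias.name for alias in node.names)
    return set(required) - imported


complete = (
    "from __future__ import unicode_literals, print_function, division, absolute_import\n"
    "x = 'a'\n"
)
partial = "from __future__ import unicode_literals\n"
```

Calling missing_future_imports on the content of every .py file and failing the build when the result is non-empty would enforce the rule.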

magopian · 2018-08-02T08:24:20Z

README.md

-This package requires Python 2.7
+OpenFisca runs on Python 3.6, or more recent versions.
+
+Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019.


Not sure if this would read better as Backward compatibility with Python 2.7 is maintained for now,...

I think it is important to give a date to OpenFisca project maintainers so they can get organized.

magopian · 2018-08-02T08:36:32Z

openfisca_core/columns.py

@@ -32,6 +34,7 @@ def make_column_from_variable(variable):
        int: IntCol,
        float: FloatCol,
        str: StrCol,
+        bytes: StrCol,


Is that related to this PR?

Actually, yes!

In Python 2, because this module imports str from builtins, str here is now (roughly) Python 3's str.

But if a country package uses the regular Python 2 str (which is the same as bytes) as a value_type, then CONVERSION_MAP[variable.value_type] raises a KeyError.

So we need to add this to stay compatible with Python 2.
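
The lookup failure described above can be reproduced with a small sketch. StrCol here is an empty stand-in for the real column class, and column_class_for is a hypothetical helper that mimics the CONVERSION_MAP[variable.value_type] lookup. The sketch runs on Python 3, where str and bytes are distinct types in the same way that the backported str and the native str are on Python 2.

```python
class StrCol(object):
    """Stand-in for the column class used for string variables."""


def column_class_for(value_type, conversion_map):
    # Mimics CONVERSION_MAP[variable.value_type]
    return conversion_map[value_type]


without_bytes = {str: StrCol}
with_bytes = {str: StrCol, bytes: StrCol}

try:
    column_class_for(bytes, without_bytes)
except KeyError:
    lookup_failed = True
else:
    lookup_failed = False

resolved = column_class_for(bytes, with_bytes)
```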

magopian · 2018-08-02T12:28:07Z

openfisca_web_api_preview/loader/tax_benefit_system.py

@@ -13,9 +14,9 @@ def build_tax_benefit_system(country_package_name):
        country_package = importlib.import_module(country_package_name)
    except ImportError:
        message = linesep.join([traceback.format_exc(),
-                                u'Could not import module `{}`.'.format(country_package_name).encode('utf-8'),


Wow, weird, how was it working previously? It was joining one line of bytes with lines of str? How was that possible? I must be missing something...

I guess join is flexible enough to handle both bytes and str:

' '.join([u'X', b'Y']) works in Python 2

I believe in the code it's the opposite of your example, and also you need to have a mix of chars that won't auto decode properly, like:

u"\n".join(["é", u"é"])

This will fail, but I guess this specific use case never occurred in real life. Wow, lucky, and yay for py3 and proper explicit unicode handling!
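
For comparison, here is a small sketch of how Python 3 treats the same kind of mix: joining text with bytes is rejected outright instead of being implicitly decoded. The helper as_text is hypothetical and only illustrates decoding explicitly before joining.

```python
def as_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes to text, leave text untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


lines = ['é', 'é'.encode('utf-8')]  # one text item, one bytes item

try:
    '\n'.join(lines)
except TypeError:
    # Python 3 refuses to join str and bytes together.
    mixed_join_failed = True
else:
    mixed_join_failed = False

joined = '\n'.join(as_text(line) for line in lines)
```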

magopian · 2018-08-02T12:28:41Z

openfisca_web_api_preview/loader/variables.py

@@ -38,9 +39,9 @@ def build_source_url(country_package_metadata, source_file_path, start_line_numb

 def build_formula(formula, country_package_metadata, source_file_path, tax_benefit_system):
    source_code, start_line_number = inspect.getsourcelines(formula)
-    if source_code[0].lstrip(' ').startswith('@'):  # remove decorator


did you remove the decorator removal on purpose?

We haven't been using these decorators for a long time (FYI, for this kind of off-topic cleanup I try to make the intention explicit in commit titles)

Agreed, it's not related to the PR.

magopian · 2018-08-02T12:32:45Z

tests/core/test_countries.py

@@ -159,7 +160,7 @@ def test_calculate_variable_with_wrong_definition_period():
    expected_words = ['period', '2016', 'month', 'basic_income', 'ADD']

    for word in expected_words:
-        assert word in error_message, u'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8')
+        assert word in error_message, 'Expected "{}" in error message "{}"'.format(word, error_message).encode('utf-8')


what is the .encode('utf-8') for here?

Isn't adding .encode('utf-8') everywhere the answer to all encoding issues in Python?

More seriously, removing it 👍 (but I'm sure there are many others)

magopian · 2018-08-02T12:34:01Z

tests/core/test_yaml.py

-if not os.path.isdir(yaml_tests_dir):
-    ci_yaml_tests_dir = os.path.join(os.path.expanduser('~'), 'openfisca-core', 'tests', 'core', 'yaml_tests')
-    yaml_tests_dir = ci_yaml_tests_dir if os.path.isdir(ci_yaml_tests_dir) else yaml_tests_dir
-


That doesn't seem to be related to the issue, am I missing something?

No, another off-topic quick cleanup, sorry 😅

magopian · 2018-08-02T12:34:57Z

tests/web_api/basic_case/test_variables.py

@@ -13,7 +14,7 @@ def assert_items_equal(x, y):
 # /variables

 variables_response = subject.get('/variables')
-GITHUB_URL_REGEX = '^https://github\.com/openfisca/openfisca-country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$'
+GITHUB_URL_REGEX = '^https://github\.com/openfisca/country-template/blob/\d+\.\d+\.\d+((.dev|rc)\d+)?/openfisca_country_template/variables/(.)+\.py#L\d+-L\d+$'


That doesn't seem to be related to the issue at hand either

No, but tests won't pass otherwise, as the country template we use for tests has evolved recently.

Anna-Livia · 2018-08-02T10:19:19Z

openfisca_core/holders.py

@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

-
+from __future__ import unicode_literals


Put both __future__ imports on the same line?

Anna-Livia · 2018-08-02T12:27:57Z

README.md

@@ -6,7 +6,9 @@ This package contains the core features of OpenFisca, which are meant to be used

 ## Environment

-This package requires Python 2.7
+OpenFisca runs on Python 3.6, or more recent versions.


If we use 3.6 and drop 2.7, we can now use f-strings ^^ https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498

f"Missing value for variable {holder.variable.name} at {period}"

I know! But they are not backportable to 2.7 😞, unless we use some kind of transpilation, and I'm not sure I want to go there...
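
To make the trade-off concrete, here is a small sketch comparing the two spellings. The variable names and values are made up for the example. The str.format version runs on both Python 2.7 and Python 3, while the f-string version is a syntax error before Python 3.6.

```python
variable_name = 'basic_income'  # illustrative values only
period = '2016-01'

# Works on Python 2.7 and Python 3.
with_format = 'Missing value for variable {} at {}'.format(variable_name, period)

# Python 3.6+ only.
with_fstring = f'Missing value for variable {variable_name} at {period}'
```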

Anna-Livia · 2018-08-02T13:00:52Z

README.md

-This package requires Python 2.7
+OpenFisca runs on Python 3.6, or more recent versions.
+
+Backward compatibility with Python 2.7 is for now guaranteed, but will be dropped from January 1st, 2019.


I think it is important to give a date to OpenFisca project maintainers so they can get organized.

Anna-Livia · 2018-08-02T14:12:09Z

openfisca_core/scripts/measure_numpy_condition_notations.py

@@ -10,6 +10,8 @@

 The aim of this script is to compare the time taken by the calculation of the values
 """
+from __future__ import print_function


What feature is print_function bringing to this file?

Could we write:

from __future__ import unicode_literals, print_function

Not sure, futurize added that for me 😄 (I know, bad answer).

I don't think it really brings features per se. It just provides the module the Python 3 version of print. The only concrete advantages I can see:

If a contributor on Python 2 uses print the old way, they will get the error early, instead of waiting for the CI to crash.

Running futurize again will not suggest the same change again.

Actually, to avoid confusion and inconsistency, let me just import all the __future__ features in every module, on one line.

fpagnoux · 2018-08-02T17:47:31Z

FYI, black will automatically remove the u prefixes when the unicode_literals are used. It might make sense to adopt it and use it to automatically reformat the code? I gave it a go in openfisca/openfisca-france#1060 if you need an example of what it does.

I wish I had known about this before half-manually removing all the u 😞. Thanks for the tip!

I saw a couple of changes that seem unrelated to the backport, not sure if they should be done in another PR?

Ideally, yes... I'd say that the size and complexity of the riders fall in the usual margin of tolerance, but I'm obviously biased by my self-interest.

magopian · 2018-08-03T07:20:07Z

Very good work @fpagnoux on following up with the review 👍
IMHO this is ready to land.

There are two concerns left to discuss in other issues:

Automating the removal of u prefixes: Adopt a code formatter #706
Automating the addition of from __future__ import unicode_literals: Automatically check that new files import unicode_literals #707

fpagnoux added 4 commits August 1, 2018 16:06

Backport unicode litterals in the web API

ab625b7

Remove outdated check

cd36929

Backport unicode litterals

6c660fd

Backport unicode litterals

dce344d

fpagnoux force-pushed the unicode-py3 branch from 77b2d1f to f5800c9 Compare August 1, 2018 20:57

fpagnoux requested review from sandcha, Anna-Livia and bonjourmauko August 1, 2018 21:09

fpagnoux force-pushed the unicode-py3 branch from b7e7691 to b54b15b Compare August 1, 2018 21:21

fpagnoux added the flow:team:doing label Aug 1, 2018

fpagnoux force-pushed the unicode-py3 branch from 0d19d12 to 1d03f0c Compare August 2, 2018 01:45

fpagnoux mentioned this pull request Aug 2, 2018

Make Web API optional #703

Merged

fpagnoux self-assigned this Aug 2, 2018

magopian reviewed Aug 2, 2018

Anna-Livia requested changes Aug 2, 2018

fpagnoux mentioned this pull request Aug 2, 2018

Deprecate Python 2.x #693

Closed

fpagnoux requested a review from Anna-Livia August 2, 2018 17:38

This was referenced Aug 3, 2018

Adopt a code formatter #706

Closed

Automatically check that new files import unicode_literals #707

Closed

Anna-Livia approved these changes Aug 3, 2018

fpagnoux added 6 commits August 3, 2018 12:21

Bump version number

2d9f54c

Remove outdated edge case handling

fed5a64

Simplify to_unicode

66f2d3c

Update CHANGELOG.md

d2c44cd

Handle variables which type is bytes

171ddb5

State Python 3 as primary environment

d5d46dd

fpagnoux added 2 commits August 3, 2018 12:21

Fix country template URL

00aeb0f

Backport Python 3 builtins everywhere

0f2991d

fpagnoux force-pushed the unicode-py3 branch from 37fd634 to 0f2991d Compare August 3, 2018 16:21

fpagnoux merged commit a044a30 into master Aug 3, 2018

fpagnoux deleted the unicode-py3 branch August 3, 2018 16:22

fpagnoux removed the flow:team:doing label Aug 3, 2018

