Replace old format string interpolation with f-strings #4400

sphuber · 2020-09-25T12:59:58Z

Since Python 3.5 is no longer supported, format string interpolations
can now be replaced by f-strings, introduced in Python 3.6, which are
more readable, require less characters and are more efficient.

Note that pylint issues a warning when using f-strings for log
messages, just as it does for format interpolated strings. The reasoning
is that this is slightly inefficient as the strings are always
interpolated even if the log is discarded, but also by not passing the
formatting parameters as arguments, the available metadata is reduced.
I feel these inefficiencies are premature optimizations as they are
really minimal and don't weigh up against the improved readability and
maintainability of using f-strings. That is why the pylint config is
update to ignore the warning logging-fstring-interpolation which
replaces logging-format-interpolation that was ignored before.

codecov · 2020-09-25T13:16:31Z

Codecov Report

Merging #4400 into develop will decrease coverage by 0.02%.
The diff coverage is 45.68%.

@@             Coverage Diff             @@
##           develop    #4400      +/-   ##
===========================================
- Coverage    79.24%   79.22%   -0.01%     
===========================================
  Files          475      475              
  Lines        34826    34831       +5     
===========================================
- Hits         27594    27593       -1     
- Misses        7232     7238       +6

Flag	Coverage Δ
#django	`73.07% <43.14%> (+0.01%)`	⬆️
#sqlalchemy	`72.30% <43.21%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...jsite/db/migrations/0015_invalidating_node_hash.py	`100.00% <ø> (ø)`
...backends/djsite/db/migrations/0024_dblog_update.py	`93.50% <0.00%> (ø)`
...site/db/migrations/0032_remove_legacy_workflows.py	`61.54% <0.00%> (ø)`
...migrations/0037_attributes_extras_settings_json.py	`86.85% <0.00%> (ø)`
...a/backends/djsite/db/migrations/0039_reset_hash.py	`100.00% <ø> (ø)`
aiida/backends/djsite/db/models.py	`81.29% <0.00%> (ø)`
aiida/backends/djsite/settings.py	`87.50% <0.00%> (ø)`
aiida/backends/sqlalchemy/manager.py	`78.00% <0.00%> (ø)`
...migrations/versions/041a79fc615f_dblog_cleaning.py	`91.21% <0.00%> (ø)`
...ns/12536798d4d3_trajectory_symbols_to_attribute.py	`95.66% <ø> (ø)`
... and 231 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4544bc4...0698e19. Read the comment docs.

sphuber · 2020-09-26T16:33:31Z

Who dares review this PR 😄 @greschd @ltalirz ?

ltalirz · 2020-09-26T17:42:29Z

First of all, thanks @sphuber for taking on the effort to clean this up!

Who dares review this PR 😄 @greschd @ltalirz ?

Hehe ;-) To me, the question is more how to best review this PR.
Could you perhaps describe how you did this - was this all done manually, did you use some search & replace operations, some refactoring tool, ...?
I wonder whether going through all of this line by line is the best way to spot potential issues...

P.S. This is just out of curiosity: Do you know why test coverage decreased (very mildly), despite f-strings being less verbose?

sphuber · 2020-09-26T18:31:14Z

Hehe ;-) To me, the question is more how to best review this PR.
Could you perhaps describe how you did this - was this all done manually, did you use some search & replace operations, some refactoring tool, ...?

Sure, I sure as hell didn't do this manually but used the flynt tool. I first ran with default settings, meaning it will perform the safest transformation. I checked the changes manually and did some minor corrections and pushed to have the tests run. When all tests passed, I ran the tool again, but this time with the options --transform-concats --agressive -ll 120. Here also I checked the changes manually and did some minor corrections. Then I manually searched with regex ['"]\.format\( and fixed some cases manually. There are still quite a few remaining but didn't want to start addressing them all manually so I left them for now.

I wonder whether going through all of this line by line is the best way to spot potential issues...

Probably not. These kinds of changes are always a bit risky since we are relying purely on the tests but many of these lines may not actually be covered. pylint will give us some extra security but probably not a 100%. To be honest I am also not sure how much this can and should be reviewed.

P.S. This is just out of curiosity: Do you know why test coverage decreased (very mildly), despite f-strings being less verbose?

I have no idea, but honestly this happens all the time and we always just ignore it.

chrisjsewell · 2020-09-30T05:27:49Z

This is just out of curiosity: Do you know why test coverage decreased, despite f-strings being less verbose?

Simple, because there're fewer lines covered 😄, e.g.

- "AttributeError: '{}' is not a valid attribute of the object "
- "'{}'".format(attr, self.__class__.__name__)
+ f"AttributeError: '{attr}' is not a valid attribute of the object '{self.__class__.__name__}'"

So now <lines covered>/<total lines> is a smaller fraction

we always just ignore it.

You might not have to after #4413 😉

chrisjsewell · 2020-09-30T05:37:04Z

For what its worth, I took a quick parse over the diff, and didn't notice any errors (my vision did start to blur a bit after the first hundred diffs though lol)

Note, flynt has a pre-commit hook: https://github.com/ikamensh/flynt#pre-commit-hook, should we consider adding it?

sphuber · 2020-09-30T08:34:33Z

Note, flynt has a pre-commit hook: https://github.com/ikamensh/flynt#pre-commit-hook, should we consider adding it?

I thought about this yeah, but decided not to. But to be honest, I am not sure anymore why I thought it best not to add it. I have now added it. Let's see how that goes

greschd · 2020-09-30T08:58:42Z

TBH I don't see why always using f-strings should be enforced. Something like the following still makes sense in my book:

some_dict = {'a': 1, 'b': 2}
<...>
string = 'values: a={a}, b={b}'.format(**some_dict)

chrisjsewell · 2020-09-30T09:01:17Z

Something like the following still makes sense in my book

Yeh but I don't think flynt would convert that?
anyway looks to be a weird problem with its CLI interaction with pre-commit?

sphuber · 2020-09-30T09:01:53Z

TBH I don't see why always using f-strings should be enforced. Something like the following still makes sense in my book:
some_dict = {'a': 1, 'b': 2}
<...>
string = 'values: a={a}, b={b}'.format(**some_dict)

Flynt won't touch these though. There are indeed some cases where it is better to use implicit conversion with format.

Edit: just need to figure out how to call it in the pre-commit hook. Apparently pre-commit is now passing all files individually to flynt as arguments and it doesn't understand it.

greschd · 2020-09-30T09:03:52Z

Alright, in that case it's fine. Haven't tried flynt myself, so I'm commenting from a place of ignorance 🙃

ltalirz · 2020-09-30T09:31:33Z

This is just out of curiosity: Do you know why test coverage decreased, despite f-strings being less verbose?

Simple, because there're fewer lines covered 😄,

Right, my misunderstanding came from the assumption that what mattered was the number of uncovered lines, but we are actually triggering on the percentage of uncovered lines.
I guess it's a little more complicated than you suggest as well, since there will be less lines both in the uncovered and in the covered code. Assuming that coverage is counted correctly, the question becomes: is the density of lines saved higher in covered or uncovered code (a priori, I would probably have bet on the latter, since it's often error messages in Except blocks that go uncovered).

Naively, when making a PR what I actually care about is whether/how much I'm increasing the total number of uncovered lines.
The problem is that for new code this will be slightly nonzero in almost all cases, and so it is difficult to decide when to raise a warning flag.
At the moment, we're essentially saying: for new code, cover it at least as well as the average codebase (which is a moving target as our coverage has been slowly increasing over time).

greschd · 2020-09-30T09:44:36Z

Ideally (IMO), the coverage criteria would be:

for any new logic, a fixed coverage (e.g. 95%) needs to be reached
for slight modifications to existing code (such as here), coverage should remain the same, with some tolerance

Of course the problem is that there's no way of distinguishing between the two cases, so we just fall back on the second (less strict) requirement. But it's a red flag if the number of missed lines increases by a substantial number.

ltalirz · 2020-09-30T10:06:15Z

But it's a red flag if the number of missed lines increases by a substantial number.

Fully agree, and it would be nice to be able to trigger on this (the codecov.yml reference does not seem to offer this option (?)).

The first priority, however, is probably still to get the random fluctuations under control.
To stay with the the codecov report above, it actually reports an increase in the total number of missed lines as well:

You may have noticed that, some time ago, codecov started adding comments to missed lines in the "Files" tab, which is helpful. ~~In this case, it makes clear that the report of these supposedly "uncovered lines" is really a bug~~, see e.g.
https://github.com/aiidateam/aiida-core/pull/4400/files#diff-31db4ebe16273a17841c21ee58d0d9baL184

Edit: It seems like the comments are added to all missed lines that are touched in the PR (whether they were missed before or not). I will need to figure out how to narrow them down to the 7 new misses that were reported or perhaps @sphuber remembers whether he added some extra lines somewhere?

.pre-commit-config.yaml

Since Python 3.5 is no longer supported, format string interpolations can now be replaced by f-strings, introduced in Python 3.6, which are more readable, require less characters and are more efficient. Note that `pylint` issues a warning when using f-strings for log messages, just as it does for format interpolated strings. The reasoning is that this is slightly inefficient as the strings are always interpolated even if the log is discarded, but also by not passing the formatting parameters as arguments, the available metadata is reduced. I feel these inefficiencies are premature optimizations as they are really minimal and don't weigh up against the improved readability and maintainability of using f-strings. That is why the `pylint` config is update to ignore the warning `logging-fstring-interpolation` which replaces `logging-format-interpolation` that was ignored before. The majority of the conversions were done automatically with the linting tool `flynt` which is also added as a pre-commit hook. It is added before the `yapf` step because since `flynt` will touch formatting, `yapf` will then get a chance to check it.

sphuber force-pushed the fix/4394/f-strings branch 3 times, most recently from c3575aa to 00c8797 Compare September 29, 2020 15:38

sphuber requested review from ltalirz and greschd September 29, 2020 17:47

sphuber force-pushed the fix/4394/f-strings branch from 00c8797 to 166ad37 Compare September 30, 2020 08:34

sphuber force-pushed the fix/4394/f-strings branch from 166ad37 to 6dea8e0 Compare September 30, 2020 11:35

greschd reviewed Sep 30, 2020

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved

greschd reviewed Sep 30, 2020

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved

sphuber force-pushed the fix/4394/f-strings branch 2 times, most recently from 5695149 to 6530471 Compare September 30, 2020 12:36

sphuber force-pushed the fix/4394/f-strings branch from 6530471 to 0698e19 Compare September 30, 2020 12:51

greschd approved these changes Sep 30, 2020

View reviewed changes

sphuber merged commit 02248cf into aiidateam:develop Sep 30, 2020

sphuber deleted the fix/4394/f-strings branch September 30, 2020 13:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace old format string interpolation with f-strings #4400

Replace old format string interpolation with f-strings #4400

sphuber commented Sep 25, 2020

codecov bot commented Sep 25, 2020 •

edited

Loading

sphuber commented Sep 26, 2020

ltalirz commented Sep 26, 2020 •

edited

Loading

sphuber commented Sep 26, 2020

chrisjsewell commented Sep 30, 2020

chrisjsewell commented Sep 30, 2020

sphuber commented Sep 30, 2020

greschd commented Sep 30, 2020

chrisjsewell commented Sep 30, 2020

sphuber commented Sep 30, 2020 •

edited

Loading

greschd commented Sep 30, 2020

ltalirz commented Sep 30, 2020 •

edited

Loading

greschd commented Sep 30, 2020

ltalirz commented Sep 30, 2020 •

edited

Loading

Replace old format string interpolation with f-strings #4400

Replace old format string interpolation with f-strings #4400

Conversation

sphuber commented Sep 25, 2020

codecov bot commented Sep 25, 2020 • edited Loading

Codecov Report

sphuber commented Sep 26, 2020

ltalirz commented Sep 26, 2020 • edited Loading

sphuber commented Sep 26, 2020

chrisjsewell commented Sep 30, 2020

chrisjsewell commented Sep 30, 2020

sphuber commented Sep 30, 2020

greschd commented Sep 30, 2020

chrisjsewell commented Sep 30, 2020

sphuber commented Sep 30, 2020 • edited Loading

greschd commented Sep 30, 2020

ltalirz commented Sep 30, 2020 • edited Loading

greschd commented Sep 30, 2020

ltalirz commented Sep 30, 2020 • edited Loading

codecov bot commented Sep 25, 2020 •

edited

Loading

ltalirz commented Sep 26, 2020 •

edited

Loading

sphuber commented Sep 30, 2020 •

edited

Loading

ltalirz commented Sep 30, 2020 •

edited

Loading

ltalirz commented Sep 30, 2020 •

edited

Loading