Decode popen output using the system locale if UTF-8 decoding fails. #380

Cielquan · 2021-02-02T21:08:36Z

Based on my comments in the issue #309 I created a decoding wrapper function. As utf8 is the python default I kept it hard coded as first choice of the encodings to try. If utf8 fails sys.stdout.encoding, sys.getdefaultencoding() and locale.getpreferredencoding() (in this order) deliver possible other encodings to try. If all decoding attempts fail the UnicodeDecodeError exception is risen.

I'm open for other orders of the encoding sources.

If this fix is well received, I will add the missing tests and fix the broken ones if there are any.

fixes #309

fixes #309 by trying to decode popen output with utf8 first and on error tries other encodings provided by the systems preferences.

cjolowicz

Could this maybe be simplified to try UTF-8 first, and fall back to locale.getpreferredencoding in case of a UnicodeDecodeError?

I would skip sys.getdefaultencoding because it appears to be hardcoded to UTF-8. See CPython source (1, 2) and Victor Stinner's notes.

sys.stdout.encoding does not seem relevant here, as we are decoding the output of another process. IIUC, on a legacy Windows system the process output will typically be CP-1252, while our own stdout might be a console device, which is always UTF-8. On non-Windows systems, it's just the locale encoding.

(The code assumes that a wrong encoding will result in a UnicodeDecodeError, and not simply in garbled output. As far as I'm aware, this is rare when decoding to UTF-8, at least from ASCII-based encodings [citation needed 😉], so I guess this is fine.)

stsewd · 2021-02-02T23:00:49Z

@Cielquan you could try installing nox from this branch to test if this solves your problem (if you haven't already!)

Cielquan · 2021-02-03T09:08:41Z

Thanks for the fast replies you two!

@cjolowicz
Thanks for the additional information. I should have looked more thoroughly through the docs. 😅
I think its a good idea to discard sys.stdout.encoding and sys.getdefaultencoding. 👍🏾
Yeah garbled output is not detected. That was another reason to try utf-8 as first encoding. E.g. latin-1 will work on our german umlaut ü but the output is not what one wants (the same goes for cp1252):

>>> "ü".encode("utf8").decode("latin-1")
'Ã¼'

But I think this is better than an UnicodeDecodeError.

@stsewd
I monkeypatched nox.command.popen inside my noxfile.py to overwrite the hard coded utf8 with sys.stdout.encoding, which worked for me and the windows GHA CI succeeded.
After that I developed the added function inside my noxfile.py.
I tested it on my dev machine (linux), a windows vm and all three OS in CI. It ran everywhere.

Cielquan · 2021-02-03T10:29:36Z

Sooo ...

I made adjustments proposed by @cjolowicz and added some tests.

I use the b"\x95" byte which caused my problems (see my comment) as a test example. It does not work with utf-8 and asccii but does work with cp1252 so it was a good candidate to test all possible outcomes.

Cielquan · 2021-02-03T11:02:11Z

Some remarks on the CI test:

python3.9 does not run in appveyor somehow (see here):

...
nox > Running session tests-3.9
nox > Session tests-3.9 skipped: Python interpreter 3.9 not found.
Build success

coverage is 100% in Travis for linux, but on my local linux machine I don't reach 100% when running $ nox -e tests:

...
Name                  Stmts   Miss Branch BrPart  Cover   Missing
-----------------------------------------------------------------
nox/__init__.py           6      0      0      0   100%
nox/__main__.py          17      0      4      0   100%
nox/_decorators.py       38      0      2      0   100%
nox/_option_set.py       99      0     34      0   100%
nox/_options.py          57      0     24      0   100%
nox/_parametrize.py      74      0     36      0   100%
nox/command.py           57      0     26      0   100%
nox/logger.py            46      0     12      0   100%
nox/manifest.py         131      0     72      0   100%
nox/popen.py             33      0      6      0   100%
nox/registry.py          20      0      6      0   100%
nox/sessions.py         245      0     76      0   100%
nox/tasks.py             91      0     32      0   100%
nox/tox_to_nox.py        18      0      2      0   100%
nox/virtualenv.py       177     22     54      0    87%   194-206, 219-248
nox/workflow.py          15      0      6      0   100%
-----------------------------------------------------------------
TOTAL                  1124     22    392      0    98%

coverage in appveyor is not 100% because of missing parts in virtualenv.py and workflow.py. I can recommend coverage-conditional-plugin to add conditional pragma: no covers to the code base. You then can test for 100% coverage in any case.

EDIT1: spelling

cjolowicz · 2021-02-03T18:38:06Z

on my local linux machine I don't reach 100%

The test suite skips conda tests if you don't have conda installed, and those missing lines are from CondaEnv. Could that be what caused it?

Cielquan · 2021-02-03T18:49:45Z

Well yeah, I totally forgot about that. Thanks.

Cielquan · 2021-02-08T17:15:05Z

Updated the branch

Cielquan · 2021-02-11T06:38:03Z

Updated the branch.

Anything else needed from me? @theacodes

cjolowicz

Hi @Cielquan thanks for moving this forward!

Today I found some time to test this on a legacy system running Windows Server
2008 (64-bit). I could reproduce the original issue, and I could also confirm
that your fix works 🚀 See details below.

I have one suggestion, though. As you can see in the last command output below,
when both encodings fail, Python will chain the tracebacks for both
UnicodeDecodeError exceptions, including these two error messages:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
UnicodeDecodeError: '[utf-8, BIG5]' codec can't decode byte 0x95 in position 0: incomplete multibyte sequence

You can see that modifying the second exception during flight is both
unnecessary and slightly confusing. I would therefore suggest that you eliminate
the second try...catch block, keeping only the body of the try clause.
I'll leave a code suggestion to clarify what I mean.

Stylistically, I would also suggest to eliminate the decoded_output variable,
and simply return instead of assigning to it. But this is just a nitpick, feel
free to ignore.

Here are the details, FTR:

On the Windows system I used, locale.getpreferredencoding returns CP-1252.
sys.stdout.encoding returns UTF-8 when writing to the command prompt, and
CP-1252 when redirecting output to a file.

Like you, I used the Unicode BULLET codepoint for testing, which is 0x95
when encoded in CP-1252.

>>> import unicodedata
>>> unicodedata.name(bytes([0x95]).decode("cp1252"))
'BULLET'

This byte sequence causes an error when decoded to UTF-8:

>>> bytes([0x95]).decode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte

I used the following noxfile.py for testing:

import nox

@nox.session
def tests(session):
    session.run("python", "-c", 'import sys; print(f"{sys.stdout.encoding = }")')
    session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)

We pass silent to get Nox to capture stdout. This is done by default when you get here via session.install.

Below is the output when running nox from the latest release, and from your branch. Expand the output using the arrows.

Reproducing the error:

(venv) C:\Users\cjolowicz>nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'utf-8'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('utf-8', b'\x95', 0, 1, 'invalid start byte')
Traceback (most recent call last):
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 549, in execute
    self.func(session)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\_decorators.py", line 53, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\cjolowicz\noxfile.py", line 6, in tests
    session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 262, in run
    return self._run(*args, env=env, **kwargs)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 318, in _run
    return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\command.py", line 111, in run
    return_code, output = popen(
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\popen.py", line 48, in popen
    return return_code, out.decode("utf-8") if out else ""
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
nox > Session tests failed.

Reproducing the error with output redirected:

(venv) C:\Users\cjolowicz>nox > nox.log 2>&1
(venv) C:\Users\cjolowicz>type nox.log
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'cp1252'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('utf-8', b'\x95', 0, 1, 'invalid start byte')
Traceback (most recent call last):
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 549, in execute
    self.func(session)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\_decorators.py", line 53, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\cjolowicz\noxfile.py", line 6, in tests
    session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 262, in run
    return self._run(*args, env=env, **kwargs)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 318, in _run
    return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\command.py", line 111, in run
    return_code, output = popen(
  File "c:\users\cjolowicz\venv\lib\site-packages\nox\popen.py", line 48, in popen
    return return_code, out.decode("utf-8") if out else ""
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
nox > Session tests failed.

Confirming that the fix works (happy case):

(venv-fix) C:\Users\build\cjolowicz>nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'utf-8'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests was successful.

(venv-fix) C:\Users\build\cjolowicz>nox > nox.log 2>&1

(venv-fix) C:\Users\build\cjolowicz>type nox.log
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'cp1252'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests was successful.

Confirming that the fix works (error case: output cannot be decoded from the preferred encoding either):

❯ env LC_ALL=zh_TW.big5 nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python in .nox/tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'big5'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('big5', b'\x95', 0, 1, 'incomplete multibyte sequence')
Traceback (most recent call last):
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 31, in decode_output
    decoded_output = output.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 551, in execute
    self.func(session)
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/_decorators.py", line 53, in __call__
    return self.func(*args, **kwargs)
  File "/tmp/nox-unicode/noxfile.py", line 6, in tests
    session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 264, in run
    return self._run(*args, env=env, **kwargs)
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 320, in _run
    return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/command.py", line 111, in run
    return_code, output = popen(
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 74, in popen
    return return_code, decode_output(out) if out else ""
  File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 38, in decode_output
    decoded_output = output.decode(second_encoding)
UnicodeDecodeError: '[utf-8, BIG5]' codec can't decode byte 0x95 in position 0: incomplete multibyte sequence
nox > Session tests failed.

nox/popen.py

Cielquan · 2021-02-11T11:26:34Z

@cjolowicz Thanks for your thorough answer. I also learned some new things 😀

I agree in all your points and will gladly take your suggestions and fix the tests accordingly later (hopefully today elsewise tomorrow).

I think the decoded_output variable must be a relic from earlier versions for the code or something even before the PR.
The double exceptions originate from an older version also I think. Earlier I raised the manipulated UnicodeDecodeError with a from UnicodeDecodeError or so.

Co-authored-by: Claudio Jolowicz <cjolowicz@gmail.com>

Cielquan · 2021-02-11T13:07:27Z

Ok fixed the tests. Was only one though.

With this the PR should be ready for merging?

cjolowicz

LGTM

theacodes · 2021-02-11T21:09:09Z

Looks great! Thank you both!

theacodes · 2021-02-11T21:30:33Z

Just a heads up @Cielquan and @cjolowicz:

This pull request is eligible for payment via our Open Collective. You can each file an expense for $20 USD for this contribution. Thanks!

cjolowicz · 2021-02-12T12:45:32Z

This pull request is eligible for payment via our Open Collective.

thank you for setting this up @theacodes ❤️

Cielquan · 2021-02-12T13:38:07Z

Thanks for merging @theacodes and thanks again for reviewing @cjolowicz.
This OpenCollective thing is nice 👍🏾

add decode_output func to decode popen output

1cea471

fixes #309 by trying to decode popen output with utf8 first and on error tries other encodings provided by the systems preferences.

cjolowicz reviewed Feb 2, 2021

View reviewed changes

simplified decode_output function

62f3272

Cielquan added 4 commits February 3, 2021 10:15

fix linting issues

2dfed80

fix double hard coded utf8

4f87b93

change utf8 to utf-8

65aad21

add tests for decode_output

b89bade

fix linting issues

1502b09

theacodes added the awaiting submitter action label Feb 8, 2021

Merge branch 'master' into master

7a9d1ae

Merge branch 'master' into master

320ef67

cjolowicz reviewed Feb 11, 2021

View reviewed changes

nox/popen.py Outdated Show resolved Hide resolved

Cielquan and others added 2 commits February 11, 2021 12:28

simplify nested try...except block and fix nested exception msg

7d7ffcf

Co-authored-by: Claudio Jolowicz <cjolowicz@gmail.com>

fix tests for simplified deode_output

2cc976d

cjolowicz approved these changes Feb 11, 2021

View reviewed changes

theacodes changed the title ~~fix decoding issues with popen output~~ Decode popen output using the system locale if UTF-8 decoding fails. Feb 11, 2021

theacodes merged commit 211702d into wntrblm:master Feb 11, 2021

theacodes removed the awaiting submitter action label Feb 11, 2021

cjolowicz mentioned this pull request Jun 6, 2021

Release 2021.6.6 #440

Merged

mlech-reef mentioned this pull request Oct 16, 2021

conda python310 maciej-lech/nox#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decode popen output using the system locale if UTF-8 decoding fails. #380

Decode popen output using the system locale if UTF-8 decoding fails. #380

Cielquan commented Feb 2, 2021

cjolowicz left a comment •

edited

Loading

stsewd commented Feb 2, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 3, 2021 •

edited

Loading

cjolowicz commented Feb 3, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 8, 2021

Cielquan commented Feb 11, 2021

cjolowicz left a comment

Cielquan commented Feb 11, 2021

Cielquan commented Feb 11, 2021

cjolowicz left a comment

theacodes commented Feb 11, 2021

theacodes commented Feb 11, 2021

cjolowicz commented Feb 12, 2021

Cielquan commented Feb 12, 2021

Decode popen output using the system locale if UTF-8 decoding fails. #380

Decode popen output using the system locale if UTF-8 decoding fails. #380

Conversation

Cielquan commented Feb 2, 2021

cjolowicz left a comment • edited Loading

Choose a reason for hiding this comment

stsewd commented Feb 2, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 3, 2021 • edited Loading

cjolowicz commented Feb 3, 2021

Cielquan commented Feb 3, 2021

Cielquan commented Feb 8, 2021

Cielquan commented Feb 11, 2021

cjolowicz left a comment

Choose a reason for hiding this comment

Cielquan commented Feb 11, 2021

Cielquan commented Feb 11, 2021

cjolowicz left a comment

Choose a reason for hiding this comment

theacodes commented Feb 11, 2021

theacodes commented Feb 11, 2021

cjolowicz commented Feb 12, 2021

Cielquan commented Feb 12, 2021

cjolowicz left a comment •

edited

Loading

Cielquan commented Feb 3, 2021 •

edited

Loading