[Platform win32] Fix crash when pipe encoding is set to None #4584
Because None is not a valid encoding value, fall back to 'utf-8'. This case happens if stdout or stderr is of type io.StringIO.
Can you add a test? And a blurb to RELEASE.txt and CHANGES.txt? |
What's causing stdout.encoding to be None? |
That's just the way it is (default) - from Python, not from us. |
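A minimal sketch of the behavior described above, using only the standard library: an in-memory text stream reports no encoding, and passing that value to bytes.decode raises a TypeError. The byte string is a made-up stand-in for captured command output, not data from this PR.

```python
import io

stream = io.StringIO()

# An in-memory text stream has no underlying byte encoding.
print(stream.encoding)

# Hypothetical stand-in for bytes read back from a redirected temporary file.
captured = b"hello\r\n"

try:
    captured.decode(stream.encoding, "replace")
except TypeError as exc:
    # bytes.decode() requires the encoding argument to be a str.
    decode_error = exc
    print("decode failed:", exc)
else:
    decode_error = None
```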
I updated RELEASE.txt and CHANGES.txt. |
The proposed changes in this PR are similar to an issue that arose in the Godot project. The Godot issue was based on an implementation proposed in a SCons PR that was closed and subsequently implemented in Godot. For reference:
Excerpt from the Godot issue [1] proposed implementation which may apply here as well:
While the above applied specifically to MSVC compiler and linker invocations, the general win32 redirected output comes from the invoked program's stdout/stderr being redirected to temporary files. The console encoding should likely be used for decoding the temporary output file reads. It might be worth testing the following:

 if stdout is not None and not stdoutRedirected:
     try:
         with open(tmpFileStdoutName, "rb") as tmpFileStdout:
             output = tmpFileStdout.read()
-            stdout.write(output.decode(stdout.encoding, "replace"))
+            stdout.write(output.decode("oem", "replace").replace("\r\n", "\n"))
         os.remove(tmpFileStdoutName)
     except OSError:
         pass

 if stderr is not None and not stderrRedirected:
     try:
         with open(tmpFileStderrName, "rb") as tmpFileStderr:
             errors = tmpFileStderr.read()
-            stderr.write(errors.decode(stderr.encoding, "replace"))
+            stderr.write(errors.decode("oem", "replace").replace("\r\n", "\n"))
         os.remove(tmpFileStderrName)
     except OSError:
         pass
It might also be worth checking whether multi-line writes to the redirected temporary files actually contain extra newline characters following a decode without the explicit replacement. In some of the tests done when writing the Godot issue, there were more Unicode replacements when using a native Windows language other than English (e.g., German) prior to changing to "oem". As always, I could be really wrong. |
The following example illustrates the CRLF issue when decoding binary reads of the console output. Note: differences in the actual decoding method ("utf-8", "oem", etc.), if any, will be addressed in a future example.

Source file:

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("hello");
    return 0;
}

SConstruct:

import SCons
import sys
import atexit
from io import StringIO

# In-memory streams that receive the output captured by piped_spawn.
mystdout = StringIO()
mystderr = StringIO()

def dump_stringio():
    # Print everything that was captured once the build finishes.
    mystderr.seek(0)
    mystdout.seek(0)
    print()
    print("---------STDERR----------")
    print(mystderr.read())
    print("---------STDOUT----------")
    print(mystdout.read())
    print("-------------------------")

atexit.register(dump_stringio)

def piped_spawn(sh, escape, cmd, args, env):
    # Route command output into the StringIO objects above.
    from SCons.Platform.win32 import piped_spawn as scons_piped_spawn
    rval = scons_piped_spawn(sh, escape, cmd, args, env, mystdout, mystderr)
    return rval

class EnvironmentFactory:

    program = 'hello'
    program_dir = './src'
    program_files = ['hello.c']

    env_list = []

    @classmethod
    def make_program(cls, **kwargs):
        build_n = len(cls.env_list) + 1
        build = '_build{:03d}'.format(build_n)
        print('Build:', build, kwargs, file=sys.stdout)
        VariantDir(build, cls.program_dir, duplicate=0)
        env = Environment(tools=['msvc', 'mslink'], **kwargs)
        env["SPAWN"] = piped_spawn
        build += '/'
        env.Program(build + cls.program, [build + filename for filename in cls.program_files])
        cls.env_list.append(env)
        return env

for kwargs in [
    {'MSVC_VERSION': '14.3', 'CCFLAGS': '/nologo /showIncludes /fakecl', 'LINKFLAGS': '/nologo /fakelink'},
]:
    EnvironmentFactory.make_program(**kwargs)

PR decoding:
Suggested decoding:
|
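To make the CRLF point concrete without a Windows build, here is a small sketch using hypothetical bytes in place of a real captured temporary file. It shows that decoding alone keeps the carriage returns, that the explicit replacement suggested in this thread removes them, and that an io.TextIOWrapper with universal-newline translation is one possible alternative (an assumption for illustration, not something proposed in the PR).

```python
import io

# Hypothetical bytes, as a Windows process might write them to a redirected file.
raw = b"first line\r\nsecond line\r\n"

# Decoding alone keeps the carriage returns in the resulting text.
decoded = raw.decode("utf-8", "replace")

# Explicit replacement of CR LF with NL after decoding.
normalized = decoded.replace("\r\n", "\n")

# Alternative: newline=None enables universal-newline translation on read.
wrapper = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", errors="replace", newline=None
)
translated = wrapper.read()

print(repr(decoded))
print(repr(normalized))
print(repr(translated))
```

If the carriage returns are left in, a later text-mode write on Windows can translate each remaining "\n" again, which is one plausible source of the extra blank lines mentioned in this discussion.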
The following example illustrates a decoding issue when using "utf-8" compared to "oem". When the language is English there are no differences. When the language is German there are differences.

SConstruct changes:

force_encode_decode = False

def dump_stringio():
    mystderr.seek(0)
    mystdout.seek(0)
    errors = mystderr.read()
    output = mystdout.read()
    if force_encode_decode:
        errors = errors.encode(sys.stdout.encoding, errors="replace").decode(sys.stdout.encoding)
        output = output.encode(sys.stdout.encoding, errors="replace").decode(sys.stdout.encoding)
    print()
    print("---------STDERR----------")
    print(errors)
    print("---------STDOUT----------")
    print(output)
    print("-------------------------")

atexit.register(dump_stringio)

...

for kwargs in [
    {'MSVC_VERSION': '14.3', 'CCFLAGS': '/nologo /showIncludes /fakecl', 'LINKFLAGS': '/nologo /SUBSYSTEM:CONSOLE,4.0 /fakelink'},
]:
    EnvironmentFactory.make_program(**kwargs)

PR decoding:
Suggested decoding:
|
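The kind of difference described here can be approximated on any platform with a sketch like the one below. Two assumptions are made: the message text is invented for illustration, and the cp850 codec stands in for "oem" (the "oem" codec exists only on Windows, and cp850 is the OEM code page typically used by German-language Windows).

```python
# Hypothetical German tool message; not actual compiler output.
message = "Warnung: ungültige Option wird ignoriert"

# Bytes as a console program using the OEM code page would emit them.
# cp850 is an assumed stand-in for the Windows-only "oem" codec.
raw = message.encode("cp850")

# Decoding with the wrong codec replaces the non-ASCII byte.
as_utf8 = raw.decode("utf-8", "replace")

# Decoding with the matching code page round-trips the text.
as_oem = raw.decode("cp850", "replace")

# ascii() keeps the printout safe on consoles with a limited encoding.
print(ascii(as_utf8))
print(ascii(as_oem))
```

With English-only (pure ASCII) messages both decodes give the same text, which matches the observation that the difference only appears with a non-English system language.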
Right... and MSCommon uses oem, as we were advised by someone a while back, once we got to Python versions where that was consistently supported (3.6+). So is there a problem with this? |
The first issue is that when passing a StringIO object, the stream encoding is None. I believe that decode should just use "oem". The temporary file contents are populated by redirecting the output from the invoked process, so I believe that the console encoding should be used rather than the Python stream encoding. Instead of using the stream encoding as in this PR, use the "oem" encoding (like MSCommon). If the temporary file reads remain in binary, then the explicit CR LF sequence needs to be replaced with a NL to avoid the "extra" blank lines.

The changes for this PR are:

 if stdout is not None and not stdoutRedirected:
     try:
         with open(tmpFileStdoutName, "rb") as tmpFileStdout:
             output = tmpFileStdout.read()
-            stdout.write(output.decode(stdout.encoding, "replace"))
+            stdout.write(output.decode(stdout.encoding if stdout.encoding is not None else 'utf-8', "replace"))
         os.remove(tmpFileStdoutName)
     except OSError:
         pass

 if stderr is not None and not stderrRedirected:
     try:
         with open(tmpFileStderrName, "rb") as tmpFileStderr:
             errors = tmpFileStderr.read()
-            stderr.write(errors.decode(stderr.encoding, "replace"))
+            stderr.write(errors.decode(stderr.encoding if stderr.encoding is not None else 'utf-8', "replace"))
         os.remove(tmpFileStderrName)
     except OSError:
         pass

I believe that this may be more robust:

 if stdout is not None and not stdoutRedirected:
     try:
         with open(tmpFileStdoutName, "rb") as tmpFileStdout:
             output = tmpFileStdout.read()
-            stdout.write(output.decode(stdout.encoding, "replace"))
+            stdout.write(output.decode("oem", "replace").replace("\r\n", "\n"))
         os.remove(tmpFileStdoutName)
     except OSError:
         pass

 if stderr is not None and not stderrRedirected:
     try:
         with open(tmpFileStderrName, "rb") as tmpFileStderr:
             errors = tmpFileStderr.read()
-            stderr.write(errors.decode(stderr.encoding, "replace"))
+            stderr.write(errors.decode("oem", "replace").replace("\r\n", "\n"))
         os.remove(tmpFileStderrName)
     except OSError:
         pass
|
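The read/decode/normalize/remove pattern discussed above could be factored into one helper, sketched below under stated assumptions. The names console_encoding and forward_captured_output are hypothetical and are not part of SCons. The fallback to the locale's preferred encoding on non-Windows platforms is also an assumption, added only so the sketch runs everywhere ("oem" is a Windows-only codec).

```python
import io
import locale
import os
import sys
import tempfile


def console_encoding():
    # Assumption: use "oem" on Windows (the codec exists only there);
    # elsewhere fall back to the locale's preferred encoding.
    if sys.platform == "win32":
        return "oem"
    return locale.getpreferredencoding(False)


def forward_captured_output(tmp_name, stream, encoding=None):
    """Copy a captured output file to a text stream, then delete the file."""
    if encoding is None:
        encoding = console_encoding()
    try:
        with open(tmp_name, "rb") as tmp_file:
            data = tmp_file.read()
        # Decode with the console encoding and collapse CR LF to NL.
        stream.write(data.decode(encoding, "replace").replace("\r\n", "\n"))
        os.remove(tmp_name)
    except OSError:
        pass


# Usage: simulate a temporary file written by an external process.
fd, tmp_name = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp_file:
    tmp_file.write(b"hello\r\nworld\r\n")

captured = io.StringIO()
forward_captured_output(tmp_name, captured)
print(repr(captured.getvalue()))
```

Because the helper never reads stream.encoding, it behaves the same for a console stream and for an io.StringIO target.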
I refactored your fix; I'm not a big fan of the x if xyz else abc form. |
The refactored code introduced bugs. The stream encoding attribute may not be writable.
Writing to the encoding attribute of a StringIO object is not allowed.
The original implementation didn't attempt to assign the encoding but rather picked the encoding to pass to the decode call. If one really wants to use the stream encoding, it can be selected for the decode call without assigning to the attribute.
But... I'm not sure we want to use the stream encoding. It may be a good idea to always use "oem" rather than what may be set for the Python stream encoding. The temporary file is likely being written by an external process and not necessarily the current Python. Using the Python stream encoding versus an external process console encoding seems like an apples versus oranges type of issue.
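A short sketch of the read-only attribute problem, using only the standard library. Assigning to the encoding attribute of an io.StringIO raises AttributeError, whereas choosing a codec name locally does not touch the stream. The pick_encoding helper is hypothetical and only illustrates the "pick, don't assign" approach, with 'utf-8' as an assumed default.

```python
import io

stream = io.StringIO()

try:
    # The encoding attribute is read-only on text streams.
    stream.encoding = "utf-8"
except AttributeError as exc:
    assign_error = exc
    print("cannot assign:", exc)
else:
    assign_error = None


def pick_encoding(stream, default="utf-8"):
    # Hypothetical helper: choose a codec name without modifying the stream.
    return getattr(stream, "encoding", None) or default


print(pick_encoding(stream))
```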
I suggest removing:

-    # Sanitize encoding. None is not a valid encoding.
-    # Since we're handling a redirected shell command use
-    # the shells default encoding.
-    if stdout.encoding is None:
-        stdout.encoding = 'oem'
-    if stderr.encoding is None:
-        stderr.encoding = 'oem'
 ...
-            stdout.write(output.decode(stdout.encoding, "replace").replace("\r\n", "\n"))
+            stdout.write(output.decode("oem", "replace").replace("\r\n", "\n"))
 ...
-            stderr.write(errors.decode(stderr.encoding, "replace"))
+            stderr.write(errors.decode("oem", "replace").replace("\r\n", "\n"))

@mwichmann Any thoughts? |
@jcbrill as usual your suggested fix and analysis are well thought out! |
…m. Updated CHANGES/RELEASE
Exactly. That is how the output fragments above were produced: a Windows 11 VMware virtual machine with German as the default system language and the necessary language packs installed. German was the first non-English system language that I tried when looking for encoding issues with the proposed spawn changes for the Godot project and the SCons PR that was eventually rejected due to being implemented in/around the spawn code for win32. Producing compiler warnings and error messages illustrated the extra newline issue. The details are a little fuzzy now, but the original Godot-like proposed changes may also have relied on either an English word or a phrase order that changed with the language. Shown above, one of the German warning messages contains the |
Amusingly (maybe), the last time we had a go-round with Windows and non-ASCII characters, I set up the Swedish language support, as it's a native language for me, where German is only a learned language (and largely unused for many, many years). But it turned out Swedish is not a language the compiler suite produces i18n messages for, so I had to switch that test setup to German (I no longer have that VM). |
If the stdout or stderr argument is of type io.StringIO, the function crashes because StringIO has its encoding property set to None.
This issue was introduced by this commit in version 4.7.0.