Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rez-release UnicodeDecodeError (windows) #776

Closed
willjp opened this issue Oct 28, 2019 · 9 comments · Fixed by #779
Closed

rez-release UnicodeDecodeError (windows) #776

willjp opened this issue Oct 28, 2019 · 9 comments · Fixed by #779

Comments

@willjp
Copy link
Contributor

willjp commented Oct 28, 2019

rez:  2.47.9
os: windows-10.0.17763
python: 3.7.4

Occurs when rez-releasing a package on windows, if the git repostiory already has the tag wanted by the rez-release.

Occured during rez-release, when it runs git log in subprocess with text=True, and log contains chr(141) .

Checking state of repository...
Exception in thread Thread-23:
Traceback (most recent call last):
  File "C:\Python-3.7.4\Lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Python-3.7.4\Lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python-3.7.4\Lib\subprocess.py", line 1238, in _readerthread
    buffer.append(fh.read())
  File "c:\users\willjp\documents\rez\install\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1021688: character maps to <undefined>
@bfloch
Copy link
Contributor

bfloch commented Oct 28, 2019

I wonder if this is the same stuff I saw.
@JeanChristopheMorinPerso advised me to try

${ENV:PYTHONIOENCODING} = "UTF-8"

@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

sorry, nevermind after the successful release I cannot reproduce it.

The next time I encounter it (it's only happened twice) I'll try that. I'll close this issue until I can reproduce it.

Thanks again @bfloch

@willjp willjp closed this as completed Oct 28, 2019
@willjp willjp reopened this Oct 28, 2019
@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

I found the cause!!

For some reason, this is only occurring during some rez-releases, despite them all sharing the same git history. There is a reverse-line-feed character in the git commit log, that is causing text=True to fail.

I'll make a pull request in a minute, but I was able to solve this by decoding the bytestring myself instead of using text=True . The problem character was b'\x8d' . I'm not sure how it got in the log.

example failure

pipe = subprocess.Popen(
    ['git', 'log'], 
    stdout=subprocess.PIPE, 
    stderr=subprocess.PIPE, 
    text=True)
(stdout, _) = pipe.communicate()

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python-3.6.5\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Python-3.6.5\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python-3.6.5\lib\subprocess.py", line 1063, in _readerthread
    buffer.append(fh.read())
  File "C:\Python-3.6.5\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1023172: character maps to <undefined>

workaround

pipe = subprocess.Popen(['git', 'log'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(stdout, _) = pipe.communicate()
stdout_unicode = stdout.decode('utf-8')

@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

Sorry and for completion, I did try setting ${ENV:PYTHONIOENCODING} = "UTF-8", but it did not resolve the issue.

@nerdvegas
Copy link
Contributor

Snip from slack:

Allan Johns 9:28 AM
I'm confused as to why this works as opposed to using text=True
9:28 AM
according to docs:
If encoding or errors are specified, or text (also known as universal_newlines) is true, the file objects stdin, stdout and stderr will be opened in text mode using the encoding and errors specified in the call or the defaults for io.TextIOWrapper.
9:29 AM
then, from io.TextIOWrapper:
encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getpreferredencoding(False).
9:30 AM
could you try that out and see what it returns?
9:30 AM
interestingly, for me on py27, I get ANSI_X3.4-1968. But on py37 I get UTF-8

Allan Johns 9:36 AM
aaand I just replied on #github-prs when we aren't supposed to, DOH
9:38 AM
I've also just now tried setting PYTHONIOENCODING='UTF-8', but that didn't work

@nerdvegas
Copy link
Contributor

So, I think the better fix may be t oset `encoding='utf-8' in our Popen wrapper. This will force UTF-8 decoding when text=True, which feels cleaner than having to explicitly decode as above. Also, I am assuming this that problem could occur wherever we use Popen, so this way the fix is applied across the board.

Do you mind trying this out?

See from https://docs.python.org/3/library/subprocess.html#frequently-used-arguments:

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

yes! that works. thank you so much!

@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

incoming update to PR...

@willjp
Copy link
Contributor Author

willjp commented Oct 28, 2019

fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants