Explicitly use utf-8 when decoding bytestrings #768

While Python 3 defaults to utf-8 in `bytes.decode()`, Python 2's equivalent (`str.decode()`) uses the default encoding as set by site.py (which is almost always ascii). From looking at the code, it seems that these decodes have been fixed piecemeal (likely whenever someone realized that pygit2 was failing to handle unicode properly), but any decode which runs on Python 2 without specifying utf-8 as the encoding is a ticking time bomb. I personally noticed the problem when I hit a traceback in the RemoteCallbacks while fetching a new branch whose name contained utf-8 characters. During the fetch, when `pygit2.remote.maybe_string()` was invoked by `_update_tips_cb()` with a pointer to a bytestring containing non-ascii data, the decode failed because the default encoding is ascii. As it turns out, that particular case was already fixed in master, but a number of other decodes still have no explicit encoding. This commit explicitly uses utf-8 for all remaining bytestring decodes which do not have an encoding specified, aside from one in PY3-specific code where doing so would be redundant.
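As a rough illustration of the failure mode described above, here is a minimal sketch. The helper `to_text` and the branch name are hypothetical and are not part of pygit2. The sketch runs on Python 3, so the Python 2 default is simulated by passing `"ascii"` explicitly.

```python
def to_text(raw, encoding="utf-8"):
    """Hypothetical helper: decode a bytestring with an explicit encoding.

    Passing the encoding explicitly means the result does not depend on
    the interpreter's default (ascii on most Python 2 installs).
    """
    if raw is None:
        return None
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text


# A made-up ref name containing a non-ascii character, encoded as utf-8.
branch = "refs/heads/f\u00e9ature".encode("utf-8")

# Explicit utf-8: round-trips correctly regardless of the default encoding.
assert to_text(branch) == "refs/heads/f\u00e9ature"

# Simulate Python 2's implicit default by decoding as ascii.
try:
    branch.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

assert ascii_failed
```

The only point of the sketch is that the explicit `encoding` argument removes the dependence on the interpreter's default; the actual call sites changed by this commit may look different.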

Commits on Feb 7, 2018