Explicitly use utf-8 when decoding bytestrings #768

Merged 1 commit on Feb 7, 2018

Commits on Feb 7, 2018

  1. Explicitly use utf-8 when decoding bytestrings

    While Python 3 defaults to utf-8 in `bytes.decode()`, Python 2's
    equivalent (`str.decode()`) will use the default encoding as set by
    site.py (which is almost always ascii).
    
    From looking at the code, it seems that these decodes have been fixed
    piecemeal (likely when someone realized that pygit2 was failing to
    handle unicode properly), but any decode which runs on Python 2
    without specifying utf-8 as the encoding is a ticking time bomb. I
    personally noticed this was a problem when I encountered a traceback in
    the RemoteCallbacks while fetching a new branch which contained utf-8
    characters. During the fetch, when `pygit2.remote.maybe_string()` was
    invoked by `_update_tips_cb()` with a pointer to a bytestring containing
    unicode, the decode failed because the default encoding is ascii. As it
    turns out, that particular decode was already fixed in master, but a
    number of others still have no explicit encoding.
    
    This commit explicitly uses utf-8 for all remaining bytestring decodes
    which do not have an encoding specified, aside from one in PY3-specific
    code where doing so would be redundant.
    terminalmage committed Feb 7, 2018
    6e71992
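
For readers unfamiliar with the failure described in the commit message, here is a minimal sketch of the difference between an implicit ascii decode and an explicit utf-8 decode. The helper `to_text()` and the ref name are hypothetical and only illustrate the pattern; this is not pygit2's actual `maybe_string()` implementation. Python 2's implicit default is simulated by passing `"ascii"` explicitly, so the snippet behaves the same on Python 3.

```python
# -*- coding: utf-8 -*-

# Hypothetical helper for illustration only; not pygit2's maybe_string().
def to_text(raw, encoding="utf-8"):
    """Decode a bytestring with an explicit encoding; pass None through."""
    if raw is None:
        return None
    return raw.decode(encoding)


# A made-up ref name containing a non-ASCII character, encoded as UTF-8 bytes.
raw_ref = u"refs/heads/f\u00fcnf".encode("utf-8")

# With an explicit utf-8 encoding, the decode does not depend on the
# interpreter's default encoding.
decoded = to_text(raw_ref)

# Simulate Python 2's usual default (ascii) by naming it explicitly:
# the non-ASCII bytes cannot be decoded and UnicodeDecodeError is raised.
try:
    to_text(raw_ref, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Naming the encoding at every call site, as this commit does, removes the dependence on whatever default `site.py` happens to configure.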