UnicodeDecodeError when looking at the tagged_date #943

Closed
cydan33 opened this issue Oct 16, 2019 · 4 comments

@cydan33

cydan33 commented Oct 16, 2019

I can't get the tagged date for some repository tags.

For example:

import git
r = git.Repo.clone_from("https://github.com/nodejs/node", "/tmp/nodegit")
t = r.tags[10]
t.object.tagged_date

Error:

~/.local/lib/python3.7/site-packages/gitdb/util.py in __getattr__(self, attr)
    251         to be created and set. Next time the same attribute is reqeusted, it is simply
    252         returned from our dict/slots. """
--> 253         self._set_cache_(attr)
    254         # will raise in case the cache was not created
    255         return object.__getattribute__(self, attr)

/tmp/GitPython/git/objects/tag.py in _set_cache_(self, attr)
     51         if attr in TagObject.__slots__:
     52             ostream = self.repo.odb.stream(self.binsha)
---> 53             lines = ostream.read().decode(defenc).splitlines()
     54
     55             _obj, hexsha = lines[0].split(" ")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 392: invalid continuation byte

It happens for at least the following repositories:
https://github.com/nodejs/node (tag "v0.1.0")
https://github.com/OpenVPN/openvpn (tag "v2.3.1")

It reproduces with the latest version.

@Byron
Member

Byron commented Oct 18, 2019

Thanks for posting, and for providing the information needed for reproduction! This helps a lot.

Unfortunately, I don't see a way to fix it, as UTF-8 is assumed as the default encoding. It can't be overridden unless sys.getdefaultencoding() returns a different value at Python startup.

A preferable solution would be to not assume any encoding and instead work on bytes.
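
To illustrate what working on bytes could look like, here is a minimal sketch. It is not GitPython's implementation: the function name parse_tag_object and the sample tag content are made up for illustration. It assumes the usual layout of an annotated tag object, with header lines, a blank line, and then the message. The tagger timestamp is read without decoding anything but the ASCII header keys, so a message in an unknown encoding cannot raise.

```python
def parse_tag_object(raw: bytes) -> dict:
    """Parse raw annotated-tag content, keeping values and message as bytes.

    Hypothetical helper for illustration only; not part of GitPython.
    """
    # Headers and message are separated by the first blank line.
    header, _, message = raw.partition(b"\n\n")

    fields = {}
    for line in header.split(b"\n"):
        key, _, value = line.partition(b" ")
        # Header keys are plain ASCII words such as "object" or "tagger".
        fields[key.decode("ascii")] = value

    tagged_date = None
    tagger = fields.get("tagger")
    if tagger is not None:
        # A tagger line ends with "<unix timestamp> <timezone offset>".
        _, timestamp, _offset = tagger.rsplit(b" ", 2)
        tagged_date = int(timestamp)

    return {"fields": fields, "tagged_date": tagged_date, "message": message}


# Made-up tag content with a Latin-1 byte (0xe1) in the message.
sample = (
    b"object " + b"a" * 40 + b"\n"
    b"type commit\n"
    b"tag v0.0.0-example\n"
    b"tagger Example Tagger <tagger@example.com> 1364406784 +0100\n"
    b"\n"
    b"m\xe1s info\n"
)
parsed = parse_tag_object(sample)
```

With this approach the caller decides whether and how to decode the message, and tagged_date is available even when the message is not valid UTF-8.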

@Harmon758
Member

I'm unable to reproduce this now.
Is this still an issue? If so, can you please provide details about your environment?

@3droj7

3droj7 commented Feb 20, 2020

Yeah, it still happens, just with the 7th tag rather than the 10th:

import git
r = git.Repo.clone_from("https://github.com/nodejs/node", "/tmp/nodegit")
t = r.tags[7]
t.object.tagged_date
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-c80fb16d948c> in <module>
----> 1 t.object.tagged_date

/usr/local/lib/python3.7/dist-packages/gitdb/util.py in __getattr__(self, attr)
    251         to be created and set. Next time the same attribute is reqeusted, it is simply
    252         returned from our dict/slots. """
--> 253         self._set_cache_(attr)
    254         # will raise in case the cache was not created
    255         return object.__getattribute__(self, attr)

/usr/local/lib/python3.7/dist-packages/git/objects/tag.py in _set_cache_(self, attr)
     51         if attr in TagObject.__slots__:
     52             ostream = self.repo.odb.stream(self.binsha)
---> 53             lines = ostream.read().decode(defenc).splitlines()
     54
     55             _obj, hexsha = lines[0].split(" ")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 392: invalid continuation byte

@Harmon758
Member

I can't reproduce this with https://github.com/OpenVPN/openvpn/releases/tag/v2.3.1:

>>> import git
>>> r = git.Repo.clone_from("https://github.com/OpenVPN/openvpn", "/tmp/openvpn")
>>> r.tags[40].object.tag
'v2.3.1'
>>> r.tags[40].object.tagged_date
1364406784

However, the annotation for https://github.com/nodejs/node/releases/tag/v0.1.0 seems to be encoded in Latin-1 (ISO-8859-1), without having an encoding header for the tag specifying that.
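
The same failure can be shown without cloning anything. In the sketch below the byte string is made up and is not the actual tag message. Byte 0xe1 is "á" in Latin-1, but in UTF-8 it starts a three-byte sequence, so an ordinary ASCII byte after it triggers the "invalid continuation byte" error from the traceback.

```python
# Made-up sample: "más info" encoded as Latin-1 (0xe1 is "á").
raw = b"m\xe1s info"

# Strict UTF-8 decoding fails, as in the traceback.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding as Latin-1 recovers the intended text.
latin1 = raw.decode("latin-1")

# errors="replace" never raises; the bad byte becomes U+FFFD.
replaced = raw.decode("utf-8", errors="replace")
```

Decoding with errors="replace" loses the original character but keeps the rest of the text readable, which is enough when only header fields such as the tagger date are needed.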

Even git show fails to display the tag message properly, as it also assumes UTF-8.

Although GitPython uses the filesystem's encoding rather than looking at the encoding header, that's a separate issue.

wip-sync referenced this issue in NetBSD/pkgsrc-wip Mar 7, 2020
3.1.0
=====

* Switched back to using gitdb package as requirement
  (`gitdb#59 <https://github.com/gitpython-developers/gitdb/issues/59>`_)

3.0.9
=====

* Restricted GitDB (gitdb2) version requirement to < 4
* Removed old nose library from test requirements

Bugfixes
--------

* Changed to use UTF-8 instead of default encoding when getting information about a symbolic reference
  (`#774 <https://github.com/gitpython-developers/GitPython/issues/774>`_)
* Fixed decoding of tag object message so as to replace invalid bytes
  (`#943 <https://github.com/gitpython-developers/GitPython/issues/943>`_)

3.0.8
=====

* Added support for Python 3.8
* Bumped GitDB (gitdb2) version requirement to > 3

Bugfixes
--------

* Fixed Repo.__repr__ when subclassed
  (`#968 <https://github.com/gitpython-developers/GitPython/pull/968>`_)
* Removed compatibility shims for Python < 3.4 and old mock library
* Replaced usage of deprecated unittest aliases and Logger.warn
* Removed old, no longer used assert methods
* Replaced usage of nose assert methods with unittest

3.0.7
=====

Properly signed re-release of v3.0.6 with new signature
(See `#980 <https://github.com/gitpython-developers/GitPython/issues/980>`_)

3.0.6
=====

| Note: There was an issue that caused this version to be released to PyPI without a signature
| See the changelog for v3.0.7 and `#980 <https://github.com/gitpython-developers/GitPython/issues/980>`_

Bugfixes
--------

* Fixed warning for usage of environment variables for paths containing ``$`` or ``%``
  (`#832 <https://github.com/gitpython-developers/GitPython/issues/832>`_,
  `#961 <https://github.com/gitpython-developers/GitPython/pull/961>`_)
* Added support for parsing Git internal date format (@<unix timestamp> <timezone offset>)
  (`#965 <https://github.com/gitpython-developers/GitPython/pull/965>`_)
* Removed Python 2 and < 3.3 compatibility shims
  (`#979 <https://github.com/gitpython-developers/GitPython/pull/979>`_)
* Fixed GitDB (gitdb2) requirement version specifier formatting in requirements.txt
  (`#979 <https://github.com/gitpython-developers/GitPython/pull/979>`_)