Consistent interface to get text and bytes #895
Comments
If I'm not missing anything.. These are the places where we use
These are the places where we use
With
Just curious, are you wanting to get all this done before doing another release? Also, how do you decide when to do a release?
I prefer
Good point. No, this doesn't block the release. What I'd like to do before the next release is handle the pycparser issue, #846. The criteria for a new release are: not too big, not too small (but it may be small if it's important). Looking back, we have usually had a release every 1 to 3 months.
Given that Python 2 will be EOL on January 1, 2020, do you want to consider making this a Python 3-only module? That would drastically simplify string & bytes handling.
I'm an engineer on the Bitbucket team, and I just want to point out that we absolutely need an option to get a raw bytes version of pretty much everything in the repo. It's not a question of Python 2 vs. 3. pygit2's strategy of assuming UTF-8 and failing otherwise doesn't work for us all the time. We host millions of repos, and almost all of that repo data is created outside of Bitbucket and pushed to us. But we still need to display it correctly on our website. In the past, we've encountered issues in pygit2 where we simply had no way of reading a piece of data using pygit2 because it wasn't UTF-8 encoded, and pygit2 just raised an exception.
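A minimal sketch of the failure mode described in this comment, written in plain Python rather than against pygit2 itself. The byte string is a made-up example of a Latin-1 encoded name; it is an assumption for illustration and does not come from any real repository.

```python
# Hypothetical author name stored in a repository as Latin-1, not UTF-8.
raw_name = "Jos\u00e9".encode("latin-1")  # b'Jos\xe9'

try:
    # Strict decoding, as an API that assumes UTF-8 would do.
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With access to the raw bytes, the caller can choose the encoding itself.
recovered = raw_name.decode("latin-1")
```

Without a bytes accessor, the caller never reaches the last line, which is why a raw option matters regardless of the Python version.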
@jnrbsn Okay, noted.
@jdavid I know. I wasn't necessarily saying it was urgent. I just wanted to point out that dropping support for Python 2 doesn't mean we no longer need this.
Follow-up from #610, #790 and #893.

General policy:

- `str()` and `.text` to get text.
- `.text` uses UTF-8 and replace (the rationale for replace is explained in "Patch: Add __str__ and __bytes__ for undecoded content" #790 (comment)).
- `.data` or `.raw` (this is to be decided), or the `raw_` prefix, to get bytes. For instance `Signature.name` and `Signature.raw_name`.
- `bytes(..)` where appropriate.

Open for discussion.
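The policy above could look roughly like this in Python 3. This is only a sketch: the `Signature` class here is a hypothetical stand-in written for illustration, not pygit2's actual implementation, and it assumes the `raw_` prefix convention rather than `.data` or `.raw`.

```python
class Signature:
    """Hypothetical stand-in for illustration, not pygit2's class."""

    def __init__(self, raw_name: bytes):
        # Bytes exactly as stored in the repository.
        self.raw_name = raw_name

    @property
    def name(self) -> str:
        # Text accessor: decode as UTF-8 and replace undecodable bytes
        # with U+FFFD instead of raising.
        return self.raw_name.decode("utf-8", errors="replace")

    def __str__(self) -> str:
        return self.name

    def __bytes__(self) -> bytes:
        return self.raw_name


sig = Signature(b"Jos\xe9")  # Latin-1 bytes, invalid as UTF-8
```

With this shape, `sig.name` and `str(sig)` never raise on non-UTF-8 input, while `sig.raw_name` and `bytes(sig)` give callers the untouched bytes to decode however they see fit.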
TODO:

- Replace `TreeEntry._name` by `.raw_name`
- Replace `DiffLine.content` by `.text`
- `.data` or `.raw`
- Replace `DiffLine.raw_content` by `.data` or `.raw`
- Replace `Object.read_raw()` by `.data` (or `.raw`), then remove `Blob.data` (it will inherit from `Object`)
- `str()`, `bytes()` and the buffer protocol

The case of Oid, what we have now:
- `oid.raw` returns the byte string (that's good, unless we decide to settle on `.data`)
- `str(oid)` and `oid.hex` both return the hex representation, always `<str>` (bytes in Python 2 and text in Python 3)
- `str(...)`
- `Object.hex` and `TreeEntry.hex` behave the same: they always return `<str>`. Apparently these are the only places where we always return `<str>`.
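As a rough Python 3 illustration of the Oid behaviour described above, the following sketch uses a hypothetical `Oid` stand-in (not pygit2's class) and an arbitrary 20-byte value. It only shows the distinction between the raw byte string and the hex text; the Python 2 behaviour is not modelled.

```python
class Oid:
    """Hypothetical stand-in for illustration, not pygit2's class."""

    def __init__(self, raw: bytes):
        self.raw = raw  # the byte string (20 bytes for SHA-1)

    @property
    def hex(self) -> str:
        # Hex representation as a native str.
        return self.raw.hex()

    def __str__(self) -> str:
        return self.hex


oid = Oid(bytes(range(20)))  # arbitrary example bytes
```

Here `oid.raw` is `bytes`, while `str(oid)` and `oid.hex` agree and are always `str`.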