-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use string instead of struct backing #47
Conversation
We call String all over the place so we should make sure it remains fast.
I like this. However, it's a massively breaking change so we'll have to be careful (I've received a lot of push-back from @whyrusleeping when trying to do this kind of thing in the past). Also, this makes extracting a multihash from a CID allocate (could be fixed by using a string internally) and means that constructing a CID involves copying bytes. We can fix the former by storing multihashes as strings (multiformats/go-multihash#29) and the latter by adding a function to multihash that writes to a writer (e.g., a string builder) which we'll probably want anyways. A less breaking-change version would be to use a pointer to a string (or a pointer to, e.g., Performance wise, implementing the varint skipping code manually may be faster. That is, to skip two varints, count two bytes with the high-order bit unset. Regardless, I think we should wait for the libp2p refactor + dht fixes + log fixes + everything else that is currently blocked. Personally, I'd like to just break things and fix this all at once but this really is a massively breaking change. All |
@dignifiedquire please familiarize yourself with the other attempts to do similar things, primarily: multiformats/go-multiaddr#62 |
@whyrusleeping I did read through that discussion, but didn't see that there was a final decision made on the direction to go, so this is based on a conversation I had with @Stebalien afterwards. I am happy to adjust if you and @Stebalien get to an agreement on which version exactly you want to go for in these, just let me know. |
// - hash mh.Multihash | ||
type Cid string | ||
|
||
var EmptyCid = Cid(string([]byte{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why string([]byte{})
and not just ""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also make this a constant so we have:
const EmptyCid = Cid("")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also. I find EmptyCid
a bit tedious of a name. How out just Nil
"? When used outside this package it would be cid.Nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are just suggestions but I am blocking for two reasons (1) The KeyString()
method is wrong and needs to be fixed. (2) I opened a second p.r. (#71) where I implemented my suggestions.
func (c *Cid) KeyString() string { | ||
return string(c.Bytes()) | ||
func (c Cid) KeyString() string { | ||
return string(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. If the version is 0 only the Multihash should be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be right. If the version is 0, this will only be the multihash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Nvm, missed #47 (comment)
|
||
var EmptyCid = Cid(string([]byte{})) | ||
|
||
func (c Cid) version() uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should export this method, it can be useful.
hash: mhash, | ||
} | ||
func NewCidV0(mhash mh.Multihash) Cid { | ||
return newCid(0, DagProtobuf, mhash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this representation of CidV0 odd. Why not just use the normal binary representation, i.e, just the multihash?
// - hash mh.Multihash | ||
type Cid string | ||
|
||
var EmptyCid = Cid(string([]byte{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also. I find EmptyCid
a bit tedious of a name. How out just Nil
"? When used outside this package it would be cid.Nil
.
c.version == o.version && | ||
bytes.Equal(c.hash, o.hash) | ||
func (c Cid) Equals(o Cid) bool { | ||
// TODO: can we use regular string equality? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use regular string equality?
Yes.
// - version uvarint | ||
// - codec uvarint | ||
// - hash mh.Multihash | ||
type Cid string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should make this type Cid struct{ string }
to ensure the Cid type is always valid.
} | ||
} | ||
|
||
// making sure we don't allocate when returning bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we do now and can't avoid it. I take it BenchmarkBytesV1
is leftover from the first attempt with used []byte as the representation.
@dignifiedquire for context, this is moving forward due to: #70, the fact that by-value CIDs will work better with refmt, and the fact that we're reworking a ton of CID stuff anyways for the base32 endeavor. |
@Stebalien thanks for the heads up, I was following the other thread. Let me know if you need anything from me, otherwise feel free to take the code and use what you can, and throw away what you don’t like :) |
Closing in favor of #71 |
cc @Stebalien
ref #38