add fr32-sha2-256-trunc254-padded-binary-tree multihash #331

aschmahmann · 2023-07-26T18:53:50Z

This PR reserves a single multihash code. The request here comes from the Filecoin ecosystem which is already using the underlying hashes here in relation to the codes reserved in #170 and #172.

However, it turns out the approach taken in the reservation of those codes was insufficient and as a result there’s a new spec proposal for a different multihash to better represent the data being referred to.

At a high level there were three approaches for describing these trees given here. We went with option 2, but that turns out to have been a mistake so now we’re going with option 3.

Why is this a good idea when you already had a code here?

You live and you learn. It turns out that for the particular data being dealt with, the approach taken originally was problematic (and is IIUC largely why the related codec entries were tagged as filecoin rather than IPLD), and this one makes life easier. The reason CID and multihash exist is to make it easier to evolve through mistakes; this is one example of that.

Should other applications of merkle-tree hashes lean on custom multihashes rather than reusing existing ones?

As with my comment around IPLD codecs from a while ago, I think the answer lies in what the application gains or loses by not reusing existing multihashes.

Is it reasonable to have multiple codes dealing with effectively the same data?

Situationally, yes. In this case it turns out there are better and worse ways to describe the same data and people might even disagree on what those are. The table allows us to support these options.

A related example I’ve heard mentioned in the past was if someone wanted to register codes for the internal components of the blake3 tree so that data could be referred to as either a Blake3 multihash of a 1GiB file, or a more specific Blake3 internal-node multihash for the root of the tree whose leaves are the 1GiB file. On the surface this seems like a reasonable request as well.

cc @ribasushi @willscott @rvagg @vmx

(p.s. I hope Rod and Volker forgive me for making them relive #161, #170 and #172)

vmx · 2023-07-27T09:17:01Z

From the FRC abstract:

it turns out to be much more useful to be able to express the tuple of (Root hash, size of tree) than just the root hash.

This is how I think it should be done. The size doesn't really matter for the hash. The size information is additional context that is needed for some application. To me, this is not what Multihash is for.

I discussed how to transmit additional context with CIDs with a lot of people at LabWeek/IPFSCamp in 2022. I think there should be a way, but I still believe it shouldn't be part of the CID; it should be something separate.

aschmahmann · 2023-07-27T14:56:09Z

This is how I think it should be done. The size doesn't really matter for the hash. The size information is additional context that is needed for some application. To me, this is not what Multihash is for.

Yes/No. The situation as it currently stands makes referring to the data needed by CID not particularly useful. It was acknowledged that the v1 Piece CIDs are not real CIDs, which is why the codecs are labeled as filecoin rather than ipld (#161 (comment) and #172 (comment)). This seems to be because (as mentioned in those issues) it wasn't clear why/how people were going to use these CIDs or if they'd be used outside of Filecoin internals. However, it turns out people now want to move around this data by CID and the current multihash + codec combination doesn't do that.

Fixing mistakes is IMO precisely what multiformats is for. Realizing that Filecoin Pieces might need real Content IDentifiers (CIDs) so they can be used in other contexts (e.g. IPFS p2p transfer of data by piece CID), and that a new multihash is necessary to do so, is the cost of learning (and the benefit of having multiformats).

vmx · 2023-08-02T09:15:34Z

I talked with various people about it, and I now have a better understanding.

I'd only thought about the producing side of things: you put in some data and you get a specific Merkle tree out. But if you also think about CIDs as a way to retrieve the data, then the height matters. You would retrieve data and would want to verify that the data is what you expected. Without the height, you could get a higher level of the Merkle tree (random bytes and not the data you expected) and it would still verify correctly.

This means that it now makes sense to me to include the height.
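The ambiguity described above can be reproduced with a toy binary Merkle tree. This is only a sketch under simplifying assumptions: plain SHA-256 over 32-byte nodes stands in for the real Filecoin construction (Fr32 padding and the 254-bit truncation are omitted), and the names `node_hash` and `root_and_height` are hypothetical, not taken from any Filecoin or multiformats library.

```python
import hashlib

NODE_SIZE = 32  # bytes per leaf and per inner node (assumption for this sketch)


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child nodes into their parent (plain SHA-256, no truncation)."""
    return hashlib.sha256(left + right).digest()


def root_and_height(data: bytes) -> tuple[bytes, int]:
    """Return (root, height) of a balanced binary tree whose leaves are `data`."""
    if len(data) == 0 or len(data) % NODE_SIZE:
        raise ValueError("data must be a non-empty multiple of NODE_SIZE")
    level = [data[i:i + NODE_SIZE] for i in range(0, len(data), NODE_SIZE)]
    if len(level) & (len(level) - 1):
        raise ValueError("leaf count must be a power of two")
    height = 0
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        height += 1
    return level[0], height


# The real payload: four 32-byte leaves.
payload = bytes(range(128))

# What a misbehaving peer could send instead: the two inner nodes one level up.
layer1 = (node_hash(payload[0:32], payload[32:64])
          + node_hash(payload[64:96], payload[96:128]))

root_a, height_a = root_and_height(payload)
root_b, height_b = root_and_height(layer1)

print(root_a == root_b)    # True: the bare root cannot tell the two apart
print(height_a, height_b)  # 2 1: the height does
```

Both inputs verify against the same root, so a verifier holding only the root would accept the inner-node bytes as if they were the payload. Checking the pair (root, height) rejects the substitution.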

rvagg · 2023-08-04T03:51:19Z

Should other applications of merkle-tree hashes lean on custom multihashes rather than reusing existing ones?

From an IPLD perspective (i.e. if these are multihashes intended for CIDs) then I think the answer would come down to what we are addressing. The original "bmt" entry we had, which I think was even thought to be useful for bitcoin binary merkle addressing, turns out to be not so useful because you're addressing the entire base set of data, which is not really what you typically want. From the IPLD perspective you typically want to navigate through links to individual pieces of these things.

For CommP, if we want it to address the entire base data being "hashed", then viewing the merkle process as a hash function itself makes more sense.

When we start doing things like inclusion proofs then it gets a bit more iffy. Does https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md muddy these waters a bit? Does data-segment neatly fold up into this new conception of CommP? Or do we keep those concerns entirely separate from this?

Also: you're going to have to work on table formatting (see CI) .. sorry but your long name is going to make that tricky, or you might have to shorten it.

rvagg · 2023-08-04T03:51:59Z

And broadly this is +1 from me, just with some general questions about data-segment, and also the need to update the table formatting.

refs multiformats/multicodec#331

alanshaw · 2024-04-26T15:36:29Z

Can this be merged as draft at least? We've been using this in web3.storage for a long time now.

rvagg · 2024-04-29T02:18:58Z

can be merged when it's mergeable; table formatting needs fixing up at least

ribasushi · 2024-05-09T07:07:53Z

table.csv

@@ -146,6 +146,7 @@ transport-bitswap,              transport,      0x0900,         draft,      Bits
 transport-graphsync-filecoinv1, transport,      0x0910,         draft,      Filecoin graphsync datatransfer
 transport-ipfs-gateway-http,    transport,      0x0920,         draft,      HTTP IPFS Gateway trustless datatransfer
 multidid,                       multiformat,    0x0d1d,         draft,      Compact encoding for Decentralized Identifers
+fr32-sha2-256-trunc254-padded-binary-tree, multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments


Suggested change

-fr32-sha2-256-trunc254-padded-binary-tree, multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments
+fr32-sha2-256-trunc254-padbintree,multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments

Suggestion for shortened name as per https://github.com/filecoin-project/FIPs/pull/808/files#r1361196071 . The huge realignment needed is an implicit flag that the proposed name is obnoxiously long.

Accepting above suggestion fixes the alignment requested by @rvagg as a side effect.

rvagg · 2024-05-09

does that pass the validator script? I thought it checked for spaces, but I'm also happy to have the script edited to make accommodations for particularly large columns if someone wants to come up with an algorithm, as I don't really like the idea of shuffling the whole table for this

rvagg · 2024-05-09

some additional shortening options: sha2-256 -> sha256, trunc254 -> trunc
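Whichever name is chosen, the part of the row that ends up on the wire is the code. As a rough illustration, the sketch below parses the proposed row and encodes `0x1011` as an unsigned varint, which is how multiformats codes are prefixed onto values. The helper names `encode_uvarint` and `parse_row` are hypothetical and not part of any multiformats library; only the row text comes from this PR.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, LSB first)."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def parse_row(row: str) -> dict:
    """Split one table.csv row into its five columns, trimming alignment padding."""
    name, tag, code, status, description = [f.strip() for f in row.split(",", 4)]
    return {
        "name": name,
        "tag": tag,
        "code": int(code, 16),
        "status": status,
        "description": description,
    }


ROW = ("fr32-sha2-256-trunc254-padded-binary-tree, multihash, 0x1011, draft, "
       "A balanced binary tree hash used in Filecoin Piece Commitments")

entry = parse_row(ROW)
prefix = encode_uvarint(entry["code"])
print(entry["name"], entry["tag"], hex(entry["code"]))
print(prefix.hex())  # 9120
```

Renaming the entry changes only the first column; the varint prefix derived from the code stays the same.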

add fr32-sha2-256-trunc254-padded-binary-tree mutlihash

30f0e7d

aschmahmann requested review from rvagg and vmx as code owners July 26, 2023 18:53

alanshaw mentioned this pull request Sep 25, 2023

feat: add fr32-sha2-256-trunc254-padded-binary-tree codec multiformats/cid-utils-website#61

Merged

alanshaw pushed a commit to multiformats/cid-utils-website that referenced this pull request Sep 25, 2023

feat: add fr32-sha2-256-trunc254-padded-binary-tree codec (#61)

8022e51

refs multiformats/multicodec#331

ribasushi reviewed May 9, 2024

ribasushi mentioned this pull request Oct 21, 2024

feat: v2 commp cid filecoin-project/go-fil-commcid#5

Open
