fix: remove TraverseLinksOnlyOnce on piece CID #91
This follows Lotus's implementation: https://github.com/filecoin-project/lotus/blob/a843c52e38da13da489cbe6b290ea49b2660b3fb/node/impl/client/client.go#L1405-L1412 I have no idea whether that would fix things, though.
Just adding here. Boost doesn't have the
This patch is backwards, DO NOT merge. It is lotus gencar that needs fixing.
TraverseLinksOnlyOnce is the "canonical" way:
- kubo does it: https://github.com/ipfs/kubo/blob/v0.14.0/core/commands/dag/export.go#L52
- lotus retrieval does it: https://github.com/filecoin-project/lotus/blob/v1.16.1/node/impl/client/client.go#L1036
cc @rvagg to confirm as he worked in this area recently
N.b. really using any type of traversal for filecoin storage is a mistake as it is way too fragile. Instead you need to use a dagstore to save the car-files as is and then transfer them on to an SP.
I totally agree, but isn't there a spec for how to generate a piece CID from a DAG?
Kubo doesn't care about Filecoin's conventions; it just wants an efficient format.
I don't see why we shouldn't merge this if it's a good first-aid fix.
@Jorropo no. Filecoin stores streams of bytes. This is what PieceCid is computed against. E.g.:
☝️ I could make an off-network transfer deal with this and an SP will take it and prove it with zero problems. However, effectively all retrieval subsystems currently deployed on Filecoin expect, by "gentleman's agreement", these opaque bytes to be a CarV1, with enumerable IPLD blocks and all that. So you store car files. How these files were constructed is unspecified, and there are multiple variations in the wild already (not just the mismatch above). Hence my comment about it being unworkably fragile. See also the end of the commit msg here: ribasushi/spade@1fb8a2f005cafc
I know that, but AFAIK Estuary's transfer protocol is graphsync, which is not a stream of bytes but a stream of IPLD blocks. Isn't there a canonical way to serialise an IPLD DAG (such as one obtained from a graphsync session) into a stream of bytes? (and wouldn't that have the bad no That is my current understanding of the situation, pls tell me if I'm wrong.
Yeah, I get that you could just make something better. I'm arguing for using this bad option (if it fixes the problem) while a better solution is worked on.
No, and there can't be a generic one (think DAGs larger than 32g)
Kubo is what produces a large chunk of the stuff in the wild right now. However, I think your actual problem is deeper - anyone switching from go-car-v1 to go-car-v2 will have this problem, since go-car-v1 always had this cc @hannahhoward for 🧠⛈️
Off-topic, but we want to make that multithreaded, which would make them non-deterministic, btw (not that it matters if you are using a protocol that sends opaque raw bytes instead of an IPLD DAG).
First, on It's true that without The algorithm for storing complete DAGs is pretty straightforward and when followed e2e should (will) result in a stable CommP, but we have an explosion of tooling doing this and a lot of variety in what we are storing, and as @ribasushi is alluding to, we have an increasing number of clients wanting to just throw arbitrary CARs (or arbitrary bytes) into Filecoin. There is a philosophical question about whether Filecoin should be a block device or a DAG-based block-store; that seems to define a large amount of the disagreement in this deal-making area. I'm not entirely sure why it generates so much heat, except that the "just store my bytes" people are frustrated and impatient with the fact that Lotus @ launch has been complete-DAG focused and we've been very slow to make space for the "I don't want you to re-walk my DAG" and "just store my bytes" people (offline deals + bidbot helped, now Boost HTTP is really completing that too I believe). When we have people storing non-DAGs, or just payloads with bundles of blocks that they want to be able to get out in that order, then we get instability in CommP because we can't apply our stable traversal rules to it. When doing a classic "I have a DAG, here's the root and access to my blockstore, go store it" type of deal then it's all cool, we can do that with BUT we need to be very clear that this isn't always the case: there are a lot of deals being done on arbitrary CARs or just arbitrary bytes, there's a good case for this, and we should be aware of it and allow for it. If Estuary sticks with the plain IPFS DAG model ("here's my DAG and here's its root") then it can stick to the
This follows lotus's implementation:
https://github.com/filecoin-project/lotus/blob/a843c52e38da13da489cbe6b290ea49b2660b3fb/node/impl/client/client.go#L1405-L1412
I have no idea if that would fix things tho. (need tests)