-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propose traversal-based car creation #269
Conversation
willscott
commented
Nov 22, 2021
•
edited
Loading
edited
- Complete the file-based alternative writes with no-precalculated size when a file is provided, and then goes pack to fix up the size of the data in the header at the end.
- include carv1 stream support
- Handle options being set on the traversal (repeated blocks)
- Handle options being set for the car (type of index, etc.)
- Roundtrip test
v2/selective.go
Outdated
"github.com/multiformats/go-varint" | ||
) | ||
|
||
// PrepareTraversal walks through the proposed dag traversal to learn it's total size in order to be able to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lotus evolved away from needing this, so I don't think there's a consumer for prepare+dump anymore, it's just write now, thankfully. There used to be a need to get the size up-front to set up the commp calculation with padding, but that's not necessary anymore. So you could decomplicate this by removing this functionality if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to write out to a stream - e.g. network, this 2x scan would still be more efficient than having to write it out to a file, touch up the header, and then as a second step send the whole thing, i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so this would be for a CARv2 format and you need the offset and all that? Because for a selector-based CARv1 you have the root already, or else you can't run the traversal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is to get the data size field correct in the carv2 header when streaming.
I can make a carv1-only stream version that can do it in one pass as well. I suppose that's probably useful as well.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
* Provide a method for writing a carv1 to a writer stream. * Return count of bytes written across various interfaces.
e6ef7e2
to
df2a186
Compare
I think i have all the code here doing what I want. starting in on tests. This would be a great time for reviews if you want to request changes to naming |
v2/internal/loader/writing_loader.go
Outdated
DecoderChooser: ls.DecoderChooser, | ||
HasherChooser: ls.HasherChooser, | ||
StorageWriteOpener: ls.StorageWriteOpener, | ||
StorageReadOpener: func(lc linking.LinkContext, l ipld.Link) (io.Reader, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm not gonna try to steer here, this is more a question for me to understand how to serve this area better in the future: Did you go through a decision process on whether to do this wrap StorageReadOpener
approach vs do some wrappers around the new storage.*
APIs?
Is this wrapping strategy just familiar already?
Is this wrapping strategy preferable because it seems like the most broadly applicable place to intercept things?
Is this wrapping strategy preferable because it's clear how it composes with other StorageReadOpener magic, such as how graphsync uses it to hook block load events?
Is it because figuring out how to wrap the storage.*
APIs is daunting because it would've required thinking about all the places feature detection might try kick in?
Other?
(These are not blocking questions; they just could be useful info for understanding the value of any future iterations on storage and link loading APIs and other possible event callbacks around them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's that the object that is expected to be passed around is a LinkSystem
- that linksystem will already have the storage.*
object applied to it and turned into a StorageReadOpener
, and in this library I won't know what the underlying readable storage was to be able to wrap it. If i assume the right thing to take in is a link system, which seems right because i need to care about things like 'should i hash reads', 'what reifiers are present', etc, then this is the place i can intercept block reads
v2/selective.go
Outdated
"github.com/multiformats/go-varint" | ||
) | ||
|
||
// PrepareTraversal walks through the proposed dag traversal to learn it's total size in order to be able to |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
return int64(n), err | ||
} | ||
|
||
h := NewHeader(tc.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud, in a separate PR we probably should change NewHeader
to take options and encapsulate the gymnastics needed to set correct header values.