Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: extract TraversalResumerPathState interface and allow it to be shared across traversals #327

Draft
wants to merge 3 commits into
base: resumecarwrite
Choose a base branch
from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Aug 11, 2022

I think if we allowed for a stored/shared TraversalResumerPathState (interface for pathState) then we should be able to resume across any selective CAR that has the same selector & DAG.

For the purpose of Boost, we'd make one of these things, then share it across any CAR creation that happens, accumulating state and allowing the Seek() to move forward in the CAR, skipping over blocks that don't interfere the traversal and that don't need.

The catch for Boost is that it needs to be thread safe, so we'd need to wrap it all in a mutex, but we can do that over there.

@rvagg rvagg requested a review from willscott August 11, 2022 08:56
v2/traversal/resumption.go Outdated Show resolved Hide resolved
v2/traversal/resumption.go Outdated Show resolved Hide resolved
@rvagg
Copy link
Member Author

rvagg commented Aug 11, 2022

This is probably not enough, the *RewindToX() methods on the TraverserResumer aren't even used anywhere outside of tests yet. Something about the Seek operation needs to match up between the TraverserResumer and SkipWriterReaderSeeker. The latter receives a Seek() and sets an offset which helps it ignore blocks, but I don't see the traversalCar#setup() where we make a new TraverserResumer. I think we probably need the offset to be set all the way down into there so we're not even loading blocks.

@rvagg rvagg force-pushed the resumecarwrite branch 2 times, most recently from de951ac to e9c7d8a Compare August 12, 2022 05:58
@rvagg rvagg force-pushed the rvagg/shared-pathstate branch from 3592539 to d2e5313 Compare August 12, 2022 06:24
@rvagg rvagg marked this pull request as draft August 12, 2022 11:40
@rvagg
Copy link
Member Author

rvagg commented Aug 12, 2022

A whole bunch of work in the latest commit, wiring up the resumer to help with the skipping. Not quite where I think it needs to go yet, I don't think the skip reader is quite doing the right thing yet and I've passed down the path cache into it but am not using it yet—I thought I'd need to use it to get offset information and the fact that I haven't had to yet suggests this still isn't working like I want.

The main thing I want, other than this working properly, is to be able to entirely skip block loads when we're resuming part way through a CAR that we have information about. Loading blocks but not spitting them out is a step in the right direction, but it'd be great if we didn't have to load them at all. In theory this should be very close, but I have no e2e tests to assert that it's actually working, and I'm suspicious about the lack of offset information needed at the reader .. so we'll see.

Having spent far too much time on this I'm wondering whether it's really worth it. For now the narrow usecase I have is for an explore-all selector, which is a very simple case, but the masses of complicated and hard to understand (and maintain) code in here is to support arbitrary selectors (in theory). Is it worth it? Can we really have assurance that it's going to work for those cases? I'm doubtful on both counts.

Will consider this more over the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants