Don't abort block processing when encountering SkipMe #251
Conversation
Yep, that looks right
@warpfork I'm cringing at how much this is going to grind your gears, but I've pushed a couple more commits, including tests for this. The fixtures from focus_test.go use blocks that have been reordered by dag-json's lexical map sorting. There are a few ways to deal with this, obviously; the one I went for was a custom test codec that skips the sorting.
Another option was to just go with dag-json, rewrite the order-dependent assertions, and add some docs. I can back up and do that if you think that's preferable; the custom encoder is already pretty weird, so maybe it'll be less weird to just fix up the ordering and explain why the fields are not in the original construction order. Added walk tests for a few different selectors, including the manually constructed one used commonly in Filecoin and a few other places.
Yeah, this does grind my gears. Sorry. Can't bring myself to beat around the bush on this one. We should not delete tests that test that order is maintained in the data model. The reasoning for this is straightforward:
This means I don't think we should do the encoding round-tripping, either. That would effectively mean we're not exercising some otherwise very-reachable behavior, and that would be a mistake.

(This is a problem we could've continued to just not have. Y'all wanted to walk face-first into it... ┐_(ツ)_┌ This is the kind of stuff we have to deal with, all the dang time now, if dag-json is going to be incomplete on ordering grounds. We'll probably eventually want to... make another codec that doesn't have that problem, to avoid the sheer irritation of it all, I suspect. Meanwhile, we also have to support all this stuff unambiguously, if only for the sheer reason that we also still decode un-sorted dag-json data in the wild without complaint. The worst of all worlds! Who could've possibly seen this coming!)
k, I've reverted the special codec and encode round-trip, leaving testing of a non-encoded node intact. I accept your disgruntlement with the codec shuffling, but this is also the world of codecs we swim in; it's going to be the case with all current codecs we deal with, especially the ones with wide real-world use: dag-pb, git, bitcoin, eth, etc. They all have special rules about how the bytes should be laid out, some stricter than others. So creating round-trip-stable fixtures gets complicated.

One of the things I experimented with in JS, for dag-pb at least, was a stricter approach. dag-jose has a really weird version of this where it'll accept a "compact" form, which is just a single colon-delimited string that can be expanded out to be the full form. The codec takes both this compact form and the full form but only encodes the full form, so a round-trip can get really surprising in that case.

This is the world of messy data and opinionated layers. If you want a codec that preserves order, then we should just do that. Or just lean on lists to make the problem disappear entirely.
Force-pushed from a8b79b8 to 7c796f2
Includes a working version of #252, with tests for it doubling what I already have for the SkipMe handling.
Loading a link from within a block and getting a SkipMe back from the loader will `return` out of the node iteration. If using SkipMe to deal with block load memoisation, this can lead to incomplete traversal by avoiding any potential non-skippable links that follow a skipped one.
dag-json has lexical sorting of map fields, so encode() here will be reordering some of the fields. Since we are doing a traversal from rootNode for most tests and it doesn't get loaded (since it's the start Node), it doesn't get the same reordering treatment as the other blocks. So, we round-trip all blocks through the codec to ensure that the Node that we have matches the Link and is what it would be regardless of where it comes from.

Additionally, we're using a custom codec here for the tests that _removes_ the map sorting and leaves the blocks unmolested, so we don't have to rewrite all of the parts of the tests where ordering matters, and the fields match up more nicely with the fixtures we've constructed.

For the chain test, the ordering is now different because it depended on dag-json reordering, so instead of hard-wiring expectations in a switch() we do it in an array of expected paths.
As per review, remove the special re-encode and codec for test fixtures so we have test coverage of the case where we're traversing a node that's not been through a codec that messes with field ordering.
Normally, it's possible to visit the same blocks of data repeatedly during a traversal -- which, if you look at the world with sufficiently strict definitions, "makes sense", because traversals are really visiting `(path, node)` tuples, not just the nodes themselves. But often, we find people really do only care about visiting nodes, and don't particularly care to have the traversal inform them if it reaches that same data again by a different path.

With this patch, we track where we've been, at the granularity of links. If we've encountered a link before, we won't load it again.

There's a configuration option for this, because on rare occasions I could imagine not wanting this dedup; also, if one knows one doesn't need it, it may be valuable to turn it off, because it does of course consume a small amount of memory to track where we've been.
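The tracking described above can be sketched as a set of already-seen links consulted before each load. The names here (`Traverser`, `VisitLinksOnlyOnce`, `shouldLoad`) are hypothetical, not go-ipld-prime's actual configuration surface:

```go
package main

import "fmt"

// Traverser sketches the opt-in dedup: when VisitLinksOnlyOnce is set,
// the traversal records each link it has followed and silently skips any
// link it has already loaded, even if reached via a different path.
type Traverser struct {
	VisitLinksOnlyOnce bool
	seen               map[string]struct{}
}

// shouldLoad reports whether the traversal should load this link now.
func (t *Traverser) shouldLoad(lnk string) bool {
	if !t.VisitLinksOnlyOnce {
		return true // dedup disabled: always load
	}
	if t.seen == nil {
		t.seen = map[string]struct{}{} // lazily allocate the visited set
	}
	if _, ok := t.seen[lnk]; ok {
		return false // already visited via some other path
	}
	t.seen[lnk] = struct{}{}
	return true
}

func main() {
	t := &Traverser{VisitLinksOnlyOnce: true}
	fmt.Println(t.shouldLoad("linkA")) // first visit: load
	fmt.Println(t.shouldLoad("linkA")) // repeat: deduped
	fmt.Println(t.shouldLoad("linkB")) // new link: load
}
```

The memory cost mentioned above is exactly this set: one entry per distinct link encountered, which is why leaving the option off is sensible when dedup isn't needed.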
…rather than defaulting to on. This makes it a much less breaking change.
Nice tests. ✔️
(still needs a test or two)