feat: hamt enumlinks custom #111
Conversation
Thank you for submitting this PR!
Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
e4d2cd8 to c930522
The `parallelWalkDepth` LGTM (modulo the `errgroup` usage I'm not familiar with), left some minor comments but nothing blocking.
Need more time to understand the changes in the testing logic of the last commit.
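
(Aside for readers who, like the reviewer, are unfamiliar with `errgroup`: below is a minimal, hypothetical sketch of the usual pattern for a parallel walk, not the PR's actual `parallelWalkDepth` code. A producer goroutine feeds work into a channel, a bounded number of workers drain it, and the first error cancels the shared context and is returned by `Wait`.)

```go
package example

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// process stands in for whatever work each feed item needs.
func process(ctx context.Context, item int) error { return nil }

// walkAll drains items with a bounded number of workers; the first error
// cancels ctx and is returned by grp.Wait.
func walkAll(ctx context.Context, items []int, workers int) error {
	grp, ctx := errgroup.WithContext(ctx)

	feed := make(chan int)
	grp.Go(func() error {
		// Producer: push every item, stopping early if the context is canceled.
		defer close(feed)
		for _, it := range items {
			select {
			case feed <- it:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		return nil
	})

	for i := 0; i < workers; i++ {
		grp.Go(func() error {
			// Worker: returning a non-nil error cancels the group's context.
			for it := range feed {
				if err := process(ctx, it); err != nil {
					return err
				}
			}
			return nil
		})
	}

	return grp.Wait()
}
```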
hamt/hamt.go (outdated)
```go
var linksToVisit []cid.Cid
for _, nextLink := range shardOrCID.links {
	var shouldVisit bool

	visitlk.Lock()
	shouldVisit = visit(nextLink)
	visitlk.Unlock()

	if shouldVisit {
		linksToVisit = append(linksToVisit, nextLink)
	}
}
```
nit: I think we could drop this optimization (to simplify the code) as I wouldn't expect to have repeated internal (non-value) shard nodes.
hamt/hamt.go (outdated)
```go
	return err
}

nextLinks, err := nextShard.walkLinks(processShardValues)
```
nit: The general nomenclature around `listCidShardUnion` is a bit confusing. My general take is that we are processing all the children from the same node at once, together, both in-memory `Shard`s and stored links, but there are some conflicting names:
- The `walkLinks` function actually walks all children, both links and shards. (Similarly, the `nextLinks` in this line.)
- The 'union' suffix in the structure's name makes me think we have an 'either of' situation.
- Similarly, the `shardOrCID` name somewhere above in this function.

Internally the HAMT stores each of its children in either shard or link format. In that sense the 'union/or' terms are correct, but when processing all the children from a single node I think we should decouple ourselves from that mutually exclusive definition and focus on a single group of children (which yes, will be expressed in either of those two formats, but that doesn't seem key to the Walk algorithm).
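
To illustrate the 'single group of children' framing, a hypothetical shape for such a structure could be (names invented here, not taken from the PR):

```go
package example

import (
	cid "github.com/ipfs/go-cid"
)

// Shard stands in for this package's in-memory shard type.
type Shard struct{}

// childGroup collects all the children of one HAMT node, whether each child
// is still a stored link (CID) or an already loaded in-memory Shard. The two
// slices together describe one group of children rather than an either/or
// union.
type childGroup struct {
	links  []cid.Cid // children that still need to be fetched
	shards []*Shard  // children already available in memory
}
```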
hamt/hamt.go (outdated)
```go
grp.Go(func() error {
	for shardOrCID := range feed {
```
nit: The walk algorithm here is more expansive than the original and its differences should be documented (as this is clearly a copy of the other, and anyone reading this code will be thinking of the original when trying to reason through it). (Not referring to the `GetMany` optimization, which is valid in itself and could even be incorporated into the `Shard` logic.)

In the original we process one parent node at a time (represented by its CID), extract its children, filter which should be emitted as output (value links/shards), and push the rest to the queue/feed one at a time to be processed independently in the next iteration, each as a new parent node. Here we send (after filtering) all the children together as a bulk (lists in `listCidShardUnion`) and extract all their children in turn together. (It might be a valid optimization and this comment is not against it, just advocating for more documentation around it.) I'm not sure if this affects the traversal behavior expected by `TestHAMTEnumerationWhenComputingSize`; I don't think so, but I need more time to think about it.
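
To make the contrast concrete, a rough sketch of the two feed strategies (simplified, not the PR's code; it reuses the hypothetical `childGroup` type and imports from the sketch above):

```go
// enqueueOriginal mirrors the original walk: each child CID is pushed to the
// feed on its own and becomes an independent work item in the next iteration.
func enqueueOriginal(feed chan<- cid.Cid, children []cid.Cid) {
	for _, c := range children {
		feed <- c
	}
}

// enqueueBulk mirrors the approach described above: all of a node's children
// are sent as one grouped item, so the whole group is fetched and expanded
// together in the next iteration.
func enqueueBulk(feed chan<- childGroup, links []cid.Cid, shards []*Shard) {
	feed <- childGroup{links: links, shards: shards}
}
```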
(edit: it was affecting tests, see comment below)
The grouping caused by the
Follow-up: splitting the children and processing them separately in the loop fixes the test (see PoC branch).
Correct. That's the same as using
Yes, optimizing for speed is in conflict with adding delays to have a predictable BFS. That is expected; we just need to change our expectation of predictability in the test.
Also note that the
Does it necessarily do that? If we cancel the context then
You're right. The final aim of the optimization is reducing enumeration time, not the number of fetches. (Still, our test tracks the latter.)
fix comments in completehamt_test.go
An alternative to #110. Would still like to clean this up with errgroups if possible.
I've gone through three iterations of this function through this PR (follow the commits); the last one uses `(CID, Shard)` as the elements we can walk over so we can do it in memory (49314cf).

@Stebalien recommended that last improvement as an efficiency thing lower down the stack, since the Bitswap requests will get grouped. However, I'm not sure this is really the correct implementation, since it'll allow slow resolution of some parts of the HAMT to gum up the system.
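
As a rough illustration of the Bitswap-grouping point, the pending child links of a node can be requested with a single `GetMany` call on the `ipld.NodeGetter`. This is only a sketch, assuming the walk has already collected a slice of child CIDs; error handling and decoding the returned nodes back into shards are elided.

```go
package example

import (
	"context"

	cid "github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// fetchChildren requests a node's child CIDs in one GetMany call so the
// underlying exchange (e.g. Bitswap) can group the requests. Results arrive
// in arbitrary order on the returned channel.
func fetchChildren(ctx context.Context, ng ipld.NodeGetter, children []cid.Cid) ([]ipld.Node, error) {
	nodes := make([]ipld.Node, 0, len(children))
	for opt := range ng.GetMany(ctx, children) {
		if opt.Err != nil {
			return nil, opt.Err
		}
		nodes = append(nodes, opt.Node)
	}
	return nodes, nil
}
```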
This does seem to impact walk order considerably, causing the number of nodes traversed in the HAMT size estimation using a complete HAMT to drop dramatically. While this seems like a good thing, it means we're really messing with the ordering here and may want to be careful and go with option 2.
cc @schomatis @Stebalien