This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

feat: hamt enumlinks custom #111

Merged: 10 commits merged into schomatis/directory/unsharding on Nov 12, 2021

Conversation

@aschmahmann (Contributor) commented Oct 27, 2021

An alternative to #110. Would still like to clean this up with errgroups if possible.

I've gone through three iterations of this function over the course of this PR (follow the commits):

  1. Basically a clone of go-merkledag's walker, but using a union of (CID, Shard) as the elements we walk over, so we can also walk shards that are already in memory (49314cf).
  2. Try to clean up the above with error groups to make the code easier to follow and manage (c930522).
  3. Switch to bulkier DAGService requests, i.e. GetMany instead of Get (8051de7). The latest commit also includes some test modifications that were required to get this to work.

@Stebalien recommended that last improvement as an efficiency win lower down the stack, since the Bitswap requests will get grouped. However, I'm not sure this is really the correct implementation, since it'll allow slow resolution of some parts of the HAMT to gum up the system.

This does seem to impact the walk order considerably, causing the number of nodes traversed during HAMT size estimation on a complete HAMT to drop dramatically. While that seems like a good thing, it means we're really messing with the ordering here, so we may want to be careful and go with option 2.
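
A minimal sketch of the worker shape described in iterations 2 and 3, assuming an errgroup of workers that drain batches of child CIDs from a feed channel and fetch each batch with one GetMany call; the walkBatches name and signature are illustrative, and the real parallelWalkDepth also carries in-memory Shards alongside CIDs and emits value links to a callback:

```go
package hamt

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
	"golang.org/x/sync/errgroup"
)

// walkBatches is a simplified sketch: workers drain batches of child CIDs
// from feed and fetch each batch with a single GetMany call, so Bitswap can
// group the requests lower down the stack.
func walkBatches(ctx context.Context, ds ipld.NodeGetter, feed <-chan []cid.Cid, concurrency int) error {
	grp, ctx := errgroup.WithContext(ctx)
	for i := 0; i < concurrency; i++ {
		grp.Go(func() error {
			for batch := range feed {
				for opt := range ds.GetMany(ctx, batch) {
					if opt.Err != nil {
						return opt.Err
					}
					// Decode opt.Node into a child Shard and queue its own
					// children (omitted in this sketch).
					_ = opt.Node
				}
			}
			return nil
		})
	}
	return grp.Wait()
}
```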

cc @schomatis @Stebalien

welcome bot commented Oct 27, 2021

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines if
    they apply.

Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed, and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@aschmahmann changed the base branch from master to schomatis/directory/unsharding on October 27, 2021 18:43
@schomatis (Contributor) left a comment


The parallelWalkDepth function LGTM (modulo the errgroup usage, which I'm not familiar with); I left some minor comments, but nothing blocking.

Need more time to understand the changes in the testing logic of the last commit.

hamt/hamt.go (outdated), comment on lines 472 to 483:

```go
var linksToVisit []cid.Cid
for _, nextLink := range shardOrCID.links {
	var shouldVisit bool

	visitlk.Lock()
	shouldVisit = visit(nextLink)
	visitlk.Unlock()

	if shouldVisit {
		linksToVisit = append(linksToVisit, nextLink)
	}
}
```

nit: I think we could drop this optimization (to simplify the code) as I wouldn't expect to have repeated internal (non-value) shard nodes.
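
For context, a minimal sketch (not code from the PR) of the pattern the visit callback and visitlk mutex in the snippet above follow, assuming the usual go-merkledag approach of a deduplicating CID set behind a lock:

```go
package hamt

import (
	"sync"

	"github.com/ipfs/go-cid"
)

// newThreadSafeVisit builds the kind of visit function used above: it
// reports true only the first time a CID is seen, so already-visited links
// are skipped. cid.Set is not safe for concurrent use, hence the mutex
// (the role visitlk plays in the PR).
func newThreadSafeVisit() func(cid.Cid) bool {
	var mu sync.Mutex
	seen := cid.NewSet()
	return func(c cid.Cid) bool {
		mu.Lock()
		defer mu.Unlock()
		return seen.Visit(c)
	}
}
```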

hamt/hamt.go (outdated):

```go
	return err
}

nextLinks, err := nextShard.walkLinks(processShardValues)
```

nit: The general nomenclature around listCidShardUnion is a bit confusing. My general take is that we are processing all the children of the same node at once, together, both in-memory Shards and stored links, but there are some conflicting names:

  • The walkLinks function actually walks all children, both links and shards. (Similarly, the nextLinks in this line.)
  • The 'union' suffix in the structure's name makes me think we have an 'either of' situation.
  • Similarly, the shardOrCID name somewhere above in this function. Internally the HAMT stores each of its children in either shard or link format. In that sense the 'union/or' terms are correct, but when processing all the children of a single node I think we should decouple ourselves from that mutually exclusive definition and focus on a single group of children (which, yes, will be expressed in either of those two formats, but that doesn't seem key to the Walk algorithm).
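
To make the naming discussion concrete, a rough sketch of the shape being described; the links field name appears in the snippet reviewed above, while the shards field name and the exact definition are assumptions rather than the PR's actual code:

```go
package hamt

import "github.com/ipfs/go-cid"

// listCidShardUnion (as discussed above) groups all the children of one
// parent node, split by how the HAMT currently holds them. Field names are
// illustrative; only links is confirmed by the reviewed snippet.
type listCidShardUnion struct {
	links  []cid.Cid // children known only by CID; still need to be fetched
	shards []*Shard  // children already decoded and held in memory
}
```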

hamt/hamt.go (outdated), comment on lines 457 to 458:

```go
grp.Go(func() error {
	for shardOrCID := range feed {
```

nit: The walk algorithm here is more expansive than the original, and its differences should be documented, since this is clearly a copy of the other and anyone reading this code will be thinking of the original when trying to reason through it. (I'm not referring to the GetMany optimization, which is valid in itself and could even be incorporated into the Shard logic.)

In the original we process one parent node at a time (represented by its CID), extract its children, filter which should be emitted as output (value links/shards), and push the rest to the queue/feed one at a time to be processed independently in the next iteration, each as a new parent node.

Here we send (after filtering) all the children together in bulk (the lists in listCidShardUnion) and then extract all of their children together in turn. (It might be a valid optimization, and this comment is not against it; I'm just advocating for more documentation around it.) I'm not sure if this affects the traversal behavior expected by TestHAMTEnumerationWhenComputingSize; I don't think so, but I need more time to think about it.
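
An illustrative sketch of the two feeding strategies described above; the function names and channel element types are hypothetical, not the PR's code:

```go
package hamt

import "github.com/ipfs/go-cid"

// feedPerChild mirrors the original go-merkledag-style walker: each child
// CID becomes its own work item, so a slow child does not hold back its
// siblings and the ordering stays closer to a plain BFS.
func feedPerChild(feed chan<- cid.Cid, children []cid.Cid) {
	for _, c := range children {
		feed <- c
	}
}

// feedInBulk mirrors this PR: all children of one parent are queued as a
// single item, so one worker later fetches them with a single GetMany call.
func feedInBulk(feed chan<- []cid.Cid, children []cid.Cid) {
	feed <- children
}
```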


(edit: it was affecting tests, see comment below)

@schomatis (Contributor) commented:
The grouping caused by the GetMany requests might cause the fake timeout added in countGetsDS to be ineffective at keeping an ordered BFS (some threads win over others, turning it more into a DFS). Reducing the concurrency in parallelWalkDepth indeed hits the mark expected by TestHAMTEnumerationWhenComputingSize, making the test pass. (We might want to add the concurrency parameter to the internal package to adjust it for this test. It would lose the true parallel dimension but would still be useful in testing the brand-new parallel walk.)
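
One hedged way the suggested knob could look; the variable name and default value are hypothetical, not something already in the repository:

```go
package internal

// HAMTWalkConcurrency would be the number of workers parallelWalkDepth
// starts. Production code keeps the default; a test such as
// TestHAMTEnumerationWhenComputingSize could set it to 1 so the fake
// per-Get timeout keeps producing a predictable BFS-like order.
// The default value here is arbitrary.
var HAMTWalkConcurrency = 32
```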

@schomatis (Contributor) commented:
Follow-up: splitting the children and processing them separately in the loop fixes the test (see PoC branch).

@aschmahmann (Contributor, Author) commented:
Follow-up: splitting the children and processing them separately in the loop fixes the test

Correct. That's the same as using Get instead of GetMany though.

@schomatis (Contributor) commented:
Yes, optimizing for speed is in conflict with adding delays to get a predictable BFS. That is expected; we just need to change our expectation of predictability in the test.

@schomatis (Contributor) commented:
Also note that the GetMany optimization preemptively fetches more links than it might need, so it goes directly against the original optimization (which TestHAMTEnumerationWhenComputingSize was explicitly testing) of fetching as few nodes as possible to determine the HAMT directory size.

@aschmahmann (Contributor, Author) commented:
Also note that the GetMany optimization preemptively fetches more links than what it might need

Does it necessarily do that? If we cancel the context, then GetMany should terminate early, so we're asking for (and might receive) more blocks than we need, but we're not necessarily waiting on them and might not receive the extra blocks at all.
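
A minimal sketch of this point, assuming the go-ipld-format NodeGetter interface; the getUpTo name and the enough predicate are illustrative, not the PR's actual stopping logic:

```go
package hamt

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// getUpTo asks GetMany for every CID but stops consuming once enough nodes
// have arrived; the deferred cancel tells GetMany we no longer need the
// rest, so we neither wait on nor necessarily receive the extra blocks.
func getUpTo(ctx context.Context, ds ipld.NodeGetter, cids []cid.Cid, enough func(int) bool) ([]ipld.Node, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // aborts the remaining fetches when we return early

	var nodes []ipld.Node
	for opt := range ds.GetMany(ctx, cids) {
		if opt.Err != nil {
			return nodes, opt.Err
		}
		nodes = append(nodes, opt.Node)
		if enough(len(nodes)) {
			break // stop consuming; the deferred cancel cleans up
		}
	}
	return nodes, nil
}
```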

@schomatis (Contributor) commented:
You're right. The final aim of the optimization is reducing enumeration time, not the number of fetches. (Still, our test tracks the second.)

@aschmahmann merged commit 20d951f into schomatis/directory/unsharding on Nov 12, 2021