Leverage Content Path Affinity in routing #10251

Open
lidel opened this issue Dec 10, 2023 · 1 comment
Labels
effort/weeks Estimated to take multiple weeks exp/expert Having worked on the specific codebase is important kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up topic/routing Topic routing topic/sharding Topic about Sharding (HAMT etc)

Comments

@lidel
Member

lidel commented Dec 10, 2023

Created as a follow-up for #10249, #9416, and a prerequisite for #8676

Problem

Right now, Reprovider.Strategy supports three values, which tell the reprovider what should be announced. Valid strategies are:

  • "all" - announce all stored data (this is also the implicit default)
  • "pinned" - only announce pinned data
  • "roots" - only announce directly pinned keys and root keys of recursive pins

If the repository gets too big, all and pinned become too expensive, and folks are forced to use roots, which is codec-agnostic and only announces the root block of a UnixFS DAG.
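
As a rough illustration of how the three strategies differ in what gets announced, here is a toy model. It is not Kubo's reprovider code: the `pin` struct and the `announced` function are hypothetical names made up for this sketch, and the keys are plain strings rather than real CIDs.

```go
package main

import (
	"fmt"
	"sort"
)

// pin is a simplified, hypothetical pin record (not Kubo's pinner types).
type pin struct {
	root      string   // key that was pinned
	recursive bool     // true for recursive pins, false for direct pins
	blocks    []string // every block reachable from root, root included
}

// announced models which keys a node advertises under each strategy.
// store lists every block in the local repository, pinned or not.
func announced(strategy string, store []string, pins []pin) []string {
	set := map[string]bool{}
	switch strategy {
	case "", "all": // "all" is the implicit default
		for _, k := range store {
			set[k] = true
		}
	case "pinned":
		for _, p := range pins {
			set[p.root] = true
			if p.recursive {
				for _, k := range p.blocks {
					set[k] = true
				}
			}
		}
	case "roots":
		// Only the pinned key itself, never its children.
		for _, p := range pins {
			set[p.root] = true
		}
	}
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	store := []string{"root", "chunk-1", "chunk-2", "unpinned"}
	pins := []pin{{root: "root", recursive: true, blocks: []string{"root", "chunk-1", "chunk-2"}}}
	for _, s := range []string{"all", "pinned", "roots"} {
		fmt.Println(s, announced(s, store, pins))
	}
}
```

With one recursive pin over a three-block DAG plus one unpinned block, this model announces four keys under all, three under pinned, and a single key under roots.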

But roots comes with a big downside:

⚠️ BE CAREFUL: a node with the roots strategy will not announce child blocks.

It makes sense only for use cases where the entire DAG is fetched in full, and a graceful resume does not have to be guaranteed: the lack of child announcements means an interrupted retrieval won't be able to find providers for the missing block in the middle of a file, unless the peer happens to already be connected to a provider and asks for the child CID over Bitswap.

This is not an inherent limitation of IPFS as a whole – it is only a limitation of how things are implemented in Kubo:

  1. /ipfs/cid/sub/dir/file is resolved first, into /ipfs/file-CID
  2. Retrieval of /ipfs/file-CID starts
  3. If interrupted and resumed at a later time, the blocks for /ipfs/cid, /ipfs/cid/sub, and /ipfs/cid/sub/dir are already cached in the local store, so Kubo does no network lookup for providers of these. It will ask for providers of the first missing block within /ipfs/file-CID, and if these internal blocks are not announced (e.g. due to Reprovider.Strategy set to roots), Kubo won't be able to resume the download.
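
The three steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not Kubo code: `resume` is a hypothetical function, block names are plain strings, and "announced" is reduced to a set of keys that some provider has advertised.

```go
package main

import "fmt"

// resume walks the blocks of a file DAG in traversal order. Cached blocks
// need no lookup; any other block must have been announced by some peer.
// It returns the first block that can be neither read locally nor located.
func resume(fileBlocks []string, cached, announced map[string]bool) (stuckAt string, ok bool) {
	for _, b := range fileBlocks {
		if cached[b] || announced[b] {
			continue
		}
		return b, false
	}
	return "", true
}

func main() {
	// Blocks of /ipfs/file-CID, in the order a retrieval would need them.
	fileBlocks := []string{"file-CID", "chunk-1", "chunk-2", "chunk-3"}

	// State after an interrupted download: the path segments and the first
	// part of the file are already in the local store.
	cached := map[string]bool{
		"cid": true, "sub": true, "dir": true,
		"file-CID": true, "chunk-1": true,
	}

	// A provider using the roots strategy announced only the pinned root.
	rootsOnly := map[string]bool{"cid": true}
	stuckAt, ok := resume(fileBlocks, cached, rootsOnly)
	fmt.Println("roots:", ok, stuckAt)

	// A provider announcing every block lets the retrieval finish.
	everything := map[string]bool{
		"cid": true, "sub": true, "dir": true,
		"file-CID": true, "chunk-1": true, "chunk-2": true, "chunk-3": true,
	}
	stuckAt, ok = resume(fileBlocks, cached, everything)
	fmt.Println("all:", ok, stuckAt)
}
```

In this model the roots-only case gets stuck at chunk-2: the only announced key, cid, is already cached, so it is never looked up again, and nothing points the resuming node at a provider for the missing chunk.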

Solution ideas

  • Every block requested by Kubo has some Content Path Affinity
    • could be as simple as /ipfs/CID (direct block get)
    • or more complex, as /ipfs/cid/sub/dir/file (resuming retrieval from the middle of the file)
  • Pass this affinity information around
    • TBD: could be an invasive change to all interfaces, or an optional hint passed in a Go context
  • 👉 Make retrieval code leverage Content Path Affinity when regular providers can't be found
    • TBD: we want to balance speed vs avoiding bitswap spam.
      • If no providers for an internal block within /ipfs/file-CID can be found, look for providers of the parent entity (directory) CIDs (dir, sub and finally cid), with each step growing the probability of finding one. Or we could always ask for the least and the most distant ones in parallel. It depends on whether we expect the majority of data to be announced as roots or as entities (Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files) #8676 (comment))
      • Ideas welcome, but also an implementation detail.
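
To make the "optional hint in a Go context" variant more concrete, here is a minimal sketch of the sequential fallback (dir, then sub, then cid). Only `context.WithValue` and `ctx.Value` are real standard-library calls; `affinityKey`, `withAffinity`, `affinityFrom`, `lookup`, `findWithAffinity` and `demo` are hypothetical names invented for this example and are not existing Kubo or Boxo APIs. The parallel variant and any Bitswap spam limits are left out.

```go
package main

import (
	"context"
	"fmt"
)

// affinityKey is an unexported context key, so other packages cannot collide with it.
type affinityKey struct{}

// withAffinity attaches the ancestor CIDs of the content path, root first,
// e.g. ["cid", "sub", "dir"] for /ipfs/cid/sub/dir/file.
func withAffinity(ctx context.Context, ancestors []string) context.Context {
	return context.WithValue(ctx, affinityKey{}, ancestors)
}

// affinityFrom returns the hint, or nil when the caller did not set one.
func affinityFrom(ctx context.Context) []string {
	ancestors, _ := ctx.Value(affinityKey{}).([]string)
	return ancestors
}

// lookup is a stand-in for a routing query: it returns provider IDs for a key.
type lookup func(key string) []string

// findWithAffinity asks for providers of key first. If none are found, it
// walks the ancestors from the nearest (dir) to the most distant (cid) and
// returns the first non-empty result together with the key that matched.
func findWithAffinity(ctx context.Context, find lookup, key string) (providers []string, via string) {
	if p := find(key); len(p) > 0 {
		return p, key
	}
	ancestors := affinityFrom(ctx)
	for i := len(ancestors) - 1; i >= 0; i-- {
		if p := find(ancestors[i]); len(p) > 0 {
			return p, ancestors[i]
		}
	}
	return nil, ""
}

// demo looks up "chunk-2" against a network where only "cid" was announced.
func demo(ancestors []string) ([]string, string) {
	table := map[string][]string{"cid": {"peerA"}}
	find := func(key string) []string { return table[key] }
	ctx := withAffinity(context.Background(), ancestors)
	return findWithAffinity(ctx, find, "chunk-2")
}

func main() {
	providers, via := demo([]string{"cid", "sub", "dir"})
	fmt.Println("with affinity:", providers, "via", via)

	providers, via = demo(nil)
	fmt.Println("without affinity:", providers, "via", via)
}
```

Carrying the hint in the context keeps existing interfaces unchanged, at the cost of making the dependency implicit. An explicit parameter would be easier to audit but touches every caller.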
@lidel lidel added the kind/enhancement A net-new feature or improvement to an existing feature label Dec 10, 2023
@lidel lidel changed the title Leverage content path affinity in content routing Leverage Content Path Affinity in routing Dec 10, 2023
@lidel lidel added topic/routing Topic routing exp/expert Having worked on the specific codebase is important P2 Medium: Good to have, but can wait until someone steps up effort/weeks Estimated to take multiple weeks topic/sharding Topic about Sharding (HAMT etc) labels Dec 10, 2023
@lidel lidel moved this to 🥞 Todo in IPFS Shipyard Team Dec 10, 2023
@lidel lidel moved this to Todo in @lidel's IPFS wishlist Dec 11, 2023
@randalljyoung

Does IPFS only have push-centric functionality (provides), or is there a pull equivalent (requests)? If so, what happens when you request a block that a node hasn't provided?
