Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pinning/pinner: Pinner should not be tasked to fetch blocks #378

Open
MichaelMure opened this issue Feb 3, 2023 · 5 comments
Open

pinning/pinner: Pinner should not be tasked to fetch blocks #378

MichaelMure opened this issue Feb 3, 2023 · 5 comments
Labels
kind/architecture Core architecture of project need/analysis Needs further analysis before proceeding

Comments

@MichaelMure
Copy link
Contributor

Having worked on the Pinner in various way (including alternate implementation), it occurred to me that it really shouldn't be tasked to:

  1. write the root block into storage in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L177
  2. fetch the full DAG in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L202

This is for different (not obvious, I admit) reasons:

  • often, the caller side know if the blocks are local or not. In any case, it's trivial to fetch the blocks on the caller side if not.
  • that would allow to merge Pin and PinWithMode and simplify the API
  • this hurts composability in weird scenario like mine, where you want to control where that fetching is happening, and where the blocks are going (aka, not necessarily in the normal DAGService)
  • it might avoid double writes
  • it seems like this pain is already felt within kubo: https://github.com/ipfs/kubo/blob/82467bc936fe24355c930f7226fce7e41fee591c/core/commands/dag/import.go#L73-L87

Admittedly, things have worked that way for a long time, but fixing that might help in the long run. Looking at where Pin is called in kubo, it also looks like a non-invasive change.

@MichaelMure MichaelMure added the need/triage Needs initial labeling and prioritization label Feb 3, 2023
@MichaelMure
Copy link
Contributor Author

On an intuition level, this has a similar vibe as having the caller side of bitswap adding blocks in storage (ipfs/go-bitswap#571), which helped splitting client and server side, among other things.

@lidel
Copy link
Member

lidel commented Feb 6, 2023

cc @Jorropo for visibility, I remember you were suggesting refactoring the way we do fetch during pinning

@lidel lidel added need/analysis Needs further analysis before proceeding and removed need/triage Needs initial labeling and prioritization labels Feb 6, 2023
@Jorropo Jorropo self-assigned this Feb 14, 2023
@Jorropo
Copy link
Contributor

Jorropo commented Feb 21, 2023

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Currently it does not even store the blocks directly, it relies on side effect of the blockservice (blockservice stores all block downloaded).

Also unlike ipfs/go-bitswap#571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now.
I am not convainced that the two usecases here (pinning per requests and pinning on a monolith) are similar enough to warant sharing a package. Except than a traversal logic (which can be abstracted in a new or already existing third package) I don't see what they would have in commun.

@Jorropo Jorropo added the kind/architecture Core architecture of project label Feb 21, 2023
@MichaelMure
Copy link
Contributor Author

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Yet that's what is actually happening. Consider those two examples (among other):

  • block/put: kubo insert the block into the blockService, then call PinWithMode, which only record the pin. No check is made, Pinner blindly accept the pin and guarantee nothing.
  • pin/add: kubo fetch the root block, then call Pin, which will insert that root node, fetch the rest of the DAG then record the pin. Here, the Pinner does guarantee that the data is there.

Also unlike ipfs/go-bitswap#571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

If you look at my pin/add example, I'm pretty sure there is a duplicated Put there. DAGService get the block from the network (side effect: put), then pass it to Pinner which will write it again.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now.

I'm only suggesting to eliminate Pin in favor of PinWithMode (although Pin is a better name). This function is only used 4 times in kubo (not counting tests), so that should be quite manageable.
We get back to a uniform API, eliminates double write, remove the odd side effect that makes it hard to compose and reuse ...

As for bitswap, I'd like to ask you to do the exercise in reverse: why would Pinner need to write blocks? Why does it needs to fetch a DAG? I don't see better reason than "it was written that way in ancient ages because it was convenient".

@MichaelMure
Copy link
Contributor Author

Additionally, Add() is currently racy with the GC, as it doesn't create a PinLock. This is because the pinner doesn't have access to the blockstore, while the calling side (kubo) does.

@aschmahmann aschmahmann changed the title Pinner should not be tasked to fetch blocks (pinning/pinner): Pinner should not be tasked to fetch blocks Jun 26, 2023
@aschmahmann aschmahmann transferred this issue from ipfs/go-ipfs-pinner Jun 26, 2023
@Jorropo Jorropo changed the title (pinning/pinner): Pinner should not be tasked to fetch blocks pinning/pinner: Pinner should not be tasked to fetch blocks Jun 26, 2023
@Jorropo Jorropo removed their assignment Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/architecture Core architecture of project need/analysis Needs further analysis before proceeding
Projects
No open projects
Archived in project
Development

No branches or pull requests

3 participants