Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC#2: Add provider to ipfs and provide when adding/fetching (WIP) #5870

Closed
wants to merge 15 commits into from

Conversation

michaelavila
Copy link
Contributor

@michaelavila michaelavila commented Dec 21, 2018

This is a follow up to #5840

My goal is to get some feedback on my progress towards reworking providing strategies and to offer a possible solution to #5822.

Overview

What is in this PR:

Provider

  • Use a strategy to decide which CIDs below a CID to provide
  • Queue CIDs to provide on disk, using the datastore (the "provide queue")
    • only if we haven't already provided this CID
  • Have workers work through the provide queue announcing each CID
  • When a CID is successfully announced, track it so that we know not to provide it again
  • Call IpfsNode.Provider.Provide(cid) all over the place
    • it's possible I've missed some places, I should triple check

Reprovider

  • Queue the list of all currently provided CIDs
  • Have workers work through the reprovide queue

What is NOT in this PR:

  • A robust provide strategy

TODO

  • reprovider based on tracker data and provider module
  • rebase
  • move ipfs bitswap reprovide to ipfs provider reprovide
  • move to core api
  • timestamp provider entries?
  • caller configured provider strategies
  • migrating data when nodes get upgraded to this new system
  • tests

Known Things

  • duplicates can be added to the provide/reprovide queues but no duplicate providing work will be done in terms of content routing

@michaelavila michaelavila added the status/WIP This a Work In Progress label Dec 21, 2018
@michaelavila michaelavila requested a review from Kubuxu as a code owner December 21, 2018 21:08
@ghost ghost assigned michaelavila Dec 21, 2018
@ghost ghost added the status/in-progress In progress label Dec 21, 2018
@michaelavila michaelavila force-pushed the experiment/provide+reprovide branch from 7eb3843 to 371157e Compare December 21, 2018 21:14
@michaelavila michaelavila changed the title RFC#2: Add provider to ipfs and provide when adding/fetching RFC#2: Add provider to ipfs and provide when adding/fetching (WIP) Dec 21, 2018
@michaelavila michaelavila force-pushed the experiment/provide+reprovide branch 8 times, most recently from cb7db36 to d86a2fb Compare January 8, 2019 22:12
pin/gc/gc.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

Preliminary comment: inline documentation would be really nice. While the code is nice and clean, I'd rather not guess what each subsystem is doing based on naming.

@michaelavila
Copy link
Contributor Author

@Stebalien understood. Do you need me to add anything immediately to clarify for your review? If not, then I'll add documentation of each new provider module when I get the reprovider done. Which is what I'm working on now.

@Stebalien
Copy link
Member

A few one-liners would be nice. It looks obvious but I don't want to incorrectly guess intention.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.


// Provider the given cid using specified strategy.
func (p *Provider) Provide(root cid.Cid) {
cids := p.strategy(p.ctx, root)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider moving the provider strategy outside of the provider subsystem. That is, have the provider subsystem only handle tracking/enqueuing.

A "strategy" would then be a function like:

type ProviderStrategy = func(*Provider, cid.Cid) error
func ProvideRoots(p *Provider, root cid.Cid) error {
    return p.Provide(root)
}
// (a trivial example)

Rational:

  1. We're going to need to choose strategies at runtime. We could also do that by passing them into Provide but, IMO, applying the strategy outside gives us more flexibility.
  2. We're probably going to want to record chosen strategies elsewhere (e.g., in pin metadata) so we're going to need to manage this information elsewhere anyways.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. I'm going to address this feedback last so we can discuss it in more detail after I get the other feedback incorporated. Briefly, though, my thoughts are:

Would we still need to allow a "global" providing strategy to be configured? How does that work with the ability to call provider strategies explicitly? Would we need to use an abstraction at all of the provide call sites that chooses the actual strategy to use based on the config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we need to use an abstraction at all of the provide call sites that chooses the actual strategy to use based on the config?

Yes. The issue is that we'll likely want different default strategies for different commands (files, other data, pinning, etc.). For example, a "provide every directory/file root" strategy won't make sense as a default for the ipfs dag or ipfs block commands because those work directly with IPLD blocks (which may not be files).

However, it may make sense to introduce another service for this. Maybe? That is, something with a function like GetStrategyFor(stuff) (Strategy, error) where stuff includes things like:

  • Whether or not we're pinning.
  • The type of data we're handling (raw blocks, ipfs files, etc.).
  • Maybe even the root CID?
  • Maybe even the current command?

Really, I'm not sure what the right design is here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I'm processing this now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien It sounds like the plan is to move the provide calls to the call-sites and to choose an appropriate providing strategy for each location. However, there is currently the ability to configure a providing strategy, which we've expressed the desire to preserve. I'm trying to figure out exactly what those configurations would do.

I think a reasonable approach is to determine all the places where a "provide all" currently takes place (e.g. ipfs add) and to allow those locations to be configured by the user's provider strategies. This makes the system behave a little different from the way it behaves right now, which I think is unavoidable, but achieves the larger goal of allowing a limited strategy to be used in place where there's the potential to provide too many blocks.

Any thoughts?

Copy link
Contributor Author

@michaelavila michaelavila Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, in the locations where a "provide all" does not make sense, we just always use whatever providing strategy we decide makes the most sense in that location. Which I think just means calling Provider.Provide(cid) directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a reasonable approach is to determine all the places where a "provide all" currently takes place (e.g. ipfs add) and to allow those locations to be configured by the user's provider strategies.

Exactly.

This makes the system behave a little different from the way it behaves right now, which I think is unavoidable

Correct. Really, we can probably deprecate the current reprovider strategy system and write a config migration.


For example, we could (eventually) end up supporting a provider strategy spec like the following:

{
  // Default strategy.
  "default": {
    // all objects.
    "all": {
      "strategy": "root",
      // put it in the queue, announce once, but don't record or reprovide
      // We probably won't support this out of the box...
      "when": "once",
    }
  }
  // For `ipfs add --pin=false`. That is, data _we've_ chosen to add (but not pin).
  "add": {
    "all": {
      "strategy": "root",
    },
    "unixfs": {
      "strategy": "files",
      "when": "once",
    }
  },
  // Either `ipfs pin /ipfs/...` or `ipfs add ...`
  "pin": {
    // fallback.
    // In this case, it only applies ot `ipfs pin /ipld/Qm...`
    "default": {
      "strategy": "all",
    },
    // Applies to things like `ipfs pin /ipfs/Qm...`
    "unixfs": {
      "strategy": "files",
    },
  },
}

(however, the user should be able to override the strategy when adding/pinning data.

provider/provider.go Outdated Show resolved Hide resolved
provider/provider.go Outdated Show resolved Hide resolved
provider/provider.go Show resolved Hide resolved
provider/provider.go Outdated Show resolved Hide resolved
provider/queue.go Outdated Show resolved Hide resolved
provider/queue.go Show resolved Hide resolved
@michaelavila michaelavila force-pushed the experiment/provide+reprovide branch 3 times, most recently from ad614ed to f59148e Compare January 18, 2019 00:49
Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing super blocking.

I would convert any blocking channel writes when a cancellable context is present to select statements (see strategy comment)

Everything else a mix of nitpicks and questions -- I am not entirely sure I'm qualified to provide more architectural feedback :)

LGTM in my book but not formally choosing Approve cause I assume there will be more revisions prior to merge, and I'll re-review once it's on the verge of merge.

core/core.go Show resolved Hide resolved
provider/provider.go Show resolved Hide resolved
provider/queue.go Outdated Show resolved Hide resolved
provider/strategy.go Outdated Show resolved Hide resolved
@michaelavila michaelavila force-pushed the experiment/provide+reprovide branch 3 times, most recently from 126924a to d5f8cf3 Compare January 22, 2019 23:45
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
Really, there's only one error that makes sense to stop and report,
which is when obtaining the tracking information. I've opted to just
logging that error at the warning level and returning.

License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
License: MIT
Signed-off-by: Michael Avila <davidmichaelavila@gmail.com>
@michaelavila michaelavila force-pushed the experiment/provide+reprovide branch from 1a07bc3 to 94a5d26 Compare March 5, 2019 19:46
@michaelavila
Copy link
Contributor Author

I'm just noticing this is happening, putting here so I don't forget:

λ cmd/ipfs/ipfs init
initializing IPFS node at /Users/me/.ipfs
generating 2048-bit RSA keypair...done
peer identity: QmWotgPWydYsdr9m5UBj7Zyy9dLhX2ecQkXLqRB8x3ZCwb
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x16b37e6]

goroutine 1 [running]:
github.com/ipfs/go-ipfs/provider.(*Provider).Provide(0x0, 0xc0000e04e0, 0x22, 0x20b4460, 0xc00017cee0)
	/Users/me/go/src/github.com/ipfs/go-ipfs/provider/provider.go:62 +0x26
github.com/ipfs/go-ipfs/core/coreapi.(*ObjectAPI).New(0xc000182280, 0x20a0d20, 0xc0000c68a0, 0xc00016e0c0, 0x1, 0x1, 0x0, 0x0, 0xc000184500, 0xc0000d0be0)
	/Users/me/go/src/github.com/ipfs/go-ipfs/core/coreapi/object.go:58 +0x14d
github.com/ipfs/go-ipfs/assets.addAssetList(0xc00000c1e0, 0x2a872c0, 0x7, 0x7, 0x0, 0x0, 0xc00004a000, 0x17)
	/Users/me/go/src/github.com/ipfs/go-ipfs/assets/assets.go:55 +0x178
github.com/ipfs/go-ipfs/assets.SeedInitDocs(...)
	/Users/me/go/src/github.com/ipfs/go-ipfs/assets/assets.go:36
main.addDefaultAssets(0x20808e0, 0xc000010010, 0xc0000d80c0, 0xf, 0x0, 0x0)
	/Users/me/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/init.go:218 +0x1ab
main.doInit(0x20808e0, 0xc000010010, 0xc0000d80c0, 0xf, 0x2aa5d00, 0x800, 0x0, 0x0, 0x0, 0xc000184280, ...)
	/Users/me/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/init.go:167 +0x2d1
main.glob..func2(0xc0002d8c40, 0x20a1060, 0xc0000b8900, 0x207f500, 0xc0000b88a0, 0xc000123d28, 0x1af0afe)
	/Users/me/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/init.go:113 +0x1de
github.com/ipfs/go-ipfs-cmds.(*executor).Execute(0xc00016e078, 0xc0002d8c40, 0x20a1060, 0xc0000b8900, 0x207f500, 0xc0000b88a0, 0xc0000b8900, 0x0)
	/Users/me/go/pkg/mod/github.com/ipfs/go-ipfs-cmds@v0.0.1/executor.go:81 +0x15d
github.com/ipfs/go-ipfs-cmds/cli.Run(0x20a0c60, 0xc0000da580, 0x2a87bc0, 0xc0000f4000, 0x2, 0x2, 0xc0000e2000, 0xc000010010, 0xc0000e2008, 0x1f37090, ...)
	/Users/me/go/pkg/mod/github.com/ipfs/go-ipfs-cmds@v0.0.1/cli/run.go:138 +0xa69
main.mainRet(0x0)
	/Users/me/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:179 +0x421
main.main()
	/Users/me/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:85 +0x22

@michaelavila
Copy link
Contributor Author

Superseded by #6141

@ghost ghost removed the status/in-progress In progress label Mar 27, 2019
@michaelavila michaelavila deleted the experiment/provide+reprovide branch March 27, 2019 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/WIP This a Work In Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants