Proposal: Selector resource budgets #27

Open

warpfork

tl;dr Selectors currently don't have a resource budgeting system, and this means accepting user-defined selectors is a DoS vector. Work is needed to fix this in order to make Selectors usable in more scenarios. These scenarios include things wanted in Filecoin implementations, to my understanding.

warpfork force-pushed the selector-resource-budgets branch 2 times, most recently from 3a11d33 to 0b8dbad, February 17, 2021 11:37
Member

lidel commented Feb 17, 2021

I suspect this would be valuable outside Filecoin context as well.
AFAIK there is a lot of hidden effort duplication across pinning services like Pinata: all of them need to manually code some safeguards to protect themselves against DoS via maliciously crafted DAGs.

Contributor

BigLep left a comment

Thanks for putting this together @warpfork. Interesting! Per comments, before engaging deeper on this one, I think it would be helpful to have some more specific use cases that customers are asking for. Let me know if help is needed to gather any of this.

proposals/SelectorResourceBudgets.md (outdated)
(They're sorta like regexps for DAGs, if that's a useful comparison for you.)
We want to expose these

The problem is: if a service wants to accept Selectors which are user-specified,
Contributor

Do we have more use-case examples? Basically:

  1. I assume anyone with go-ipfs binary installed can do anything they want with selectors and they can hose their machine and we can't fully stop them (but we do have to ensure the network is safe).
  2. What kind of needs does Filecoin have?
  3. What are some example service uses?

Author

For go-ipfs: right, I couldn't care less if someone uses a local API to hose themselves. But people seem to want to expose these things publicly. For example, #1 seems to imply we're going to have APIs oriented around Selector queries, and does not say much about not letting these be exposed remotely. This is representative of most conversations I've ever overheard about Selectors and what people want to do with them. (So, if nothing else, this proposal needed to be made to track the situation!)

For filecoin: I taaaag.... @magik6k ? (I have repeatedly heard this is wanted, that's the depth limit of my knowing.)

In general: it seems like it's almost a law of human nature that people want to ask arbitrarily complex questions without concern for the costs on the answerer 😆 / 😢 The Selectors system seems to be no exception, heh.

Author

Big use-case I can't believe I forgot in the earlier comment: graphsync.

We seem to talk about the intention to use graphsync between untrusted peers who might be exchanging data without a fee mechanism. If that's true, then it will be important for such peers to have at least some cost estimation mechanism and cutoff options.

#### Impact
_How directly important is the outcome to web3 dev stack product-market fit?_

However important Selectors are to web3 dev stack PMF, this is that times about 0.95.
Contributor

Do we have any insight as to how important Selectors are to web3 dev stack PMF? Should we solicit anecdotes/data from the PM team?

Comment on lines +148 to +151
Two: It's possible to work around this in some cases by building APIs around selectors,
but then only accept a known, pre-specified set of selectors.
(If I understand correctly, this is how several pieces of Filecoin currently work around this issue.)
This is not a general workaround, though, and ruins most of the point of Selectors -- they're *supposed* to be user-specifiable.
Contributor

I don't know the specific use case, but I assume a gimped selector syntax could still be useful for some users, depending on their needs.

Author

warpfork, Feb 23, 2021

Then we would need to invent such a gimped syntax.

That's probably harder than planning and implementing budgets within the current syntax.

My general experience with this topic is: you cannot, no matter how big of a gavel you wave and how energetically you wave it, convince people to stop asking for features that would make a system accidentally Turing-complete (and thus an unbounded DoS vector). (This is doubly true when it comes to tree or graph processing, which, in a sudden flash of hindsight that only occurs to me fully now, probably ought not be a surprise.) Therefore: monotonically decreasing budgets, often aka "gas", are the only real way to unambiguously communicate the problem, and thus the only real practical way to solve it.
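
To make "monotonically decreasing budget" concrete, here is a minimal sketch in Python. It is an illustration only, not the proposal's design: `Budget`, `BudgetExhausted`, and `walk` are hypothetical names, the DAG is modeled as plain nested dicts and lists rather than real IPLD nodes, and the cost model (one unit per node visited) is an assumption.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when a traversal tries to spend more than it has left."""


@dataclass
class Budget:
    remaining: int  # abstract "gas" units; this number only ever goes down

    def spend(self, cost: int = 1) -> None:
        if cost > self.remaining:
            raise BudgetExhausted(f"needed {cost}, had {self.remaining}")
        self.remaining -= cost


def walk(node, budget: Budget) -> int:
    """Depth-first walk charging one unit per node; returns nodes visited."""
    budget.spend(1)
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []  # scalar leaf
    return 1 + sum(walk(child, budget) for child in children)


# Hypothetical stand-in for a small DAG: 7 nodes in total.
dag = {"a": [1, 2, 3], "b": {"c": 4}}

budget = Budget(remaining=10)
visited = walk(dag, budget)  # succeeds, leaving 3 units unspent

try:
    walk(dag, Budget(remaining=5))  # too small for 7 nodes
except BudgetExhausted as err:
    print("traversal cut off:", err)
```

The point of the single shared `Budget` object is that the caller can state the cost ceiling up front, and the traversal stops unambiguously when it is reached, whatever shape the query or the data has.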

Author

warpfork, Feb 23, 2021

The other route is "invent a gimped syntax and a compiler that verifies its non-TC-ness" -- and that's possible; that's what eBPF is, if I understand correctly -- but it's a huge amount of engineering work...

... and I sorta wouldn't bet very favorably on whether that approach would work for tree/DAG processing scenarios anyway. I'd rather bet money that it would end up with people wanting to apply the eBPF-like thing repeatedly on every block they visit.

Which would get us back to approximately the same problem with Selectors right now: since those things would "restart" their budget on every block visited, we'd need some... bigger, holistic, monotonically decreasing budget.
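
A rough Python sketch of the difference between a budget that restarts on every block and a holistic one. Everything here is hypothetical: `traverse`, `per_block_limit`, and `total_limit` are made-up names, blocks are modeled as a dict from block id to the ids it links to, and the cost of a block (one step plus one per link) is an assumed cost model.

```python
class BudgetExhausted(Exception):
    """Raised when either the per-block or the overall limit is exceeded."""


def traverse(blocks, root, per_block_limit, total_limit=None):
    """Follow links from `root`, returning the number of blocks loaded.

    `blocks` maps a block id to the list of block ids it links to.
    Visiting a block costs one step plus one step per link it holds.
    """
    total_spent = 0
    loaded = 0
    stack = [root]
    while stack:
        block_id = stack.pop()
        links = blocks[block_id]
        cost = 1 + len(links)
        # Per-block budget: starts fresh at every block, so it only bounds
        # the work done inside a single block.
        if cost > per_block_limit:
            raise BudgetExhausted(f"block {block_id} cost {cost}")
        # Holistic budget: accumulates across blocks and never resets.
        total_spent += cost
        if total_limit is not None and total_spent > total_limit:
            raise BudgetExhausted(f"total cost {total_spent}")
        loaded += 1
        stack.extend(links)
    return loaded


# A chain of 1000 small blocks, each linking to the next.
chain = {i: [i + 1] for i in range(999)}
chain[999] = []

# Every block is cheap on its own, so the per-block limit never trips
# and all 1000 blocks get loaded.
print(traverse(chain, 0, per_block_limit=5))

# Only the accumulated limit stops the walk early.
try:
    traverse(chain, 0, per_block_limit=5, total_limit=100)
except BudgetExhausted as err:
    print("stopped:", err)
```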

Comment on lines +140 to +141
#### Alternatives
_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_
Contributor

Is there any concept of pagination/partial results that can be applied for selectors?

Author

It has been discussed, but no such thing is implemented nor shipped at present.

Resuming that discussion would probably be a part of the work that would go on while engaging on this project.

Author

warpfork, Feb 23, 2021

In practice, it's my understanding that at present, systems using Selectors get in the habit of launching small queries (depth-limited, or constructed to favor left-leaning trees, for example) to get started with exploring the data, and use more queries subsequently.

This "works" but obviously leaves some load to the brain of the human crafting the Selector, which isn't really the most desired outcome. (It's maybe fine if you're a human, spelunking interactively -- but it's not so great as a basis for APIs if we want programs to be built which generate Selectors automatically in response to some higher-level user actions.)
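
The "small query first, more queries later" habit looks roughly like this Python sketch. It is illustrative only: `explore` is a hypothetical helper, blocks are modeled as a dict from block id to child ids, and the data is assumed to be a tree (no shared children, so no de-duplication is done).

```python
def explore(blocks, root, max_depth):
    """Walk from `root` down to `max_depth` links deep.

    Returns (visited, frontier): the block ids that were visited, and the
    ids just past the depth cutoff, which a follow-up query could start from.
    """
    visited, frontier = [], []
    stack = [(root, 0)]
    while stack:
        block_id, depth = stack.pop()
        if depth > max_depth:
            frontier.append(block_id)  # known to exist, but not explored
            continue
        visited.append(block_id)
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(blocks[block_id]):
            stack.append((child, depth + 1))
    return visited, frontier


# Hypothetical tree: r -> a, b ; a -> c ; c -> d
blocks = {"r": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}

first_visited, first_frontier = explore(blocks, "r", max_depth=1)
# first_visited == ["r", "a", "b"], first_frontier == ["c"]

# The caller then decides whether to issue a second query per frontier entry.
for block_id in first_frontier:
    print(explore(blocks, block_id, max_depth=1))
```

Someone or something has to pick `max_depth` and decide which frontier entries to chase, which is exactly the load that currently falls on whoever crafts the Selector.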

Author

Some prior discussion involved ideas like "what if we could ask the selector to return every (N % 3 == 1) blocks?" and similar ideas. The aim there was to end up with something that you could imagine a system generating automatically in order to fan out queries for data to multiple peers and start getting different fractions of the data back from them in parallel.

This only got to the discussion phase. There may be neat ideas here, but they trend towards getting complicated, so we pushed them out of the first round of Selector work.
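
For the curious, the "every (N % 3 == 1) blocks" idea can be sketched in a few lines of Python. None of this exists in Selectors today: `traversal_order` and `shard` are hypothetical names, blocks are modeled as a dict from block id to child ids, and the sketch assumes every peer walks the same tree in the same deterministic order.

```python
def traversal_order(blocks, root):
    """Deterministic depth-first, left-to-right order of block ids."""
    order, stack = [], [root]
    while stack:
        block_id = stack.pop()
        order.append(block_id)
        stack.extend(reversed(blocks[block_id]))
    return order


def shard(blocks, root, modulus, remainder):
    """Blocks whose position N in the traversal satisfies N % modulus == remainder."""
    return [
        block_id
        for n, block_id in enumerate(traversal_order(blocks, root))
        if n % modulus == remainder
    ]


# Hypothetical tree with 7 blocks.
blocks = {
    "r": ["a", "b"],
    "a": ["c", "d"],
    "b": ["e", "f"],
    "c": [], "d": [], "e": [], "f": [],
}

# Ask three peers for disjoint thirds of the same traversal.
parts = [shard(blocks, "r", 3, k) for k in range(3)]
# parts == [["r", "d", "f"], ["a", "b"], ["c", "e"]]
```

Note that each peer in this sketch still has to walk the whole structure to know the positions, so it spreads out the data transfer rather than the traversal cost. That is one of the ways these ideas get complicated.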

BigLep added the "Steward Priority" label (Stewards priority project due to enabling us to move faster and/or safer) on May 26, 2021
Labels
ease:low (Ease rating is 5 or below.)
Steward Priority (Stewards priority project due to enabling us to move faster and/or safer.)