This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Challenge sampling #98

Closed
bvohaska opened this issue Mar 12, 2019 · 2 comments
Labels
launch-critical Required for launch

Comments


@whyrusleeping commented on Tue Jun 19 2018

Where do we sample the randomness to seed our challenge from that we use to generate PoSTs?

My current mental model is that we use the hash of the smallest ticket in the tipset we are building a block on top of.

However, @jbenet says to sample it from 'far back' in the blockchain. The problem with sourcing randomness from farther back than the previous block is that miners can then start completing their proofs way ahead of time, and figure out if they are the winner of a given future block height. If they find that they are the winner, they can then immediately submit their block, effectively defeating the whole point of having a verifiable delay function in the first place: we lose out on its ability to space out blocks.

The only way I can think of to get around this would be: for a lookback parameter of N blocks, make the PoST take N+1 blocktimes to complete. This kinda solves my issue, but introduces another issue: miners will essentially need greater than N CPU cores to be concurrently checking all of the proofs they might win. You could get around that partially by having the proofs reveal whether or not the ticket is a winner relatively shortly, but then still require that they complete the VDF for the rest of the time to ensure even block spacing.

Anyways, that seems rather infeasible to me. So unless someone corrects me, I'm going to keep my current mental model of sampling from the previous tipset.


@whyrusleeping commented on Tue Jun 19 2018

Just a thought: You could do two proofs, one based way back in time to check if you've won, and then one based on the previous block that makes the timing work out.


@whyrusleeping commented on Tue Jun 19 2018

Problem with the proof being sampled from your chosen tipset: if you choose the wrong tipset (it ends up not getting included in the chain), your proof becomes invalid, and it cannot be included as part of the period proof submission.


@whyrusleeping commented on Tue Jun 19 2018

There may exist something written down on Juan's computer that contains a magical solution to this. If that shows up, I would be happy.


@ZenGround0 commented on Tue Jun 19 2018

The two-proofs idea mentioned above seems to line up with the "2 processes mining" idea I've heard @nicola bring up when discussing integrating PoSts into Filecoin, and using this integration scheme appears to solve the problems you mention here.

One proof is used to prove a miner deserves to keep their collateral; another is submitted to the EC lotto to win block rewards. The collateral proof can use a lookback parameter far in the past because, grind all you want, the value of the resulting PoSt won't give away any info about a miner's victory in a future epoch. The lookback parameter prevents the problem of collateral proofs being invalid if a miner mines on the wrong chain for a few tipsets, and therefore solves the mentioned period proof submission issues.

If the mining reward proof's challenge is extracted from the latest tipset, this solves the grinding issue. Also, it doesn't matter if this proof gets thrown out when a miner chooses the wrong tipset; it's not being used in the period proof submission to prove collateral.


@nicola commented on Wed Jun 20 2018

for completeness, we have rewritten the different strategies here: https://gist.github.com/whyrusleeping/4c05fd902f7123bdd1c729e3fffed797#gistcomment-2623854


@jbenet commented on Wed Aug 29 2018

Quick aside: you may recall these diagrams; @whyrusleeping, @nicola, and I discussed them at the onsen.

[13 diagrams: filecoin-consensus-constructions 001 through 013]

Source: https://ipfs.io/ipfs/QmSpHWUUQXcS9yrkvAZnKbgiAukAhFwrDCjMdAfuWj6wKD


@jbenet commented on Wed Aug 29 2018

(Came here summoned by my PQ)

High level, there are a bunch of options and they all suck. We have to pick the one(s) that suck the least.

Your discussion is pretty spot on, but doesn't cover some other cases. The gist linked from Nicola's comment doesn't cover all of them either, I think; it's hard for me to tell, as I don't understand some of the constructions listed there.


On the Randomness. First, need to mention that the randomness is not just "the hash of the smallest ticket", because depending on how the ticket is produced, it may enable grinding. It has to be isolated from any kind of entropy introduced by user-manipulatable variables. If this sounds confusing, look into grinding in SpaceMint (and SnowWhite and Algorand, IIRC). Instead, let's say we have a function that extracts randomness for a given height, and consider that abstractly. The concrete choice must be resistant to grinding (it may not be fully). It may be just the hash of ticket T, if we can show it doesn't introduce grinding for the full construction. Or it might be a combination of values extracted from all over the chain, including the hash of the ticket T_n (this is also known as reducing bias in a randomness beacon).

So, let r(t) be a function that extracts randomness for time (block, height) t.
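To make r(t) concrete, here is a minimal sketch. Everything here is illustrative, not the spec: `extract_randomness`, the domain-separation tag, and the mixing rule (hash a lookback ticket together with the target height) are all assumptions chosen to show the shape of the abstraction.

```python
import hashlib

def extract_randomness(tickets: list, height: int, lookback: int) -> bytes:
    """Toy sketch of r(t): hash the ticket from height t - k together
    with the target height, so each epoch gets a distinct value even
    if the source ticket repeats. `tickets` is a hypothetical list of
    per-height ticket hashes."""
    source = tickets[height - lookback]               # ticket from t - k
    h = hashlib.sha256()
    h.update(b"randomness-extraction")                # domain tag (illustrative)
    h.update(height.to_bytes(8, "big"))               # bind to this epoch
    h.update(source)
    return h.digest()

# Toy chain of 10 ticket hashes.
tickets = [hashlib.sha256(bytes([i])).digest() for i in range(10)]
r = extract_randomness(tickets, height=9, lookback=3)
```

Note that this only models the extraction interface; it does nothing about grinding resistance, which depends on how the source tickets themselves are produced.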


The problems:

  • (1) Sampling r(t) from block t-1 (right before) increases the grinding factor (makes consensus less stable).
  • (2) Sampling r(t) from block t-k (k blocks back, where k > stability parameter) decreases the grinding factor (makes consensus more stable), BUT if done naively, allows r(t) to be known at time t-k.
  • (3) Using a VDF allows us to sample r(t) from t-k and have it only be known at time t (run the VDF since then). But done naively, this may mean running k VDFs simultaneously. We need to run only 1 (or 2) at most.
  • (4) Constructions with a "long-running VDF" (a VDF just being extended on its own, with no other timing input) MUST be very precise, as skew will accumulate very quickly. Synchronizing with the network (exchanging blocks, having to mine on each other's blocks) allows even very imprecise VDFs with significant error to be OK network clocks (eg Bitcoin's PoW).
  • (5) Using a PoSt as a "long-running VDF" without including inputs from the rest of the network (from the chain) is not a good idea; PoSt is not as precise as we would like it to be. Using a PoSt as a VDF in between each block height is fine.
  • (6) Mixing in results from a PoSt (or other VDF) from ticket t-1 into the source PoSt for ticket t may add entropy and thus enable grinding. Careful.
  • (7) Whatever happens, miners that shut down and come back up (or start mining) at some random time n should not have to run a VDF for all the blocks they were not around for; they need to be able to recover quickly. Ideally immediately, though maybe running a VDF for a small number of heights is fine (eg k blocks).
  • (8) Long-running VDFs need to be compatible with (or somehow allow) interruptions (miners crashing, shutting down, or joining for the first time) without requiring all the missing entries.
    • This throws a wrench into all long-running VDF ideas, and forces us to propagate the value of the VDF through the network somehow (in the block? every so often in the block? (verifiable)). Or miners can tell each other the current VDF if it's verifiable in another way, but this introduces an incentive to cheat (I won't give you the VDF, to prevent you from mining).
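Problem (3) is easier to see with a toy sequential delay function. This sketch uses iterated SHA-256 as a stand-in for a real VDF (a real VDF, e.g. Wesolowski's or Pietrzak's, also yields a succinct verification proof, which iterated hashing does not); it only models the "must run sequentially" property, and all constants are illustrative.

```python
import hashlib

def vdf(seed: bytes, steps: int) -> bytes:
    """Toy sequential delay: each step depends on the previous output,
    so the work cannot be parallelized."""
    out = seed
    for _ in range(steps):
        out = hashlib.sha256(out).digest()
    return out

STEPS_PER_EPOCH = 1_000   # illustrative work per block time
k = 5                     # lookback parameter

# Naive lookback sampling: one delay computation in flight per pending
# height, i.e. k concurrent VDFs.
seeds = [bytes([i]) for i in range(k)]
in_flight = [vdf(s, STEPS_PER_EPOCH) for s in seeds]

# A single chained VDF avoids that: each epoch extends one running chain.
chain = bytes(32)
for _ in range(k):
    chain = vdf(chain, STEPS_PER_EPOCH)
```

The chained form is also what makes problems (7) and (8) bite: a miner who was offline has no way to produce the current chain value without either redoing the missing sequential work or receiving it (verifiably) from the network.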

Intuitions:
The constraints we need to satisfy include:

  • Reducing variability of possible tickets at time T is a good thing (ie mining on 2 different tipsets should ideally not yield 2 different tickets).
  • To keep everyone around, tickets for time T must be "scratched" (determined, checked, discovered, used) at time T and not before (otherwise you can check way ahead of time whether you win; see sleepy consensus).
  • Constructions that force checking the ticket after doing a PoSt mean miners do the work before the ticket says they've won.
  • Constructions that allow checking the ticket before doing a PoSt mean miners ONLY do the work if they won (this is a proof-checking sampling rate vs mining costs tradeoff).
  • Making PoSts happen for all miners at every block height increases the "amount of proof checking", which is a good thing.
  • In some PoRep/PoSt constructions, making PoSts happen for all miners at every block height may be overkill (much more sampling than is needed), and therefore we can relax things (check the ticket before mining).
  • Long-running VDFs must be "interruptible" (which kind of defeats the purpose), or need to be propagated through the network too, so others can pick up.
  • Long-running VDFs do not incorporate any information from (are not tainted by) blocks t-k ... t. But they may incorporate info at t-k to force the VDF to be specific to this chain.

Constructions:

  • (1) (Naive/simplest) Sample r(t) from block t-1, run a PoSt (as a VDF) for 1 epoch (block time), output is the ticket.
    • Problem: may allow way too much grinding and be unstable for consensus. But may be fine with SSLE (single secret leader election).
  • (2) Sample r(t) from block t-k, ticket is only based on r(t).
    • Problem: miners know if they will win at t by t-k.
  • (3) Sample r(t) from block t-k, ticket is only based on r(t), use a VDF of length k to delay the ticket until time t.
    • Problem: if the VDF is a PoSt, it will skew really fast.
    • Problem: naive construction would yield k VDFs. only use one VDF.
    • Problem: if you use one VDF but introduce inputs from block t-1 at time t then it defeats the purpose (grinding).
  • (4) Run two PoSts, one as a long-running VDF, one between each block. Use the long-running PoSt for the ticket randomness only, but require the other PoSt for a valid ticket/block (it does not contribute to randomness at time t).
    • Advantage: drastically reduces grinding
    • Problem: PoSt is bad for a long-running VDF. (skew)
    • Problem: Need to run 2 PoSts all the time.
  • (5) Run a cheap and precise VDF (not a PoSt) constantly, as a randomness beacon. Use that for the ticket randomness, and require a PoSt separately for a valid ticket (it does not contribute to randomness at time t). Put the VDF value into the block, or propagate it alongside (must be verifiable in O(1)).
    • Advantage: drastically reduces grinding
    • Problem: may skew from the network.
    • Problem: clock is not bound to the specific chain we're using. (same clock works for many different heads/chains).
    • Advantage: may be the same beacon for all blockchains (across multiple different networks).
    • Mitigation: may do periodic re-synchronizing w/ input from the network/chain (every j blocks (j >= k)).

I think options (4) and (5) are the most robust. (5) is the cheapest / most robust. (1) is the simplest and may work fine, especially with SSLE (single secret leader election); it just introduces complexity/pain in EC without SSLE. We should not use option (2). Naive option (3) does not work (or causes k different VDFs); a good option (3) may just be option (5).
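For intuition on why tickets must be "scratched" at time t and not before: a toy power-weighted leader check (all names here are illustrative, not the actual EC election rule) makes clear that anyone who can evaluate r(t) early also learns early whether they win.

```python
import hashlib

def is_winner(r: bytes, miner_id: bytes, miner_power: int, total_power: int) -> bool:
    """Toy leader check: hash the epoch randomness with the miner's
    identity; win with probability ~ miner_power / total_power."""
    draw = int.from_bytes(hashlib.sha256(r + miner_id).digest(), "big")
    threshold = (2**256 * miner_power) // total_power
    return draw < threshold

# If r(t) is sampled naively from block t-k, a miner can run this check
# k epochs ahead of time, which is exactly problem (2) above.
```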

The diagrams above show some of these construction ideas (including PoSt as a long-running VDF), but the PoSt-as-long-running-VDF approach is not good.


@nicola commented on Fri Sep 21 2018

(6) there is a random beacon (no PoSt/VDF in between blocks)
(7) there is a VDF in between blocks (lasts 15 secs)

  • Issue: We cannot really run a PoSt in between blocks:

    • Because: PoSt today takes longer than a blocktime
    • Note: if a miner wants to try every tipset, it's going to be CPU-expensive
  • Issue: If we use PoSt/VDF to slow down block time, then blocktime = vdftime + block propagation. What if a small group of miners have a fast connection with each other and can minimize block propagation? Would they be able to have a faster chain?


@bvohaska commented on Tue Sep 25 2018

Do we have any traction on this?

I think it's prudent to write down specific assumptions and constraints we are making and applying to this problem. I believe that the need for a random seed is strongly dependent on our choice of a PoST and associated rolling proof scheme. I'd suggest using a scheme that does not require a random seed. I'll give an example a little later in this novella.

Fundamental goal:

Let's explore our real goal for a moment. Namely, we want a miner to fairly prove that they have stored data and are making that data available to a client. If the proof is fair, then we can use the proof to assign FIL to valid storage miners and to make statements about a miner's power. We'll call a proof fair if (1) a proof cannot be generated using data that is not the client's genuine data (GD) and (2) any proof requires that the original data be present when the proof is generated.

What is affected by our PoST scheme

From the above we can infer that our PoST scheme affects power and FIL assignments. These in turn strongly influence leader selection. In short, FIL assignment, leader election, and assignment of power all depend on our PoST scheme.

PoST => FIL rewards & Power distribution & Leader election

Or in sky-is-falling terms, "if PoST is messed up FIL is broken". No surprise here.

Constraints on our PoST scheme

  • There does not exist a means by which a proof can be generated without a verifier first requesting a proof.

  • A proof cannot be pre-generated: (1) it cannot be produced before it is needed/requested, and (2) it is not simulatable within the time between proof submissions, so generating trial proofs will not yield any economic incentive.

So what are we assuming?

  • In the above posts, we have chosen a construction for proof of space-time that requires a random seed r in order for that scheme to be fair

  • If the prover does not receive r, the resulting proof will have some issue

  • Issues resulting from not receiving r or receiving a bad random seed br would imply that the PoST was broken and FIL rewards, power, or leader election would be negatively affected.

  • We don't want to allow any entity to influence r, because if r were influenceable then a proof could be faked or generated in an unfair manner

  • We assume the strong knowledge assumption fails (i.e. generation and SLESTACK attacks are possible)

  • ???

So why do we need a random seed?

Let's first look at what r is giving us in our current PoST scheme (see the PoRep paper). In the PoRep paper, r is a component sent by a verifier that tells the prover to search through some encoded data via a pattern defined by r and return a proof P(r, GD). This is where the argument becomes implementation dependent and touches a fundamental question: "Does there exist a PoST f such that the resulting proof relies only on GD and is practical?". From the paper, the answer is no, but maybe we only need r to be unpredictable enough. The following are proposals/examples where we might be able to use a good-enough r.

Verifier miners

Suppose there is a new type of miner called a verifier miner. This miner would take an r from the latest block and compute:

M := Sign[r, r_miner]

Where r_miner is a per-message random seed. This miner would seek out storage miners and send them M. The storage miner would then respond with a proof P(D,M). The verifier miner then sends a message C indicating that the storage miner has completed a proof, where

C := Sign(P, M)

We assume that the verifier miner is not able to choose any specific storage miners.

This ensures that no storage miner will be able to present unfair proofs. Unfortunately, this requires a new type of miner in FIL. This method also has the benefit of not depending on computation power.
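A rough sketch of the exchange above, using HMAC as a stand-in for the signatures (a real deployment would need public-key signatures so that M and C are publicly verifiable; the proof P here is a placeholder hash, not a real PoSt, and all names and key handling are illustrative):

```python
import hashlib
import hmac
import os

def sign(key: bytes, *parts: bytes) -> bytes:
    """HMAC stand-in for Sign[...]: binds the parts to the key holder."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for p in parts:
        mac.update(p)
    return mac.digest()

verifier_key = os.urandom(32)   # verifier miner's key (illustrative)
r = os.urandom(32)              # randomness from the latest block
r_miner = os.urandom(32)        # per-message random seed

# Verifier miner issues the challenge M := Sign[r, r_miner].
M = sign(verifier_key, r, r_miner)

# Storage miner answers with a proof over its data D and the challenge.
D = b"client data"
P = hashlib.sha256(D + M).digest()   # placeholder for the real proof P(D, M)

# Verifier miner attests completion: C := Sign(P, M).
C = sign(verifier_key, P, M)
```

The point of binding P to M, and M to a fresh r_miner, is that the storage miner can neither precompute the proof nor reuse one across challenges.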

VDF + r from latest block

I believe this is already a proposal from @nicola.

A storage miner retrieves r from the latest block. This miner then computes a PoRep and generates a proof P(D,r). The miner then computes a special verifiable delay function,

vP := VDF(D,P)

such that vP requires ~0.5*T, where T is the time between proof submission/validation requirements.

This ensures that a miner can't simulate too many proofs during a period T. Unfortunately, a miner might be able to farm out the computation of multiple VDFs and therefore simulate proofs; this would be very expensive.
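The "~0.5*T" requirement can be sketched with the same iterated-hashing stand-in for a VDF (illustrative only; a real VDF also produces a cheap-to-verify proof, and real calibration would be done against dedicated hardware, not a wall-clock probe):

```python
import hashlib
import time

def delay(seed: bytes, steps: int) -> bytes:
    """Stand-in for vP := VDF(D, P): each step depends on the previous
    output, so one chain cannot be sped up by adding machines. Farming
    out *multiple independent* chains (one per trial proof) is still
    possible, but each costs full sequential time."""
    out = seed
    for _ in range(steps):
        out = hashlib.sha256(out).digest()
    return out

def steps_for(target_seconds: float) -> int:
    """Estimate how many sequential steps take ~target_seconds
    (i.e. ~0.5 * T) on this machine. Purely illustrative."""
    probe = 50_000
    t0 = time.perf_counter()
    delay(b"\x00" * 32, probe)
    per_step = (time.perf_counter() - t0) / probe
    return max(1, int(target_seconds / per_step))
```

This is why simulating proofs under this scheme is "very expensive": each extra trial proof demands its own full-length sequential computation.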

r from latest block + disable c-m choice

Let r be taken from the latest block. But add the restriction that if a client chooses a storage miner, reduce the storage incentive and power gains by a factor f.

This ensures that we recover the strong knowledge assumption (generation and SLESTACK attacks are not possible), and as a result there is no way for data to be recovered if it is not actually stored. Unfortunately, this does not mitigate helper attacks (the miner stores data with a 3rd party), because the proof has already been generated and the data is not needed until a request for it is made. This can be partially solved with a continuous VDF operation over the data, as proposed by @whyrusleeping and @Stebalien.

Wrapping it up: What are we really after here?

So the real question is: what will be our PoST scheme? In that scheme, do we actually need a secure random seed? Will a crappy seed work? Ultimately, we are only using the seed to ensure that no small collection of miners can unfairly generate proofs of space-time under our security model, which I think isn't all that well defined yet (maybe it's good enough to say secure under A := {generation, helper, stacking} attacks).

Related Issues

https://github.com/filecoin-project/aq/issues/112
https://github.com/filecoin-project/aq/issues/93
https://github.com/filecoin-project/aq/issues/90


@whyrusleeping commented on Tue Sep 25 2018

@bvohaska one quick note, we both need good randomness for the PoSt scheme, and for leader election. It's convenient if the randomness is the same for both, but not necessary, and there are different considerations required for each (for example, if the randomness for a PoSt ends up being sourced from a fork that gets orphaned, then that PoSt is invalid). I think your comment illuminates the fact that this discussion has shifted more towards the 'how do we sample randomness for leader election' problem, and less of 'how do we source randomness for proofs'


@Stebalien commented on Tue Sep 25 2018

@bvohaska

The core issue here is that there is no latest block.

  1. For election: A potential leader can try many times to get elected by using each block.
  2. For proofs: We want to delay the generation of a proof until it's too late to reseal. However, given the above issue, we also can't use randomness from the latest block, because it would make the proof invalid against other blocks.

@bvohaska commented on Tue Sep 25 2018

@whyrusleeping: good point. I'm still working on writing up the SSLE solutions, but I can say that currently SSLE will get its randomness from an MPC calculation. @lucaniz and I were briefly theorizing about proving an MPC requirement in our SSLE scheme.

@Stebalien: Is this because of forking and such? If so, it looks like we might have a cycle with SSLE and PoST that we'll need to look at. When we say SEAL do we mean PoRepSetup()? Trying to unify language. Are all proofs at epoch X posted to a block at epoch X+1? If not, this might lead to an economic instability.


@Stebalien commented on Tue Sep 25 2018

Is this because of forking and such?

Yes, but SSLE won't solve forks. It should solve them enough (probably) to prevent grinding (maybe), but even a single fork would cause problems with PoST. If I compute a PoST against a "null" block (because I can't find any valid blocks for that time period), I'll have to recompute it if the network ends up accepting a valid block for that period at some later point in time. We could just force the miner to compute multiple in parallel, but that would kind of suck.

When we say SEAL do we mean PoRepSetup()?

As far as I know, yes.

Are all proofs at epoch X posted to a block at epoch X+1?

No, the prover will be given multiple epochs. Otherwise, a miner could force the prover to pay Collateral - Epsilon in bribes to get their proofs included in the block. Having multiple blocks turns this into a game theory/economics issue (if I don't accept your proof, someone else may and I'll miss out on the transaction fee).

If so, it looks like we might have a cycle with SSLE and PoST that we'll need to look at.

(see my previous comment on this).

SSLE helps but doesn't entirely solve the issue.


@porcuquine commented on Wed Sep 26 2018

@bvohaska Seal isn't PoRep.Setup. It's roughly PoRep.Replicate + PoRep.Prove.


@nicola commented on Fri Nov 30 2018

Conversations moved forward here: https://github.com/filecoin-project/aq/issues/98 #62


@nicola commented on Aug 8, 2019

@sternhenri can we close this or is this still open?

@sternhenri commented

Good insofar as EC is concerned imo
