
CIP-0079? | Implement Ouroboros Leios to increase Cardano throughput #379

Closed
wants to merge 4 commits

Conversation

dcoutts
Contributor

@dcoutts dcoutts commented Nov 17, 2022

This CIP proposes the implementation of Ouroboros Leios as the solution to increase Cardano throughput in the long term.

See the rendered version and the included design report.

We include a copy of the Ouroboros Leios design PDF, but it is also available from the IOHK research library.

The design report source is now included in this PR, so feel free to use the GitHub PR review feature to help structure comments and questions on the report. (Apologies that this was not included from the start. Lesson learned!)

include it in full in this README. Instead, as part of this CIP we include a
larger document that describes Ouroboros Leios in much more detail:

[*Ouroboros Leios: design goals and concepts*](leios-design.pdf)
Contributor

@L-as L-as Nov 20, 2022


I like the design of Leios.

Two thoughts:
I'm not sure the colour system is optimal. It seems to me that there might be a way to integrate colours deeper into the ledger to further prevent conflicting transactions in concurrent input blocks. I don't have any concrete suggestions unfortunately.

It seems to me that the concept of endorsement certificates could be entirely replaced by (probabilistic) proofs à la ZKPs. This is, however, more complex because of the still unstable nature of the technology, but it seems useful to note that in a future extension/development it could be replaced by such a mechanism.
(The current system could already be classified as a probabilistic proof, but this would be more efficient for the same probability of correctness/soundness).

Otherwise LGTM (besides the missing parts, of course, about e.g. rewards).

Contributor Author


Thanks for the feedback.

The colour system does still need to be properly prototyped & validated, and if it doesn't work as well as hoped then we need to look for alternatives.

Yes, we've not detailed rewards, but the general idea is that the reward payouts are the same in total, the only difference is for SPOs and is about which blocks we count. Obviously in Praos we simply counted blocks, since there is only one kind of block. For Leios we will need to figure out how to count input blocks, endorsement blocks and ranking blocks, given that they all occur at different frequencies. For example, should the less frequent endorsement and ranking blocks be weighted more than input blocks? This is a detail that needs to be worked out.

@michael-liesenfelt
Contributor

michael-liesenfelt commented Nov 21, 2022

Duncan,

The rewards for the RB, EB, and IB layers are going to play the critical role in deciding whether the Cardano network centralizes further than MAV=21 or begins gradually decentralizing. You purposefully avoided allocation bias by using the same VRF mechanism for RBs, EBs, and IBs. You will have to add a section to this CIP discussing the reward scheme, and it may be wise to treat all blocks at all layers equally.

Because the MAV is 21, not >210 as intended, the blockchain itself is a no-confidence determination against the reward schemes created by the IOG-research-team / IOG-RSS-team / IOG-incentives-team. The outcome of the CIP-50 and RSS-revision-2 research, study, and publication will be necessary to guide exactly which reward function should be applied to the RBs, EBs, and IBs.

The CIP-50 RSS-revision-2 process may be a prerequisite dependency of CIP-Leios.

@dcoutts
Contributor Author

dcoutts commented Nov 21, 2022

@michael-liesenfelt thanks for the feedback.

Remember that we don't need to specify a whole detailed reward scheme. We can reuse whatever reward scheme is in place at the time Leios development is integrated into Cardano. The only difference here between Praos and Leios is how we count the number of blocks when it comes to seeing how well each SPO performed. That obviously has to change since there are now three kinds of blocks (plus reports). But then that just gets plugged into whatever reward formula is in use.

it may be wise to treat all blocks at all layers equally.

I have to say, that makes me a bit nervous. Not all blocks are equally important for the successful operation of the system. Missing a few IBs is less bad than missing a few EBs or RBs.

Other alternatives one might imagine include weighting EBs and RBs by their relative frequency compared to IBs. For example, if there were 1 IB per second (on average), 1 EB per 10s, and 1 RB per 20s (on average), then under that idea EBs would be weighted 10x more than IBs, and RBs 20x more than IBs.

On the other hand, it is actually ok to miss some EBs or RBs and later ones can still pick up the IBs that have been created. In other words there is some limited tolerance for missing EBs or RBs without affecting throughput. That argues for a slightly lower weighting of EBs and RBs than a simple frequency argument might suggest.
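A hedged sketch of that weighting idea (my own illustration, not part of the proposal; the rates and the catch-up discount are invented for the example):

IB_RATE = 1.0       # input blocks per second (average)
EB_RATE = 1.0 / 10  # endorsement blocks per second
RB_RATE = 1.0 / 20  # ranking blocks per second

eb_weight = IB_RATE / EB_RATE  # 10x, by relative frequency
rb_weight = IB_RATE / RB_RATE  # 20x, by relative frequency

# Discount reflecting that a few missed EBs/RBs can be caught up later,
# so pure frequency slightly overstates their importance (assumed value).
CATCH_UP_DISCOUNT = 0.8

def spo_block_score(ibs, ebs, rbs):
    # Weighted block count, to be plugged into whatever reward formula
    # is in effect at the time.
    return ibs + (ebs * eb_weight + rbs * rb_weight) * CATCH_UP_DISCOUNT

print(spo_block_score(ibs=100, ebs=10, rbs=5))  # 100 + (100 + 100) * 0.8 = 260.0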

Anyway, I think the point is that there's some subtlety here, but it's not really the same as the general question of how best to arrange the rewards scheme on Cardano. So I don't think it needs to wait on the outcome of revising the general rewards scheme. (That said, Leios development is likely to take a lot longer than that anyway.)

@michael-liesenfelt
Contributor

Remember that we don't need to specify a whole detailed reward scheme. We can reuse whatever reward scheme is in place at the time Leios development is integrated into Cardano.

A modification of the reward equation may be necessary for equal or differential weighting of RBs, EBs, and IBs:
r( stake , pledge ) = R * ( C1*total_RBs + C2*total_EBs + C3*total_IBs ) * RSSv[1,2](RBs, EBs, IBs)

Currently, with only RBs ( C1 = 1/21600 , C2 = 0 , C3 = 0 ):
r( stake , pledge ) = R * ( (1/21600)*total_RBs + 0.0*total_EBs + 0.0*total_IBs ) * RSSv1(RBs, 0, 0)
r( stake , pledge ) = R * ( (1/21600)*21600 ) * ( RSSv1 )
r( stake , pledge ) = R * ( RSSv1 )

On the other hand, it is actually ok to miss some EBs or RBs and later ones can still pick up the IBs that have been created.

If transaction volume uses less than the RBs' capacity, then empty/missing EBs and IBs would not affect throughput. Agreed.

If the transaction throughput is massive, maximizing RBs, EBs, and IBs together as Leios is designed for, then all of the classes of block are [equally?] important for carrying chain throughput. It may be wise to incentivize EBs and IBs as if every SPO needs to participate and be ready for full throughput all the time.

@dcoutts
Contributor Author

dcoutts commented Nov 22, 2022

If transaction volume uses less than the RBs' capacity, then empty/missing EBs and IBs would not affect throughput. Agreed.

The mechanism is a bit more subtle than that, and doesn't rely on the transaction volume being less than the system capacity.

If EBs are missed, later EBs can simply include those older IBs. Yes there is a limit to the number of IBs in an EB, but that can be set to be a fair bit higher than the typical rate of IBs. So missing some (but not too many) EBs can be handled without loss of throughput.

If RBs are missed or delayed (due to the random schedule) then what happens is that EBs end up forming short chains, referring to previous complete ones. This then allows a later RB that directly includes e.g. 3 EBs to indirectly include many more. So again, this allows catch-up (without losing throughput) so long as not too many RBs are missed in a row.

On the other hand, missing/empty IBs cannot be caught up. So if the system were at capacity then missing them does lose throughput. Of course it must be expected that some will be missed. There are also lots of them. So it's not a big problem.
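To illustrate the asymmetry, here is a toy model of my own (the rates, capacity, and miss rate are invented, not parameters from the design): missed EBs only delay IB inclusion so long as EB capacity exceeds the IB rate, whereas missed IBs are gone for good.

import random
random.seed(0)

SLOTS = 100_000
EB_EVERY = 10       # one EB opportunity every 10 slots
EB_CAPACITY = 15    # each EB may reference up to 15 IBs, above the typical 10
EB_MISS_RATE = 0.2  # 20% of EB opportunities are missed

pending = produced = referenced = 0
for slot in range(SLOTS):
    produced += 1   # one IB per slot; missed IBs would simply shrink this,
    pending += 1    # and nothing downstream can recover them
    if slot % EB_EVERY == 0 and random.random() >= EB_MISS_RATE:
        take = min(pending, EB_CAPACITY)  # later EBs pick up older IBs
        referenced += take
        pending -= take

print(f"{referenced}/{produced} IBs referenced despite 20% missed EBs")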

@zygomeb
Contributor

zygomeb commented Nov 24, 2022

Duncan,

I'd be really interested in hearing more about why we want to hold elections for Endorsement Report builders and how the statistical argument is made -- as opposed to creating a Mithril threshold signature. Is it the efficiency of it? Are there any drawbacks to using it? (And is there any possible attack if a cabal of SPOs share schedules in order to get enough reports to 'certify' an endorsement block?)

Further, have there been any alternate solutions not making use of the Endorsement Reports?

@KtorZ
Member

KtorZ commented Nov 25, 2022

Thanks a lot for sharing this @dcoutts. The paper is clear and gives an excellent high-level overview of Leios. I was initially expecting this to go more into detail, but I assume that this will be for later.

I've written down a few questions while going through it:

What about transaction finality, especially from a client application perspective?

Certainly, Input Blocks only give a weak guarantee of finality because a valid transaction that makes its way into a Ranking Block may still end up being discarded. Yet, Endorsement Certificates feel like they already provide a certain degree of guarantee. At least, they are used by block producers to make consensus decisions; are these certificates meant to be consumed by client applications as a somewhat reliable source?

Besides, it seems like, fundamentally, Leios doesn't change the definition of transaction finality for Cardano when it comes to ranking blocks. Yet, that definition was already not quite established for Praos. There was an equation circulating at some point which I had captured as an interactive plot, taking into account the grinding power of adversaries and the (assumed) adversarial stake in the system. It's probably hard to put numbers on this yet given the many variables at play, but it would be nice to eventually get some proper orders of magnitude for client applications w.r.t. finality (risk of non-settlement vs depth on the chain).

The definition of 'now'?

The proposal doesn't yet settle on how to approach time, though it gives arguments in favour of two options. Is the intention to get this answered before moving forward with the proposal, or is it something that will likely be settled when the implementation work starts?

Relationship with Mithril?

In the analysis of the resource usage of each node, the document currently considers only activities related to Cardano consensus. However, Mithril is right around the corner and seems like an activity that will undoubtedly require some bandwidth and CPU cycles. In addition, stake-based signatures have to be produced by block producers and thus will likely become an integral part of the nodes' day-to-day activity.

Besides, if one squints a little bit, we can see some close similarities between Endorsement Reports/Endorsement Certificates and Mithril's multi-sig. So I wonder if the plan is perhaps to have Mithril signers piggyback on the Endorsement Reports (or vice-versa, have Leios piggyback on Mithril's signers) and whether this was accounted for in the design 🤔 ?

Block producer starvation?

From what I understood, there's a certain threshold of endorsement for an Input Block to make it into a Ranking Block. This enables (Ranking) block producers to emit blocks containing data they might not have yet seen. Yet, it still assumes that Input Blocks are seen and endorsed by a sufficiently large portion of the stake. In a highly concurrent system, I can imagine this causing a starvation issue for the (Ranking) block producers; with a sufficient inflow of different transactions on all network entry points, we may saturate every mempool without reaching a sufficiently high threshold of endorsement to produce any block.
There's mention in the design of a "pessimistic" mode as a fallback in case a situation like that shows up. However, it's unclear how we intend to detect that at a local level (the node itself) since this is inherently a global problem. Moreover, how can nodes reach a consensus on establishing a pessimistic mode if they cannot utilize their primary consensus instrument?

@ilap

ilap commented Nov 25, 2022

Remember that we don't need to specify a whole detailed reward scheme. We can reuse whatever reward scheme is in place at the time Leios development is integrated into Cardano.

Fully agree with @dcoutts

A modification of the reward equation may be necessary for equal or differential weighting of RBs, EBs, and IBs: r( stake , pledge ) = R * ( C1*total_RBs + C2*total_EBs + C3*total_IBs ) * RSSv[1,2](RBs, EBs, IBs)

Currently, with only RBs ( C1 = 1/21600 , C2 = 0 , C3 = 0 ): r( stake , pledge ) = R * ( (1/21600)*total_RBs + 0.0*total_EBs + 0.0*total_IBs ) * RSSv1(RBs, 0, 0) = R * ( (1/21600)*21600 ) * ( RSSv1 ) = R * ( RSSv1 )

Why would it need to be modified at all? The actual rewards already take the apparent performance into account, and that is independent of f(s, o) (e.g., p*f(s, o) = f(s, o, p)); that is where block creation comes into play.

@ia-davidpichler

Why have an algorithm to determine running in optimistic or pessimistic mode? Why shouldn't it just always allocate free space in ranking blocks to regular transactions, so that if no or few EBs are available, transactions can still make it from the mempool into the RB?

@L-as
Contributor

L-as commented Nov 26, 2022

Duncan,

I'd be really interested in hearing more about why we want to hold elections for Endorsement Report builders and how the statistical argument is made -- as opposed to creating a Mithril threshold signature. [...]

Mithril is just a way to have M-of-N schemes weighted by stake. You likely also want such a signature scheme here for the endorsement reports.
Endorsement certificates and Mithril don't solve the same problem; the latter is a way of implementing the former, though in this case it doesn't seem to improve soundness much, because the VRF already gives us the weighting.
I'm not sure how IOG precisely intends to use a stake-weighted multisignature scheme to implement what are essentially checkpoints, but naïvely, it seems like it'd be the same as a "Ranking certificate". @dcoutts could potentially talk to the people leading that and already include it in the proposed design if it is that simple.

As I noted further above, yes, there are other ways of solving this issue, see #379 (comment) .
To reiterate, you could use e.g. Fast Reed-Solomon Interactive Oracle Proofs of Proximity, the Inner Product Argument, KZG, or other solutions to prove the correctness of a relation such that the proof is easier to verify than verifying the relation itself (and/or needs less knowledge).
Fundamentally this doesn't change the properties of the system, because it's still a "probabilistic proof" as now, but because of the improved soundness you could get away with having just one party produce the proof and trusting that.

@TerminadaPool

TerminadaPool commented Nov 26, 2022

I would like to point out an unfairness in the current Ouroboros design which is replicated in the new design resulting from how ranking blocks (RBs) are adopted.

Participation should be proportional to stake weighting

If your stake pool is housed somewhere that suffers higher propagation delays, then you are exposed to more single block height battles, which result when other pools are allocated slots within a few seconds of your slot. You lose 50% of these blocks since the "battles" are settled by the random block VRF. These "single block height battles" occur because either you don't receive the previous block in time, or the next pool doesn't receive your block in time. If your pool is housed somewhere that suffers propagation delays of 1.5-3 seconds, whereas those housed in European data centres operate with delays under 1 second, then your lost rewards are relatively significant. So what is the solution? House your stake pool in a data centre in Europe. Centralise!

This means that a stake pool housed somewhere like Australia does not get to participate in proportion to its controlled stake, since relatively more of its ranking blocks (RBs) will get dropped (orphaned). I.e., fewer of its RBs will make it on-chain than is proportional to its stake weighting. And not because it is run poorly, only because it is housed in Australia. This is an unfairness in the protocol.

Blocks other than ranking blocks (IBs, EBs) do not suffer from this problem since none are dropped so long as they arrive within acceptable network delay bounds. Therefore the proportion of both IBs and EBs that make it onto the chain will reflect the stake pool's controlled stake proportion.
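A back-of-the-envelope sketch of the size of this effect (my own simplified model, not from the CIP: it assumes the mainnet active slot coefficient f = 0.05 and that any other leader elected within the propagation delay, in either direction, triggers a battle):

f = 0.05  # probability that a given slot has some leader (active slot coefficient)

def battle_probability(delay_slots):
    # Chance that another leader is elected within delay_slots slots on
    # either side of yours, i.e. before blocks finish propagating.
    return 1 - (1 - f) ** (2 * delay_slots)

for d in (1, 2, 3):
    p = battle_probability(d)
    # Roughly half of these battles are lost to the random tie-break VRF.
    print(f"{d}s delay: battle chance {p:.1%}, expected block loss {p / 2:.1%}")

Under those assumptions a pool with 3s delays loses around 13% of its blocks to battles, versus about 5% for a sub-second pool, which is the unfairness described above.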

Solution?

Maybe orphaned RBs can be included somehow and rewarded equally if they have been correctly produced? If this is not possible then I would campaign for only a tiny proportion of total rewards being allocated for RBs. The smaller the better, since higher rewards for RBs will produce a higher centralisation force.

@michaelpj
Contributor

Apologies for the big comment. The disadvantage of the PDF is that we can't do inline comments on particular topics!

I was generally smug to see how much we rely on determinism and "validate, don't compute". This is why those properties are important, and I'm very happy we kept pushing to keep them!


Re colours: AIUI, the advantage of the colour system over an IB producer entirely randomizing their transaction selection choice is that it allows transaction submitters to indicate that they have multiple transactions that depend on each other and should be processed in the same IB. I think this would also be solved by batch transactions, since those would already be processed (or not) together.

I think this shows why batch transactions are nice: rather than coming up with a complicated system relying on the mempool selection logic whereby submitters can indirectly ensure that their transactions get processed together, we just add an explicit concept for a batch transaction, and then it's obvious to both the user and the system how to process it.

Solving this problem with batch transactions would also allow a lot more freedom to choose the randomization algorithm for constructing IBs. The constraint would become "do anything you like so long as you don't break up batches", rather than committing us to the concept of colours.

Of course we still have the question of whether batches are atomic or not; presumably non-atomic batches could have some of their transactions dropped when we resolve the final ledger state, whereas atomic batches would need to be dropped in their entirety. But I think these are solvable problems and also reflect user-level concepts that are useful in themselves.


Re double signing protection: would it be possible to handle this when rewards are distributed rather than when EBs are produced? It seems like we could handle this by slashing the rewards of the double-producing SPO, i.e. when you come to calculate rewards, any SPO that has multiple IBs in the same slot gets its rewards slashed.

I'm sure this would make reward calculation more complicated, but it seems valuable if it removes latency from the system!


Re network resource prioritization.

  • Why does "recent" need to be a protocol parameter, instead of just saying the node should strictly prioritize downloads in recency order?
  • Why isn't this sufficient to defend against the flooding attack, such that the maximum inclusion time is unnecessary?

Re time slot of a transaction: this seems very tricky indeed. A few scattered thoughts:

  • I think the model we had in our minds when thinking about the validity interval was that it gives you the consensus time. I think that corresponds to the RB slot rather than the IB slot, but that does seem much harder to implement.
  • If we use the IB time then this allows quite a lot of flex in when in real-time you can make a transaction with a given chain-time. For example, it seems that I could wait right until the end of the IB inclusion time limit, create an IB based on a very old slot, and still get it into the chain. So I could effectively make a "past transaction" up to the IB inclusion time limit.
    • I think this might allow for some kinds of front-running attacks. Consider a stablecoin that values its tokens according to signed timestamped values from a price provider. Then perhaps you could front-run a price increase by submitting a transaction that gets included in "the past" and so gets the lower price even after the new price has been posted.
  • Could it make sense to give IBs a validity interval also? Maybe also EBs? Such that they could only be included into the chain when the RB slot fits in their interval?
    • I think that would ensure that the final sequence of transactions is "possible": giving every transaction the RB slot number would work. The advantage is that you push some of this work down to the construction of IBs and EBs.
    • Of course, the right interval for a IB or EB would be the intersection of all the intervals inside it. This is non-empty (since it can always be the interval consisting of only the current slot), but it might in practice make the block hard to include other places. This makes me think that if we want to rely on the RB slot then it's much better to discard specific transactions rather than entire IBs or EBs. So this is probably a bad idea.
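A minimal sketch of the interval-intersection mechanics discussed in those bullets (names and types are mine, purely illustrative):

from functools import reduce

# An interval is (valid_from, valid_to) in slots; None means unbounded.
def intersect(a, b):
    lo = b[0] if a[0] is None else (a[0] if b[0] is None else max(a[0], b[0]))
    hi = b[1] if a[1] is None else (a[1] if b[1] is None else min(a[1], b[1]))
    return (lo, hi)

def ib_validity(tx_intervals):
    # The narrowest interval satisfying every transaction in the IB.
    return reduce(intersect, tx_intervals, (None, None))

def includable(ib_interval, rb_slot):
    lo, hi = ib_interval
    return (lo is None or lo <= rb_slot) and (hi is None or rb_slot <= hi)

ib = ib_validity([(None, None), (100, 500), (50, 300)])
print(ib, includable(ib, 250), includable(ib, 400))  # (100, 300) True False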

Re pessimistic mode:

  • Why can't the criteria for making an RB that contains transactions directly just be "do the EBs available to me contain more transactions than I could put directly into this block?" or something?
  • The existence of pessimistic mode partially undermines the argument in "Fast fork switching": if the chain is in pessimistic mode, then it inherently is running more like Praos, so a long period of pessimistic mode could lead to longer chain switches. Or maybe not, since it also corresponds to lower data throughput.

Re rewards and incentives:

  • Is there an incentive not to include conflicting transactions in an IB? Suppose that I'm rewarded just for constructing an IB, and that there is no penalty for conflicting transactions getting dropped. Then I can very quickly make a 100%-conflicting IB by just copying one that I've seen recently, which is within the inclusion limit. That seems bad. In Praos your block would not get accepted if any of the transactions were not valid so you can't do this.

Re RBs including older IBs. It seems like a decent amount of the complexity in the system comes from the fact that an IB can be included in an RB in a later slot, to allow for the time it takes for the IB to get endorsed. So here's a possibly silly suggestion: allow IBs and EBs to be produced as if the slot was in the future. Then the IB can get endorsed as usual, but the final EB will only get included in the RB when the real slot catches up to it. That means that:

  • RBs can include EBs/IBs only for the current slot, making things much simpler;
  • But you can compensate for the expected latency by making an IB in a future slot, so that it will hopefully have been endorsed sufficiently by the time that the intended slot comes around.

@michael-liesenfelt
Contributor

If the transaction throughput is massive, maximizing RBs, EBs, and IBs together as Leios is designed for, then all of the classes of block are [equally?] important for carrying chain throughput.

Previously I did not understand that you purposefully do not want to include transactions in EB and RB blocks. In this implementation transactions will only be included in IBs. I can foresee a situation where the same transaction(s) end up in 5 to 20 IBs, so some form of transaction referencing/pruning would be needed.

Is there any way to implement the VRF so RBs will be produced deterministically every ( slot % RB_Frequency ) interval, with potentially only a couple to a few pools as leads? The winner could be determined by the ?lowest? VRF value. Extending this concept further, would it be possible to also implement the EB and IB VRF allocation so it would be slot-deterministic? Extending the determinism further, would it be wise to use a VRF-derived sequence to control which transactions each IB is allowed to include, and the order of inclusion, instead of using a 'color' scheme? (BTW I agree with @michaelpj 's concept of batching for dependencies, that seems very wise.)

Can you point me to a detailed document to help clarify what I may not understand about the potential of the current VRF?

Remember that we don't need to specify a whole detailed reward scheme. We can reuse whatever reward scheme is in place at the time Leios development is integrated into Cardano.

True, and ideally CIP-50 fixes the current reward scheme before Leios is ever implemented. Leios has the potential to provide both amazing scaling and amazing network decentralization. However, if the incentives for RBs, EBs, and IBs are messed up as much as RSSv1 is (MAV 21, not 250), then amazing network performance with a centralized MAV of less than 20 will be the legacy of Leios. If Leios is not designed to increase L1 MAV & decentralization then implementation should be delayed. Maybe Leios should be delayed until after mainnet is operating at MAV > 40, double the current decentralization level.

@Y50000

Y50000 commented Nov 28, 2022

I don't know where else to ask this of people knowledgeable in the design, but people were discussing this online: is the idea to have one pool in the entire system be eligible to produce one block every 0.2 seconds (max), or for a subset of pools (or potentially all) to produce an input block every 0.2 seconds simultaneously?

@ia-davidpichler

I don't know where else to ask this of people knowledgeable in the design, but people were discussing this online: is the idea to have one pool in the entire system be eligible to produce one block every 0.2 seconds (max), or for a subset of pools (or potentially all) to produce an input block every 0.2 seconds simultaneously?

The proposed design is for any pool to create a block randomly, such that the average block production rate is 1 block per 0.2 seconds.

@Y50000

Y50000 commented Nov 28, 2022

In the entire network, correct?

@ia-davidpichler

It's weighted by stake up to the saturation limit for each pool in the network. So pools with no delegation aren't going to be contributing any blocks.

@L-as
Contributor

L-as commented Nov 29, 2022

@michaelpj I recommend still using the review function on the PR so that there can be threads (that might be resolved). It's also hard to reply to each point separately when it's one big comment.

@rphair
Collaborator

rphair commented Nov 30, 2022

@dcoutts it would be great if you could attend the CIP Editors' Meeting today... perhaps mid-way through if you can't make the whole thing... since this proposal will be introduced in Triage & you might answer some community questions as time permits 🙏 CIP Editors Meeting #58 (Wed, 30 Nov, at 9:30 am UTC)

@KtorZ
Member

KtorZ commented Nov 30, 2022

Side-note: the PDF is slightly painful to work with, because we can't inline comments on the PR and it'll be impractical to render or serve via the website and other mediums consuming CIPs (e.g. the developer portal), which expect only markdown, images, or simple text files like json, csv, cddl, etc.

If possible, and given that most of the content in the PDF doesn't really require any LaTeX features (it's mainly text and images), it'd be nice to turn the document into a plain markdown file (maybe pandoc to the rescue?).

The paper may then evolve into a more elaborate research paper, yet I think there's still value in capturing the high-level overview of Leios as it is now in a markdown document, and possibly coming back to it later with CIPs addressing it bit by bit. Given that the current document mostly describes the trade-offs, security considerations, and motivations, and only gives a high-level overview of the solution, it even sounds like it's better suited as a CPS which would be addressed over time by a succession of CIPs.

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

@rphair sorry I couldn't make it today. I can try and make it next time to help answer questions.

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

@zygomeb asks:

I'd be really interested in hearing more about why we want to hold elections for Endorsement Report builders and how the statistical argument is made -- as opposed to creating a Mithril threshold signature. Is it the efficiency of it?

I hope I explained in the doc why we want the endorsement certificates at all: to allow adopting chains quickly without having to download and validate all the referenced EBs & IBs. And being able to adopt chains within Delta time is important to the security argument for a Praos-style chain (which is what the chain of ranking blocks is).

But you ask about why we hold elections for SPOs to be eligible to make reports. It's connected to the statistical argument. The idea is that we do not need to count an actual majority of stake, we just need a high degree of confidence that a majority endorse the block. So it's like opinion polling vs full blown elections: if we sample fairly from the stake distribution, and have enough samples, then we can make a probabilistic argument that there's a majority in support.

So also note that this means we need far fewer reports than if we ran a full-blown election where we simply count enough reports to reach a strict majority of stake. So yes, there's an efficiency reason there.
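To put rough numbers on the sampling argument (a sketch of my own; the adversarial stake fraction and committee sizes are illustrative, not the researchers' parameters):

from math import comb

def p_forged_majority(n, adv_stake):
    # P(an adversary holding adv_stake of the stake wins more than half
    # of n independent stake-weighted report samples)
    return sum(comb(n, k) * adv_stake**k * (1 - adv_stake)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (100, 500, 1000):
    print(f"{n} reports: forgery probability {p_forged_majority(n, 0.33):.2e}")

Even a few hundred sampled reports push the chance of a sub-majority adversary forging a certificate to negligible levels, which is why polling can be so much cheaper than a full election.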

And why not Mithril threshold signatures? They are actually very expensive to construct, at least with the current cryptographic techniques. That's ok for making a Mithril snapshot once every few days, but not for EBs every few seconds.

That said, the researchers are investigating ways to aggregate the endorsement reports into an endorsement certificate by aggregating the VRFs and signatures, to make the certificates much more compact, and (as far as I understand the crypto -- I'm not the expert) that's closely related to threshold signatures.

Are there any drawbacks to using it? (And is there any possible attack if a cabal of SPOs share schedules in order to get enough reports to 'certify' an endorsement block?)

It should be the case that you'd need a collusion of SPOs representing a majority of the stake to be able to make enough reports to falsely certify an endorsement block. And Ouroboros always relies on the assumption that a majority of stake is honest. If you want to violate that assumption, there's umpteen ways to break the protocol.

Further, have there been any alternate solutions not making use of the Endorsement Reports?

There have been lots of variations on how to do the endorsement reports and signatures. For example, should reports be on IBs individually, or on RBs? We settled on introducing a middle kind of block, the EB, to bundle up lots of IBs so that the reports can cover a large set of IBs all at once. But I'm not aware of any major alternative solutions to using endorsement reports and certificates overall (though as I said, a Mithril style was considered and found to be impractical for this use case).

Thanks for the great questions.

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

@KtorZ of course has a bunch of excellent questions and comments...

Thanks a lot for sharing this @dcoutts. The paper is clear and gives an excellent high-level overview of Leios. I was initially expecting this to go more into detail, but I assume that this will be for later.

Yes, there will be a research paper, and no doubt more design docs with more details. I wrote this summary based on internal drafts from the researchers, but I omitted low level technical details and stuck to prose, since I was trying to reach a broader audience.

I've written down a few questions while going through it:
What about transaction finality, especially from a client application perspective?

Yes that's a very interesting question and one I would have liked to write more about in the document, but unfortunately we're not yet at a stage to make any claims about it.

As you note, ranking blocks are still a Praos chain, so we do at least inherit the Praos analysis of finality (which btw, I understand there may be some improvements on in the analysis). Your intuition about endorsement certificates is exactly the same as my intuition: it does feel like they ought to help with more prompt finality. Unfortunately we just don't have the analysis yet to back that up formally, so we are not yet making any bold claims there. I hope we'll be able to improve on that with more time for the researchers to analyse it properly.

The definition of 'now'?

The proposal doesn't yet settle on how to approach time, though it gives arguments in favour of two options. Is the intention to get this answered before moving forward with the proposal, or is it something that will likely be settled when the implementation work starts?

I expect we'll settle it during the design and prototyping phase, well before we get to implementation.

Relationship with Mithril?

You're right that I didn't include anything about Mithril in the resource analysis. I hinted in the final section that if we truly want to run the chain at very high rates, we'd need something like Mithril to be included in the design. We've not yet started to look at how the two could be matched up nicely, so it's not yet clear if Mithril could benefit from the endorsement reports.

Block producer starvation?

From what I understood, there's a certain threshold of endorsement for an Input Block to make it into a Ranking Block.

Yes. There have to be "enough" reports to make a valid endorsement certificate.

This enables (Ranking) block producers to emit blocks containing data they might not have yet seen.

Yes, technically it does. It's unlikely but in principle possible. Unlikely because if the RB producer has waited long enough for all these reports to be created and transmitted to it, then it should also be the case that the IBs and EBs that all those reporters saw will have had enough time to make it to the RB producer.

Yet, it still assumes that Input Blocks are seen and endorsed by a sufficiently large portion of the stake. In a highly concurrent system, I can imagine this causing a starvation issue for the (Ranking) block producers; with a sufficient inflow of different transactions on all network entry points, we may saturate every mempool without reaching a sufficiently high threshold of endorsement to produce any block.

The size of the endorsement report "mempool" (I prefer to call them network buffers :-) ) has to be big enough to cover the peak expected rate of report creation (times how long the reports remain valid for). So that may be large, but it is limited (by the report VRF). We would indeed not want a design where a buffer limit could kill liveness :-)

As it turns out, all of the network buffers except transactions (so IBs, EBs, and reports) are like this: they don't really need a size limit, since they are naturally limited by the VRF that governs their creation rate, and their maximum lifetime. We'd still have a limit, but only for safety/sanity, and we'd make it big enough so that in practice it's never hit (and perhaps we'd handle the limit by dropping the oldest so the new ones can still enter).
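For illustration only (all numbers invented, not design parameters), the natural bound on each buffer is just the creation rate times the lifetime:

# Natural buffer bounds: creation rate (capped by each item's VRF
# threshold) times its maximum lifetime. All numbers are invented.
rate_per_slot = {"IB": 5.0, "EB": 0.2, "report": 50.0}
lifetime_slots = {"IB": 600, "EB": 600, "report": 120}

for kind in rate_per_slot:
    bound = int(rate_per_slot[kind] * lifetime_slots[kind])
    print(f"{kind}: buffer bound ~ {bound} items")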

There's mention in the design of a "pessimistic" mode as a fallback in case a situation like that shows up. However, it's unclear how we intend to detect that at a local level (the node itself) since this is inherently a global problem. Moreover, how can nodes reach a consensus on establishing a pessimistic mode if they cannot utilize their primary consensus instrument?

It's a function of the recent ranking chain. So it's objective and known to all nodes that are on the same ranking chain.
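Purely as a hypothetical illustration of such a predicate (not from the design; the window, threshold, and field names are mine):

def pessimistic_mode(recent_rbs, window=20, min_fraction=0.5):
    # recent_rbs: the last ranking blocks on our chain, each carrying a
    # count of the certified EBs it references (0 if none). Every node on
    # the same ranking chain computes the same answer, so no extra
    # consensus machinery is needed.
    tail = recent_rbs[-window:]
    with_ebs = sum(1 for rb in tail if rb["certified_ebs"] > 0)
    return with_ebs / len(tail) < min_fraction

print(pessimistic_mode([{"certified_ebs": 0}] * 15 + [{"certified_ebs": 2}] * 5))  # True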

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

@ia-davidpichler asks:

Why have an algorithm to determine running in optimistic or pessimistic mode? Why shouldn't it just always allocate free space in ranking blocks to regular transactions so that if no or few EBs are available transactions can still make it from the mempool into the RB.

We want to keep RBs small. This is to make them quick to distribute over the network, and minimise block height battles. So we don't want to make RBs bigger than necessary.

If one then simply says, within that limit, allocate any free space to txs, well do note that the space available is a tiny fraction of the space available in IBs, if they're being produced at any serious rate. So it's not like we'd be gaining a lot by using that little bit of extra space.

Your suggestion does mean you'd be mixing txs from IBs with txs from the RB directly, which somewhat complicates the question of how to serialise all the transactions. Whereas if it's either EBs or txs directly, but no mix, then that question is simpler. On the other hand, your suggestion means there's no need for a predicate on when to use Praos mode vs Leios mode, and that's a complexity saving. Then again, it leaves it as a free choice for the RB producer whether to include txs directly rather than EBs (even if they were available). It's not clear that it's good for that to be a free choice for the RB producer. Whereas with an objective predicate, everyone can verify that it's permissible (and indeed required) to use Praos mode rather than Leios mode.


That's my limit for now 😮‍💨

I'll try and get to other people's questions and comments later.

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

@KtorZ I don't think I have the energy to convert to markdown. I do actually make quite a lot of use of LaTeX features. There's a lot of cross references. Some footnotes and math mode. And all those "images" are actually LaTeX TikZ drawings.

But you're welcome to prove me wrong 😁, here's the LaTeX source: https://github.com/input-output-hk/ouroboros-leios/tree/main/report

@dcoutts
Contributor Author

dcoutts commented Nov 30, 2022

Oh, I could just include the .tex source in this PR, then people can comment line by line on that. That'd serve the same purpose as commenting on markdown source. Would that do? @KtorZ @michaelpj

@dcoutts
Contributor Author

dcoutts commented Dec 13, 2022

Oh and about the VRFs:

Instead of the current single local D-432000 VRF dice being rolled for leaders I'm suggesting 3 local VRF dice: the RB leaders should be rolled with a D-21600 dice, the EB leaders should be rolled with a D-43200 dice, and the IB leaders would be rolled with a D-432000 dice. (with configurable values of course).

I'll try and clarify this in the next revision: yes the proposed design uses independent VRFs for each of the different kinds of block & report. And indeed they each have different configurable thresholds so that the production rates for the different block types & reports can be set independently.
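As a sketch of what "independent VRFs with independent thresholds" means in practice (my own illustration: the linear stake-times-rate threshold follows the simplification used elsewhere in this thread rather than the real Praos leader function, and sha256 is only a stand-in for a real VRF):

import hashlib

def vrf(cold_key, role, slot):
    # Stand-in for a real VRF: uniform in [0, 1), independent per role tag.
    h = hashlib.sha256(cold_key + role + slot.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def is_leader(cold_key, role, slot, relative_stake, rate_per_slot):
    # Separate role tags give independent elections; separate thresholds
    # let each block type's production rate be tuned on its own.
    return vrf(cold_key, role, slot) < relative_stake * rate_per_slot

key, stake = b"pool-cold-key", 0.01
for slot in range(5):
    print(slot,
          is_leader(key, b"IB", slot, stake, rate_per_slot=5.0),
          is_leader(key, b"EB", slot, stake, rate_per_slot=0.2),
          is_leader(key, b"RB", slot, stake, rate_per_slot=0.05))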

@brouwerQ

@dcoutts Great read. What I miss in the paper is some clarification of what happens at the epoch boundary. Are ranking blocks from epoch X allowed to reference endorsement blocks of epoch X-1 (or lower), and analogously, are endorsement blocks of epoch X allowed to reference input blocks of epoch X-1 (or lower)? Or is an epoch a harder boundary, and should all those references stay within the same epoch? This decision may or may not have consequences for determining the stake distribution at the epoch boundary.

@dcoutts
Contributor Author

dcoutts commented Dec 21, 2022

@brouwerQ Yeah, good question. The writeup doesn't answer this or similar questions about combining the Leios blockchain consensus algorithm with the current Cardano ledger rules. That's because we start with a somewhat academic approach of thinking about and describing the blockchain algorithm in isolation, abstracting over the details of the ledger and just trying to specify the necessary things that the ledger must provide.

So I've not considered that question before, but thinking about it I think the answer is that there's no constraint on cross-epoch references between ranking blocks, endorsement blocks or input blocks. Or in other words, it's not a hard boundary.

The stake distribution will be determined by the ledger state at the appropriate ranking block, just as it is with blocks in Praos.

Your question does also remind me that there is a similar question about whether there are any issues with cross-epoch references or transactions when that epoch boundary has a change in some of the protocol parameters (e.g. think about the costs of script execution). That needs more thought and needs to be described properly in a future more detailed version of a Leios design document.

@michael-liesenfelt
Contributor

I'll try and clarify this in the next revision: yes the proposed design uses independent VRFs for each of the different kinds of block & report. And indeed they each have different configurable thresholds so that the production rates for the different block types & reports can be set independently.

Happy New Year! I wanted to follow up on this and verify we are all thinking about 'security' the same way. A dedicated DDoS attacker will attack at all OSI layers (both low and high) continuously for hours and hours. Creating blocks with unpredictable cadence on the scale of dozens of seconds is irrelevant to attacks that last hours [or days]. I'm stating that the IB, EB, and RB block cadence should be very periodic, not random, because there is no security advantage in a few seconds of short-term random timing. Random timing of block creation is a legacy artifact of PoW systems that does not have to be forced into PoS systems.

@KtorZ
Member

KtorZ commented Jan 17, 2023

@dcoutts It's certainly true that Leios is at an early stage, and it's also very large. If you think it'd fit a CPS better, we can recast it as one. I'd appreciate your advice on that as it's new and I'm not familiar with the process.

@dcoutts And I'm not totally sure about the idea of it turning into a bunch of CIPs, at least not ones that are independent with independent implementations. A new consensus algorithm is inherently quite chunky and not necessarily all that modular. It's not likely to be a bit-by-bit implementation. If you just mean a bunch of CIPs as documentation of a large thing (like one could imagine all the bits of the Shelley spec, ledger, network etc being informational CIPs) then that makes sense.

I think there's room for a CPS in the sense that the root of the problem is rather complicated and has several non-obvious design goals and non-goals. In particular, the current sections 1, 2, and 3 of the document fall directly into the CPS structure. However, it is true that the rest of the document is indeed a high-level overview of the solution, which calls for a CIP. So you may decide to split the problem definition off into a separate document, or leave it as part of this one. I think it's fine either way. We would require a split into a separate CPS if there were multiple competing solutions to that problem but, as far as I know, this isn't the case.

Regarding my second remark, it hinted at the fact that the document is not as detailed as one would need to implement it -- which is the end goal of CIPs: being detailed enough to allow concurrent implementations to happen. However, given the magnitude of the work, I don't think the entire engineering design would fit in a single document. Which is why I suggested starting with this one as a high-level overview, already sufficient to get a good grasp of the direction it'll take and to trigger a whole bunch of interesting discussions. And then, as the implementation progresses and the design is refined, introduce new CIPs to align on interfaces and design decisions that are relevant to understanding how the solution is built.

One thing I often find regrettable in the Cardano internal ecosystem is how some things are kept undocumented and merely designed as Haskell code. For example, the local-state-protocol still has no .cddl definition whatsoever for any of the state queries. One has to find the Haskell source code to figure out how each message of the protocol is actually serialized -- which isn't trivial given that queries are implemented with a mix of type classes and type family instances spanning three repositories. Same thing for the Plutus binary interface (i.e. the binary representation of the ScriptContext) between the ledger and the Plutus Core virtual machine.

So at the very least, I expect interfaces to be proposed as CIPs and fully documented; as well as important technical considerations.

@pagio

pagio commented Feb 2, 2023

@michael-liesenfelt

There actually is a security advantage from a theory point of view. If the schedule is periodic and non-random, i.e., known at the start of the epoch, it is potentially easier for an attacker to launch targeted attacks against block-producing SPOs to break the security of the system, compared to not knowing who will produce the next block. The problem gets more serious the "flatter" the stake distribution is. I agree that if we are thinking of a few SPOs that control the majority of the stake, then we are not getting any additional security. However, it is also important that we design a system that can scale without hurting security.

@michael-liesenfelt
Contributor

michael-liesenfelt commented Feb 2, 2023

@pagio :

it is potentially easier for an attacker to launch targeted attacks to block producing SPOs to break the security of the system, compared to not knowing who will produce the next block.

(Secret block producer identity) and (random vs periodic block production timing cadence) are two different things! You need to go back and re-read what I posted, because you're stuck in the original Ouroboros mental paradigm, repeating yourself. Random timing of block creation is a legacy artifact of PoW systems that does not have to be forced into PoS systems.

@pagio :

However, it is also important that we design a system that can scale without hurting security.

I'm giving you a solution that preserves secrecy security while simultaneously improving scalability via deterministic cadence!

Entropic cadence to emulate PoW adds no security and will hurt scalability relative to periodic cadence. With 2-3 secret leaders for each period of the periodic cadence, the entire Cardano network could theoretically lose half of its block producers and relays without dropping the block production cadence, a very desirable and necessary property for the reliability of a global financial operating system. This situation literally just happened to mainnet! Random 2-3 minute gaps with no ranking block production every month are dumb.

@pagio

pagio commented Feb 2, 2023

@michael-liesenfelt Maybe I'm "stuck". Can you please clarify what you mean by deterministic cadence and how you would implement it, so that I get unstuck? (I understood a deterministic block production schedule that is released, say, at the beginning of each epoch.)

@michael-liesenfelt
Contributor

how would you implement it so that I get unstuck?

Currently:
if pool_vrf_value(slot) < (pool_stake / total_stake) * (21600 / 432000):
    make_a_block_during(slot)

This results in entropic cadence of block production. Any slot could have a block created.

I'm highly suggesting:
if pool_vrf_value((slot * 21600) / 432000) < (pool_stake / total_stake):
    make_a_block_during(slot)

This is periodic cadence. Each block producing slot will happen periodically every 20 slots. The network will have had plenty of time to propagate the block(s) and settle a potential battle.

A slight tweak would guarantee battles, and also guarantee block producer redundancy. For a 50% overall network redundancy (i.e. desired_leaders_per_block = 1.0 / 50% = 2.0):

if pool_vrf_value((slot * 21600) / 432000) < desired_leaders_per_block * (pool_stake / total_stake):
    make_a_block_during(slot)

@Quantumplation
Contributor

@michael-liesenfelt Some comments on your proposal, with simulation;

First, since I didn't feel like reimplementing the pool_vrf_value function from scratch, I'm relying on the fact that pool_vrf_value is uniformly distributed between 0 and 1, but deterministic given the same slot.

Given that assumption, I can think of three ways to interpret pool_vrf_value((slot * 21600)/432000):

  • This is using integer division, meaning it truncates to an integer, in which case your algorithm would have a pool producing a block every second for 20 seconds;
  • Alternatively, it's fractional division, meaning each slot produces a unique value; in which case I don't believe it has any effect: the VRF value is uniform over its input, so rescaling it for everyone changes nothing
  • Finally, maybe we are assuming that pool_vrf_value is 1 on fractional values, or equivalently, we only evaluate this on regular intervals such as every 20 slots.

Given your other comments, I think the last one is your intention, but let me know if I've missed something.

I implemented a toy example of your first proposal, compared to the current implementation, here:
https://glot.io/snippets/ghyk557yp8

By simulation, it seems this actually has the effect of increasing the average time between blocks, while only slightly decreasing the variance in block times; every block now lands on a defined boundary (20, 40, 60, etc.), but there's still almost as much variance among these times, with large gaps between blocks. Additionally, this results in a dramatic increase in slot battles (from 2% to 50%). This means a number of real world downsides:

  • More energy consumption by the network, as nodes spend more time producing and gossiping about blocks that ultimately get discarded
  • A higher frequency of forks, meaning there's more uncertainty around whether a block is confirmed, and thus longer finality times for the end user
  • The number and length of these forks also plays directly into the security argument of proof of stake in ways I'm not equipped to evaluate, but I can't imagine it's a good thing :)

So at the very least such a change would have to be considered very very carefully.

It's possible these drawbacks can be overcome by tweaking parameters. For example, maybe you can reduce the interval to 15 slots so the average block time remains consistent, and tune other parameters to strengthen the security argument. Given the effort that entails, it'd probably need a very compelling reason why slightly more regular / well-defined slot boundaries are beneficial. Maybe the variance results in some minor operational complexity for some use cases, since you can't make assumptions about when to expect a block, but I'm not seeing a very strong argument for why it would be worth the effort needed to re-tune and re-analyze the security of the protocol given the above.

All that being said, it also seems outside the scope of this particular proposal. Maybe it'd be worthwhile to flesh out the argument in favor of this change as a separate CIP, along with a more rigorous simulation? It's possible that some of the simplifying assumptions I made in my quick sketch are undermining your argument needlessly.
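For anyone who wants to poke at the trade-off without the linked snippet, here is a minimal re-sketch of that kind of comparison (assumptions mine: a flat stake distribution and a uniform stand-in for the VRF; it reproduces the same qualitative jump in slot battles):

import random
random.seed(1)

POOLS, SLOTS, F = 500, 100_000, 0.05
stakes = [1.0 / POOLS] * POOLS  # flat stake distribution

def simulate(period):
    blocks = battles = 0
    for slot in range(0, SLOTS, period):
        # Each pool rolls its own uniform stand-in VRF; scaling the
        # threshold by `period` keeps total production roughly constant.
        leaders = sum(random.random() < s * F * period for s in stakes)
        if leaders:
            blocks += 1
            battles += leaders > 1
    return blocks, battles

for period in (1, 20):
    b, x = simulate(period)
    print(f"period {period:2}: {b} blocks, {x / b:.0%} multi-leader")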

@Quantumplation
Contributor

(also, a bit off topic, but I just want to take a moment to celebrate this work/discussion happening in the open. I know some design work for previous eras has happened on GitHub, but for whatever reason as an outside observer this feels a lot more open and participatory, which feels like a positive evolution to me.)

@michael-liesenfelt
Contributor

michael-liesenfelt commented Feb 6, 2023

@Quantumplation : I implemented a toy example of your first proposal, compared to the current implementation, here:
https://glot.io/snippets/ghyk557yp8

I changed it a bit, adding a full cadence histogram:
https://glot.io/snippets/ghyvymlt7x

VRF w/variable cadence, chain density 0.05: 
Total Nobattle Blocks :  20347
Total Blocks :  20784
Cadence Histogram :  [
  437, 964, 950, 853, 835, 798, 763, 790, 709, 649, 623, 619,
  549, 550, 509, 450, 479, 423, 439, 381, 373, 366, 354, 354,
  308, 296, 285, 247, 259, 283, 216, 242, 208, 209, 204, 157,
  175, 162, 181, 149, 148, 121, 117, 136, 114, 106,  99, 106,
  114,  88,  84,  82,  98,  92,  67,  65,  79,  68,  67,  63,
   48,  39,  45,  39,  45,  43,  37,  45,  38,  35,  39,  22,
   25,  29,  21,  28,  17,  19,  27,  21,  20,  22,  17,  28,
   21,  18,  11,  20,  13,  10, 230
]
 
VRF w/o variable cadence, 13s periodic block timing:
Total Nobattle Blocks :  13425
Total Blocks :  22103
Cadence Histogram :  [
  8678,    0,    0,    0,   0,   0,  0, 0, 0, 0, 0, 0,
     0, 8884,    0,    0,   0,   0,  0, 0, 0, 0, 0, 0,
     0,    0, 3026,    0,   0,   0,  0, 0, 0, 0, 0, 0,
     0,    0,    0, 1032,   0,   0,  0, 0, 0, 0, 0, 0,
     0,    0,    0,    0, 316,   0,  0, 0, 0, 0, 0, 0,
     0,    0,    0,    0,   0, 113,  0, 0, 0, 0, 0, 0,
     0,    0,    0,    0,   0,   0, 35, 0, 0, 0, 0, 0,
     0,    0,    0,    0,   0,   0, 19
]

Battles aren't a bad thing, especially with most of a periodic interval available to determine a winner. More battles, combined with the tie going to the smaller pool (Issue 4051), would benefit decentralization. I also want to point out that the diagram in the CIP-79 description shows all the blocks nice and evenly spaced, as if they are intended to be periodic.

@dcoutts
Contributor Author

dcoutts commented Feb 21, 2023

@michael-liesenfelt just my 2c: I understand your intention is to have a deterministic schedule but also to keep the schedule private. That sounds nice, but I'm not aware of any method to do it. If you've discovered such a mechanism then that's great and it would be worth looking at. It would need to be clearly documented in its own right.

Random timing of block creation is a legacy artifact of PoW systems that does not have to be forced into PoS systems.

I'd say it's not a legacy, it's the nature of the beast: it's easy to have a deterministic public schedule for PoS, and easy (ish) to have a non-deterministic private schedule for PoS, but having both is hard.

From my (probably imperfect) understanding of your proposal, you're making slots 20s long (rather than 1s), and then for each slot you're still using VRFs, but tuned so that you end up with a very high chance of there being a slot leader in each slot, but also a high chance of multiple leaders in each slot.

If that's accurate then unfortunately it will not play nicely with Ouroboros. Ouroboros Praos (and the Leios ranking blocks) relies on there being not too high a chance of multiple slot leaders, otherwise the security argument falls apart. Intuitively, you'd have far too many "natural" forks in your setting and once you add in the actions of the adversary (who takes advantage of forks and tries to extend their own) it all goes wrong.

Note that the ranking blocks are special in this regard. For the input and endorsement blocks, and indeed votes, it's fine to have multiple leaders in the same slot.

@michael-liesenfelt
Contributor

it's the nature of the beast: it's easy to have a deterministic public schedule for PoS, and easy (ish) to have a non-deterministic private schedule for PoS, but having both is hard.

True, I yield on that. I'm essentially recommending a hybrid: retaining the private schedule while only reducing/discretizing (but not eliminating) the interval/cadence entropy.

From my (probably imperfect) understanding of your proposal, you're making slots be 20s (rather than 1s) and then for each slot you still using VRFs but you tune it so that you end up with a very high chance of there being a slot leader in each slot, but also a high chance of multiple leaders in each slot.

No. Slots would be 1.0s OR the smallest possible time interval in the multi-tier RB/EB/IB scheme. If you reference "leios-design.pdf", Page 10, Table 1 (Input Blocks, Frequency = 1 per 0.2s to 2.0s), it seems that you are already considering 0.2s slot times (or, as you mention, there would be multiple valid IB/EB blocks per slot, but not multiple RB winners per slot).

If you reference "leios-design.pdf", Page 13: Figure 5 was created with perfect time spacing between blocks in each tier. I'm just pointing out that I like YOUR design, agree with the timing cadence shown in Figure 5, and highly recommend we enforce it with integer RB/EB/IB slot-cadence parameters instead of a probabilistic, entropic RB/EB/IB chain_density parameter wrapped up into the VRF. Predictable cadence is a desirable property for a global financial operating system. A high chance (from the 13s cadence example above: 8678/22103 = 39%) of multiple leaders in each slot is not a terrible property with a guaranteed 13.0s interval for resolution. Dishonest [late] actors are highly likely to lose battles on a smaller chain fork relative to an honest [timely] majority.

I'm not aware of any method to do it. If you've discovered such a mechanism then that's great and it would be worth looking at. It would need to be clearly documented in its own right.

It would also likely require an updated Ouroboros paper.

Find "multiply by chain_density into the VRF", and then imagine eliminating it while only checking the VRF for eligible slots of each tier. I'm recommending not checking for valid VRFs for every slot of every tier of Leios, only eligible slots.

if ( slot % RB_cadence_parameter == 0 ) then the slot is eligible for an RB block, otherwise it's not.
if ( slot % EB_cadence_parameter == 0 ) then the slot is eligible for an EB block, otherwise it's not.
if ( slot % IB_cadence_parameter == 0 ) then the slot is eligible for an IB block, otherwise it's not.
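
To make that concrete, here is a minimal Python sketch of the eligibility gate. This is an illustration only, not node code: `vrf_win` is a stand-in for the real leader-VRF check, and the cadence values are hypothetical.

```python
# Minimal sketch of the proposed cadence gate (illustrative only).
# `vrf_win(slot)` stands in for the Praos leader-VRF check, which in
# reality involves the pool's VRF key, the epoch nonce and the slot.

RB_cadence_parameter = 20  # hypothetical: one RB opportunity every 20 slots
EB_cadence_parameter = 5   # hypothetical
IB_cadence_parameter = 1   # every slot eligible for IBs

def eligible(slot: int, cadence: int) -> bool:
    """A slot is eligible for a tier iff it lands on that tier's cadence."""
    return slot % cadence == 0

def try_lead(slot: int, cadence: int, vrf_win) -> bool:
    """Evaluate the leader VRF only on eligible slots, per the proposal."""
    return eligible(slot, cadence) and vrf_win(slot)
```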

@dcoutts
Contributor Author

dcoutts commented Feb 24, 2023

Figure 5 was created with perfect time spacing between blocks in each tier. I'm just pointing out that I like YOUR design, agree with the timing cadence shown in Figure 5

I did the diagram that way for clarity, but it does not reflect the timing in the design.

Predictable cadence is a desirable property for a global financial operating system.

I don't dispute this, I'm just not clear how one would do it (while retaining privacy). And as far as I've understood your proposal, it doesn't do it either, unfortunately.

Your proposal would have us only consider making a ranking block on every 20th slot (for example). But if your design is to use a VRF in those slots to decide who makes the block, then you still have a problem: there is no good way to tune that VRF. The VRFs are random, with every potential slot leader doing their own random (2^256-sided!) dice roll. If you tune the VRF threshold so that the probability that at least one potential slot leader gets over the threshold is nearly 100%, then the probability of two or more slot leaders is correspondingly high. That's a problem for chain security. But if you tune it so that the probability is low, then you're back to the random cadence that we have now.
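
To see the tension concretely, here is a minimal sketch, assuming n equally staked pools and the standard Praos success probability φ_f(α) = 1 − (1 − f)^α; the numbers are purely illustrative.

```python
# Per-(eligible-)slot leader-count probabilities, assuming n equally
# staked pools whose VRF checks succeed independently with probability
# phi_f(alpha) = 1 - (1 - f)**alpha (the Praos leadership function).

def leader_probs(f: float, n: int):
    p = 1 - (1 - f) ** (1 / n)         # per-pool success at stake alpha = 1/n
    p0 = (1 - p) ** n                  # no leader at all; equals 1 - f exactly
    p1 = n * p * (1 - p) ** (n - 1)    # exactly one leader
    return p0, p1, 1 - p0 - p1         # zero, one, and two-or-more leaders

for f in (0.05, 0.50, 0.95):
    p0, p1, p2 = leader_probs(f, n=1000)
    print(f"f={f:.2f}: P(0)={p0:.3f}  P(1)={p1:.3f}  P(>=2)={p2:.3f}")
```

At f = 0.05 the multi-leader probability is about 0.001; at f = 0.95 it is about 0.80. Pushing P(at least one leader) towards 1 necessarily drags P(two or more) up with it.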

A high chance (from the 13s cadence example above: 8678/22103 = 39%) of multiple leaders in each slot is not a terrible property with a guaranteed 13.0s interval for resolution.

Unfortunately that's just not how Ouroboros security works. It is a problem to have that high a fraction of multiple slot leaders. Ouroboros chain security would fail in those circumstances.

Input Blocks , Frequency = 1 per 0.2s to 2.0s it seems that you are already considering 0.2s slot times (or as you mention there would be multiple valid IB/EB blocks per slot, but not multiple RB winners per slot).

It's worth remembering that IBs and EBs are fundamentally different to RBs when it comes to the VRF tuning. It is perfectly ok to have multiple slot leaders for IBs or EBs. It is not ok to have multiple slot leaders for RBs too often (e.g. 5% is ok, 25% is not ok).

The example of 1 IB per 0.2s could be achieved with 1 second slots, by tuning the VRF to produce on average 5 slot leaders in every slot. That's 5 IBs per second, so a frequency of 1 per 0.2s.
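
A quick back-of-envelope check of that arithmetic (assuming, purely for illustration, that per-slot IB leader counts are approximately Poisson with mean 5):

```python
# Back-of-envelope: 1 s slots, IB VRF tuned for an average of 5 leaders
# per slot; leader counts approximated as Poisson(5) for illustration.
from math import exp

mean_leaders = 5.0
print("mean IB interval:", 1.0 / mean_leaders, "s")       # -> 0.2 s
print(f"P(no IB in a slot) = {exp(-mean_leaders):.4f}")   # ~ 0.0067
```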

@michael-liesenfelt
Contributor

@dcoutts : It is perfectly ok to have multiple slot leaders for IBs or EBs.

I agree on this.

I did the diagram that way for clarity, but it does not reflect the timing in the design.

I still like the cadence implied by the diagram!

If you tune the VRF threshold so that the probability that at least one potential slot leader gets over the threshold is nearly 100%, then the probability of two or more slot leaders is correspondingly high.

I'm not recommending tuning so that P( >0 slot leaders ) ≈ 100%.
The blocks per epoch of an untuned 13s periodic cadence scheme are roughly equivalent to chain_density = 5%.

We've already gone over this: see the cadence histograms above. It's clear that the "VRF w/o variable cadence, 13s periodic block timing" scheme has approximately the same blocks per epoch with a far smaller number of >90s chain delays per epoch (19 vs 230). The tradeoff of a cadence scheme is nearly four times the fraction of multiple slot leaders, in exchange for far fewer >90s delays.

It is a problem to have that high a fraction of multiple slot leaders.

I don't understand the 'problem' or the 'security' weakness, because a cadence guarantees plenty of resolution time and VRF winners are deterministic. It is clear that the probability of two or more slot leaders in a 13s cadence scheme is ~39%, versus [approximately, taking the 3-second worst case] sum( 437, 964, 950 ) / 21600 = ~10.8% for a 5% chain-density scheme.

It is not ok to have multiple slot leaders for RBs too often (e.g. 5% is ok, 25% is not ok).

What's the 'too often' cutoff? I think cadence corrects this and allows for timely deterministic battle resolution.

how one would do it (while retaining privacy)

We have already been over this: you would use the VRF to retain privacy, just not evaluate VRF( chain_density & pool_key & slot ) on every single slot.

We have a disconnect here:

  • Your view: VRF every ( slot % 1 ) and a chain_density tuning parameter.
  • My view: VRF every ( slot % RB_cadence_parameter ) without a tuning parameter.
  • Another view: VRF every ( slot % RB_cadence_parameter ) and a tuning parameter.

Let's see what cadence and a tuning parameter would look like:
A mandatory 5 second cadence with a tuning parameter of 0.30:

VRF every slot, variable cadence, chain_density 0.05: 
Total Nobattle Blocks :  20775
Total Blocks :  21203
Cadence Histogram :  [
  428, 1060, 1003, 899, 928, 834, 790, 734, 705, 705, 646,
  608,  581,  533, 515, 481, 439, 430, 484, 446, 369, 387,
  347,  308,  336, 328, 292, 269, 281, 255, 222, 228, 196,
  219,  192,  187, 165, 165, 152, 141, 119, 140, 116, 126,
  119,  122,  106,  91, 102,  91,  86,  83,  71,  76,  62,
   76,   55,   50,  40,  62,  56,  67,  51,  46,  46,  39,
   32,   41,   37,  37,  38,  25,  26,  39,  21,  28,  27,
   25,   29,   22,  14,  13,  15,  22,  15,  16,  15,  14,
   15,   10,  241
]
 
5 second cadence with 0.30 tuning parameter:
Total Nobattle Blocks :  19995
Total Blocks :  22772
Cadence Histogram :  [
  2777,    0,   0,    0,   0, 5281,    0,   0,    0,   0, 3911,   0,
     0,    0,   0, 2810,   0,    0,    0,   0, 2127,   0,    0,   0,
     0, 1531,   0,    0,   0,    0, 1158,   0,    0,   0,    0, 837,
     0,    0,   0,    0, 625,    0,    0,   0,    0, 477,    0,   0,
     0,    0, 320,    0,   0,    0,    0, 226,    0,   0,    0,   0,
   186,    0,   0,    0,   0,  143,    0,   0,    0,   0,   87,   0,
     0,    0,   0,   70,   0,    0,    0,   0,   51,   0,    0,   0,
     0,   44,   0,    0,   0,    0,  111
]

I'm changing my mind.
Now I like the combination of both an RB tuning parameter and an RB cadence parameter. Combining both parameters could drop the 'tail' of >40-second RB delays significantly while "making the chain run like clockwork".
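
For anyone who wants to reproduce the flavour of these experiments, here is a minimal Monte-Carlo sketch. It is not the script that generated the histograms above: the pool count, the counting conventions, and the seed are my own assumptions, so the exact figures will differ.

```python
# Monte-Carlo sketch of a cadence-plus-tuning RB scheme (illustrative;
# not the script behind the histograms above).  One Cardano epoch is
# 432,000 one-second slots; `tuning` is the target P(>=1 leader) per
# eligible slot, split across n equally staked pools.
import random
from collections import Counter

def simulate(slots=432_000, cadence=5, tuning=0.30, n_pools=100, seed=42):
    rng = random.Random(seed)
    p = 1 - (1 - tuning) ** (1 / n_pools)       # per-pool win probability
    intervals, battles, last = Counter(), 0, 0
    for slot in range(cadence, slots, cadence):  # only eligible slots
        leaders = sum(rng.random() < p for _ in range(n_pools))
        if leaders >= 1:
            intervals[slot - last] += 1          # time since previous RB
            battles += leaders >= 2              # multiple leaders = "battle"
            last = slot
    return intervals, battles

intervals, battles = simulate()
print("blocks:", sum(intervals.values()), "battles:", battles)
print("worst interval:", max(intervals), "s")
```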

@TerminadaPool

TerminadaPool commented Feb 25, 2023

The Ouroboros Praos paper places great importance on active vs passive slots being randomly distributed. Here is a quote from the paper:

"In more detail, we analyze our protocol in the partial or semi-synchronous model.
...
In order to cope with the ∆-semisynchronous setting we introduce the concept of “empty slots” which occur with sufficient frequency to enable short periods of silence that facilitate synchronization. This feature of the protocol gives also its moniker, “Praos”, meaning “mellow”, or “gentle”.

There are several other areas in the paper where it is mentioned that randomly distributed periods of silence (multiple sequential inactive slots) are important. If ranking blocks (RBs) were to have a perfectly regular, predictable occurrence, that would go against this design, which seems to have been fundamental in the thinking of the Ouroboros architects.

I would like to properly understand the security significance of this feature before deviating from it, especially since the architects named this version of the protocol after it.

@michael-liesenfelt
Contributor

michael-liesenfelt commented Feb 25, 2023

@TerminadaPool : There are several other areas in the paper where it is mentioned that randomly distributed periods of silence (multiple sequential inactive slots) are important.

I agree!
I agree so much that I'm recommending enforcing a better (IMHO) 'gentle silence' on a regular cadence, instead of leaving silence to probabilistic chance. (The authors' use of 'silence' means no new blocks are made while nodes communicate and sync existing blocks.)

Leios creates a three-tier linked 'spatial' structure.
Cadence is structured silence in the frequency domain.

@dcoutts
Contributor Author

dcoutts commented Feb 28, 2023

@michael-liesenfelt I'm afraid I've really been struggling to understand what precisely your proposal actually is. I'll try writing it down again, but if I'm wrong please do give us a clear precise description.

I think what you're proposing is that every potential slot leader only evaluates their leader VRF every X slots, e.g. 13 or 20 slots or whatever. That VRF is tuned so that there is a high chance of at least one slot leader, and thus a moderate chance of multiple slot leaders. Your assumption is that multiple slot leaders are ok because nodes would also use another VRF to resolve between equal-length chains.

If that description is correct, then what I keep trying to explain is that it is not ok for the Praos algorithm (and thus Leios RBs) to have such a high probability of multiple slot leaders in each (eligible) slot. That's not how the security of Praos works. You might say "well what if it were, would it work?" to which I have to say, I don't know, you'd have to redo the security analysis. But my guess would be no.

Note that the deterministic resolution of equal length chains based on a VRF is not part of the Praos paper, and is thus not part of the security analysis. The VRF based resolution is allowed by Praos because Praos specifies that we may resolve arbitrarily between equal length chains (that's arbitrary, not random).

@michael-liesenfelt
Contributor

Honestly, I'm having difficulty here...
in the Praos paper there are statements like:

"a party with relative stake α becomes a slot leader for a given slot with probability φf (α) , 1 − (1 − f)^α"

There isn't any chart, histogram, or CDF in the Praos paper that indicates the adversarial advantage fraction relative to the adversary's stake fraction at a given network RMS packet delay. There is no chart/histogram of the opportunities for chain alteration via selfish mining per epoch relative to stake size. I'm at least trying to show single-trial, non-adversarial snapshots of the histogram/timing/battle/forking properties above.


"it is not ok for the Praos algorithm (and thus Leios RBs) to have such a high probability of multiple slot leaders in each eligible) slot."

There's no chart or indication in the Praos paper of when consensus is OK versus not OK relative to the fraction of multiple slot leaders.


"Protocol πSPoS"
") Ui collects all valid chains received via diffusion into a set C, pruning blocks belonging to future slots"

So... the Praos paper literally declares that you prune future slots as part of the protocol, yet you cannot selectively prune 4/5, 12/13, or 19/20 [future/past] slots to make a cadence scheme real. Can you understand why I'm skeptical?

@PhilippeLeLong

@michael-liesenfelt Am I understanding your proposal correctly? What you're saying is: tune the VRF so that it's very certain that at least one pool will win each slot, then have the protocol recognize only every 20th slot as eligible for carrying a block, and add some selection method to decide between equal-length chains.

@dcoutts
Contributor Author

dcoutts commented Apr 24, 2023

Unfortunately that just isn't how Praos works. The problem is that if you tune the VRF so that it's very certain that at least one pool will win each slot, then the probability that you'll get more than one is also quite high (which isn't ok). You also get no periods of "quiet" when there are no slot leaders, and Praos relies on those too.

Declaring 19 out of 20 slots ineligible doesn't really change things. For the probability calculations it's much the same as having slots that are 20 sec long.
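
A small numeric illustration (my own arithmetic, not from the design report): a cadence-20 scheme only rolls the leadership dice once per 20 seconds, so matching Praos' ~20s average block interval forces the per-eligible-slot density towards 1, which is exactly where the multiple-leader probability blows up.

```python
# Gating 19 of every 20 one-second slots is probabilistically the same
# as 20-second slots: one Bernoulli(f) leadership trial per 20 s.
cadence = 20
for f_elig in (0.30, 0.63, 0.95):
    print(f"density {f_elig:.2f} -> mean RB interval {cadence / f_elig:.1f} s")
# Only f_elig -> 1.0 reproduces Praos' ~20 s mean interval (f = 0.05 per
# 1 s slot), and that is where the two-or-more-leader probability gets large.
```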

Collaborator

@rphair rphair left a comment

@dcoutts I'm checking this one because there was another Consensus-related CIP just drafted, with the need for a new Consensus CIP category confirmed here: #872 (comment) (cc @bwbush)

Is your team going to follow through with finalising this proposal? If so, can you post an update about what you are working on and/or waiting for, with a possible timeline?

Note: if this is going ahead then it needs a small format rewrite (header data and correction/organisation of major sections) to update it for the new CIP-0001, as per all the merged CIPs in #389 (I can submit a review to help with this if necessary).

Comment on lines +2 to +4
CIP: 79
Title: Implement Ouroboros Leios to increase Cardano throughput
Authors: Duncan Coutts <duncan.coutts@iohk.io>
Collaborator

Suggested change
CIP: 79
Title: Implement Ouroboros Leios to increase Cardano throughput
Authors: Duncan Coutts <duncan.coutts@iohk.io>
CIP: 79
Title: Implement Ouroboros Leios to increase Cardano throughput
Authors: Duncan Coutts <duncan.coutts@iohk.io>
Category: Consensus

(among other adjustments to the header)

@rphair rphair added the State: Waiting for Author Proposal showing lack of documented progress by authors. label Aug 5, 2024
@rphair
Collaborator

rphair commented Aug 19, 2024

@dcoutts (cc @abailly-iohk since you have offered a contemporary view on some related work) the plans for this as a protocol may well be proceeding, but I have to mark this as Abandoned because it's apparently been abandoned as a CIP.

We would all love a proper CIP to be defined for this, with the document proceeding only when the team working on it feels ready. But the editors & community need to know what we are waiting for, even if this were to stay indefinitely in the Waiting for Author state... if & when we get an update we can change the state back accordingly. cc @Ryun1 @Crypto2099

@rphair rphair added State: Likely Abandoned Close if confirmed abandoned (long waiting). and removed State: Waiting for Author Proposal showing lack of documented progress by authors. labels Aug 19, 2024
@abailly-iohk

@rphair That's totally fair and a good move. We are in the process of finalizing a CIP for Peras (#872) based on prototyping and R&D work, and Leios should follow suit.

@rphair
Collaborator

rphair commented Aug 26, 2024

thanks @abailly-iohk ... correct me if I'm wrong, but that sounds like a new CIP will be coming for Leios that will reflect updates & actual work on this effort in the 2 intervening years. If that is ever applied to the document in this PR then please post as such and I'll reopen.

@rphair rphair closed this Aug 26, 2024