NEP-509: Stateless validation stage 0 #509

walnut-the-cat · 2023-09-19T22:02:45Z

WIP

initial draft

render · 2023-09-19T22:02:57Z

Your Render PR Server URL is https://nomicon-pr-509.onrender.com.

Follow its progress at https://dashboard.render.com/static/srv-ck51l1o21fec73aapqgg.

Add specifications

Add validator role change section

frol · 2023-11-01T19:21:37Z

Hi @walnut-the-cat – thank you for starting this proposal. As the moderator, I labeled this PR as "Needs author revision" because we assume you are still working on it since you submitted it in "Draft" mode.

Please ping the @near/nep-moderators once you are ready for us to review it. We will review it again in early January, unless we hear from you sooner. We typically close NEPs that are inactive for more than two months, so please let us know if you need more time.

fix lint errors

neps/nep-0509.md

Documenting changes to validators and describing basics of reference implementation.

In near/nearcore#11582 we're increasing `combined_transactions_size_limit`, so let's update the NEP to match the implementation.

bowenwang1996 · 2024-06-17T01:53:00Z

As a working group member, I'd like to nominate @mfornet and @birchmd as SME reviewers for this NEP.

birchmd

As SME and working group member, I lean towards approving the NEP. It is exciting to see Near continue to push towards being a completely scalable protocol. Avoiding the complexity of fraud proofs while having an eye towards using ZK technology in the future is very clever. Thanks to everyone for their hard work on designing and implementing this large protocol change.

neps/nep-0509.md

fix lint warning and apply Michael's suggestion

birchmd · 2024-06-26T19:57:25Z

@mm-near The "40 years at 90% confidence" calculation was done by me.

It assumes that the attacker has just barely less than 1/3 of the total stake (so they cannot outright take over the protocol), which is about 197 million $NEAR as of today.

The calculation determines the probability of a shard assignment (recall that stake is converted to "mandates" and these are randomly assigned to shards) in which at least one shard has 2/3 of its assigned stake controlled by the attacker. In that case the attacker would be free to push an invalid state transition because it could sign the invalid state witness itself. With 68 mandates per shard and 6 shards total this probability is 8.6e-10.

Then we assume the shard assignments are independent so that we can model it as a Bernoulli process and see how many "trials" it would take before we have a "success" (i.e. how many random shard assignments are there before the attacker obtains a 2/3 majority in one shard). The probability of having m "failures" in a row in a Bernoulli process is (1-p)^m and we want that to happen with 90% confidence (a somewhat arbitrary value chosen by me), so we can have m = ln(0.9)/ln(1-p) trials. This works out to be around 122 million trials.

Now that we know the number of trials we can convert it into a time. At 1 trial per second that is almost 4 years, but at the time Bowen was suggesting to shuffle less often than every block. At 1 trial per 10 seconds we get almost 40 years, which is the number I reported.

We can also do this calculation the other way though. If we take the 5 year timeline you propose, then we can convert that into a number of trials. Let's assume one trial per second since I think the current implementation does shuffle validators every block. Then that is around 157 million trials, and we want to know the probability, in our Bernoulli process, of having at least 1 success within that many trials. This probability is 1 minus the probability that all those trials fail in a row, so 1 - (1-p)^N. This works out to be around 12.7%. So if someone controlled 197 million staked $NEAR for five years then there is a 12.7% chance that they would have the opportunity to push an invalid state transition. If we instead assume only 1 trial every 10 seconds then this probability reduces to around 1.3%.
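
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it takes the per-assignment probability of 8.6e-10 as given rather than deriving it, assumes independent shard assignments as stated above, and the function names are made up for illustration.

```python
import math

# Probability that one random shard assignment hands the attacker 2/3 of
# some shard. Taken from the comment above; it is not derived here.
P_BAD_ASSIGNMENT = 8.6e-10
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def trials_for_confidence(p: float, confidence: float) -> float:
    """Solve (1 - p)^m = confidence for m (number of failed trials in a row)."""
    return math.log(confidence) / math.log1p(-p)


def prob_of_success_within(p: float, trials: float) -> float:
    """Return 1 - (1 - p)^trials, computed stably for very small p."""
    return -math.expm1(trials * math.log1p(-p))


def trials_to_years(trials: float, seconds_per_trial: float) -> float:
    """Convert a number of trials into years at a given shuffle interval."""
    return trials * seconds_per_trial / SECONDS_PER_YEAR


if __name__ == "__main__":
    m = trials_for_confidence(P_BAD_ASSIGNMENT, 0.9)
    print(f"trials at 90% confidence: {m:.3e}")
    print(f"years at 1 trial/s:   {trials_to_years(m, 1):.1f}")
    print(f"years at 1 trial/10s: {trials_to_years(m, 10):.1f}")

    five_years = 5 * SECONDS_PER_YEAR  # number of trials at 1 trial per second
    print(f"P(success in 5y, 1 trial/s):   {prob_of_success_within(P_BAD_ASSIGNMENT, five_years):.3f}")
    print(f"P(success in 5y, 1 trial/10s): {prob_of_success_within(P_BAD_ASSIGNMENT, five_years / 10):.4f}")
```

The results should land close to the figures quoted above (about 122 million trials, just under 4 and 40 years, and roughly 12.7% and 1.3%). `log1p` and `expm1` are used only to avoid rounding problems when p is this small.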

If you keep the number of mandates per shard the same then this whole calculation does not change much as you increase the number of shards, because the theory says that the dependency on the number of shards is not very strong once you have more than a few. So the base probability of 8.6e-10 should stay close to the same for any number of shards. But note that increasing the number of shards while keeping the number of mandates per shard the same means increasing the total number of mandates.

victorchimakanu · 2024-06-27T15:31:00Z

NEP Status (Updated by NEP Moderators)

Status: VOTING

SME reviews:

Protocol SME @birchmd: NEP-509: Stateless validation stage 0 #509
Protocol SME @mfornet: NEP-509: Stateless validation stage 0 #509

Protocol Work Group voting indications (❔ | 👍 | 👎 ):

pugachAG · 2024-07-02T16:41:50Z

@mm-near

> Do we have info on how big the state witnesses are going to be on average? (based on current traffic patterns)

Please find the metrics based on the current mainnet traffic for a window of 12 hours.

max witness size

Max witness size affects chunk validation latency.

avg witness size

Avg witness size determines additional chunk validation network usage.

# Feature to stabilize

This PR stabilizes the Congestion Control and Stateless Validation protocol features. They are assigned separate protocol features and the protocol upgrades should be scheduled separately.

# Context

* near/NEPs#539
* near/NEPs#509

# Testing and QA

Those features are well covered in unit, integration and end to end tests and were extensively tested in forknet and statelessnet.

# Checklist

- [x] Link to nightly nayduck run (`./scripts/nayduck.py`, [docs](https://github.com/near/nearcore/blob/master/nightly/README.md#scheduling-a-run)): https://nayduck.nearone.org/
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.

Longarithm · 2024-07-02T19:59:57Z

@mm-near the latency you mentioned matches the existing one.

Before: the block producer (BP) sends a block quickly after receiving chunks, but the block is validated only after the other block producers apply all of its chunks, since that was their only way to validate the chunks in a block. So the next block is produced only after the previous chunks have been applied.
After: the BP additionally has to wait for endorsements from chunk validators (CVs). But this is equivalent to waiting for the previous chunks to be applied; it is just performed by CVs based on the state witness now. After that, the block is quickly validated by verifying the endorsement signatures.

Also, BPs and chunk producers (CPs) are CVs too, so the stake behind chunk validation remains large. Memtrie is much faster than the disk trie, which compensates for the network latency of sending state witnesses and endorsements.

UPD: the actual additional latency is introduced on the chunk producer side: near/nearcore#10584

In short: to produce chunk N, the CP must apply chunk N-1, for which the BP must produce block N-1, for which CVs must validate (= apply) chunk N-1. So the application of chunk N-1 appears twice.
But again, we expect the speedup in chunk application to outweigh that.
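
A toy model of that dependency chain, as a Python sketch. The three network hops and every number in it are assumptions added for illustration, not measurements and not part of the NEP. It only shows why chunk application is counted twice before chunk N can be produced.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """All values are in seconds and purely hypothetical."""
    apply_chunk: float           # time to apply (= validate) one chunk
    witness_transfer: float      # CP -> CV: state witness delivery
    endorsement_transfer: float  # CV -> BP: endorsement delivery
    block_transfer: float        # BP -> CP: block N-1 delivery


def time_until_cp_can_produce_next_chunk(t: Timings) -> float:
    """Length of the path: CV validates N-1 -> BP produces block N-1 -> CP applies N-1."""
    cv_validated = t.witness_transfer + t.apply_chunk     # first application, by CVs
    block_produced = cv_validated + t.endorsement_transfer
    cp_applied = block_produced + t.block_transfer + t.apply_chunk  # second application, by the CP
    return cp_applied


if __name__ == "__main__":
    # Hypothetical comparison of a slower and a faster chunk application time.
    slow = Timings(apply_chunk=0.40, witness_transfer=0.05,
                   endorsement_transfer=0.05, block_transfer=0.05)
    fast = Timings(apply_chunk=0.10, witness_transfer=0.05,
                   endorsement_transfer=0.05, block_transfer=0.05)
    print(time_until_cp_can_produce_next_chunk(slow))
    print(time_until_cp_can_produce_next_chunk(fast))
```

Because `apply_chunk` enters the sum twice, any speedup in chunk application is also counted twice, which is the intuition behind expecting it to outweigh the extra hops.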

Side notes:

* If a BP tracks a shard, it will apply chunks for it, but this doesn't block receiving other blocks.
* The latency of a user waiting for a transaction outcome shouldn't change.

Let's say only one shard is touched by the transaction. To get the outcome, we query an RPC node which tracks the touched shard.
If an RPC node tracks the shard, it applies chunks from blocks immediately without waiting for endorsements, because chunk application is deterministic.
Chunks are validated on chain by endorsements in the next block containing the chunk, but if the user is optimistic, they can just rely on the RPC node's response.

shreyan-gupta · 2024-07-04T19:55:45Z

@mm-near

> for Reed Solomon Erasure encoding - do we still plan to send it to all the block producers (for all the shards?)

The main purpose of the Reed-Solomon erasure encoding for the state witness is to reduce the load on the chunk producer when distributing the state witness. The recipients of the state witness are all the chunk validators, and they, not the block producers, are the ones who participate in the partial witness forward.

This way we don't put too much network load on the block producers, and the network load is localized to the chunk validators. Nodes that have a higher number of mandates are validators for multiple shards.
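
To make the load argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the witness is split into one part per chunk validator, that any `data_parts` of those parts are enough to reconstruct it, and that each validator forwards the part it received to every other validator. The function names, the part layout, and the numbers in the example are hypothetical and are not taken from the NEP or from nearcore.

```python
from typing import NamedTuple


class CodedLoad(NamedTuple):
    part_bytes: int                # size of one erasure-coded part
    producer_upload_bytes: int     # what the chunk producer sends in total
    validator_forward_bytes: int   # what each chunk validator forwards in total


def direct_upload_bytes(witness_bytes: int, num_validators: int) -> int:
    """Chunk producer sends the full witness to every chunk validator."""
    return witness_bytes * num_validators


def coded_upload_bytes(witness_bytes: int, num_validators: int, data_parts: int) -> CodedLoad:
    """Chunk producer sends one part per validator; validators forward their part."""
    part = -(-witness_bytes // data_parts)  # ceiling division
    return CodedLoad(
        part_bytes=part,
        producer_upload_bytes=part * num_validators,
        validator_forward_bytes=part * (num_validators - 1),
    )


if __name__ == "__main__":
    # Hypothetical example: 4 MB witness, 50 validators, 30 data parts.
    print(direct_upload_bytes(4_000_000, 50))
    print(coded_upload_bytes(4_000_000, 50, 30))
```

Under these assumptions the chunk producer uploads on the order of `num_validators / data_parts` witnesses instead of `num_validators` of them, and the rest of the traffic is spread across the chunk validators.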

mfornet

As a Protocol WG member, I lean towards approving this proposal since it is a necessary step towards effective sharding.

My main concern is about chunk validators:

In this approach, I'm concerned with chunk validators' incentives to validate new chunks. As I understand from this document, the optimal strategy for individual chunk validators is to accept every chunk. As long as there is one honest chunk validator, work is not needed, and they don't get penalized for incorrectly endorsing an invalid chunk.

mfornet · 2024-07-05T13:42:29Z

neps/nep-0509.md

+### Assumptions
+
+* Not more than 1/3 of validators (by stake) is corrupted.
+* In memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)


Should we move the content of the linked document to neps/assets in this repository, in case the current link gets broken for some reason?

neps/nep-0509.md

mfornet · 2024-07-05T14:12:29Z

neps/nep-0509.md

+As we pointed out above, current formula `chunk_validator_quality_ratio` is problematic.
+Here it brings even a bigger issue: if chunk producers don't produce chunks, chunk validators will be kicked out as well, which impacts network stability.
+This is another reason to come up with the better formula.  


Chunk validators can collude and not endorse some chunks in a way that some chunk producers or other chunk validators get kicked out by not getting their chunks included.

But this is more relevant to the chunk endorsement process, not chunk validator kickouts/rewards.

And the base assumption of the new approach is that the event "1/3 of a chunk's validators collude" implies "1/3 of all validators collude" with high probability, so in this case the base blockchain security assumption, on which we rely, fails.

neps/nep-0509.md

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Longarithm · 2024-07-08T19:19:12Z

@mfornet I answered the chunk validator-related comments.

Yeah, this is a known problem. We discussed it a couple of times. One idea was to introduce "honeypot state witnesses", the goal of which would be to verify that state witnesses can get invalidated, and to penalise validators for blind approvals.

However, the counterarguments are that:

* we have other places in the consensus where blind approval is not penalised at all, e.g. nothing prevents block validators from endorsing all blocks;
* for small chunk validators the effect is negligible; for a bigger stake behind blind approvals the situation is effectively the same as many validators colluding, which we can't control anyway.

So any of these solutions would introduce additional complexity (which is already very substantial), and the benefit never became clear.

lint error

flmel · 2024-07-26T14:52:33Z

Thank you to everyone who attended the Protocol Work Group meeting! The working group members reviewed the NEP and reached the following consensus:

Status: Approved (Meeting Recording: https://youtu.be/058BZEyXzgU)

@walnut-the-cat Thank you for authoring this NEP

@birchmd @mfornet Thank you for the review!

Create nep-0509.md

b3ae8e5

initial draft

walnut-the-cat added 3 commits September 20, 2023 07:37

Update nep-0509.md

867d414

Update nep-0509.md

a1c36b9

Update nep-0509.md

abbf6f0

Add specifications

This was referenced Sep 27, 2023

Stateless Validation NEP Draft near/nearcore#9587

Open

🔷 Prototype stateless validation near/nearcore#9292

Closed

Update nep-0509.md

229d1e2

Add validator role change section

frol added WG-protocol Protocol Standards Work Group should be accountable S-draft/needs-author-revision A NEP in the DRAFT stage that needs an author revision. A-NEP A NEAR Enhancement Proposal (NEP). labels Nov 1, 2023

walnut-the-cat mentioned this pull request Nov 29, 2023

[ProjectTracking]: Stateless validation MVP near/near-one-project-tracking#5

Closed

23 tasks

walnut-the-cat mentioned this pull request Jan 10, 2024

[ProjectTracking]: Stateless validation StatelessNet near/near-one-project-tracking#20

Open

64 tasks

robin-near and others added 3 commits January 26, 2024 07:55

Revise the summary and motivation, add high level flow. (#527)

5627ced

Merge branch 'master' into state-validation

d036e84

Update nep-0509.md

c059526

fix lint errors

walnut-the-cat self-assigned this Apr 12, 2024

walnut-the-cat commented Jun 11, 2024

wacban and others added 6 commits June 12, 2024 16:54

test test

6c7c1d9

doc: validator structure change, state witness reference impl (#546)

a999b1d

Documenting changes to validators and describing basics of reference implementation.

linter linebreaks

28b5fb2

add security implications

dd9ba09

Add information about state witness size limits to NEP 509 (#547)

ca02c29

Update documentation about combined_transactions_size_limit (#549)

7634275

In near/nearcore#11582 we're increasing `combined_transactions_size_limit`, so let's update the NEP to match the implementation.

future possibilities and consequences (#548)

6069805

birchmd approved these changes Jun 17, 2024

Shreyan Gupta and others added 2 commits June 18, 2024 14:14

Add section about partial witness distribution (#550)

c211773

Update nep-0509.md

1d1e74b

fix lint warning and apply Michael's suggestion

wacban mentioned this pull request Jul 2, 2024

stabilize congestion control and stateless validation near/nearcore#11701

Merged

2 tasks

mfornet reviewed Jul 5, 2024

walnut-the-cat and others added 8 commits July 8, 2024 08:13

Update neps/nep-0509.md

0a6ced4

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

6bfb3c6

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

faccd3f

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

880be34

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

3969814

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

46ada3f

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

Update neps/nep-0509.md

684b962

Co-authored-by: Marcelo Fornet <mfornet94@gmail.com>

feedback

da99398

walnut-the-cat added 2 commits July 19, 2024 10:28

Merge branch 'master' into state-validation

714c830

Update nep-0509.md

b711506

lint error

walnut-the-cat marked this pull request as ready for review July 19, 2024 17:31

walnut-the-cat requested a review from a team as a code owner July 19, 2024 17:31

Update nep-0509.md

425f465

lint error

flmel added S-approved A NEP that was approved by a working group. and removed S-review/needs-sme-review A NEP in the REVIEW stage is waiting for Subject Matter Expert review. labels Jul 22, 2024

flmel approved these changes Jul 26, 2024

flmel merged commit 49b56d1 into master Jul 26, 2024
4 checks passed

flmel deleted the state-validation branch July 26, 2024 15:02
