Create complete and exhaustive list of sources of indeterminism in PVF #653

Open
eskimor opened this issue Apr 17, 2023 · 5 comments

Comments

@eskimor
Member

eskimor commented Apr 17, 2023

In order to systematically address all sources of indeterminism we have in PVF execution (and preparation), we should start with a list.

I would like to have in the guide a list of all possible sources we can think of, with sections following that list explaining each one in detail together with implemented or possible mitigations.

@eskimor
Member Author

eskimor commented Apr 17, 2023

For the stack limit, I just had the following idea: extend the upcoming time dispute mechanism.

Approval checkers could not only report time, but also maximum stack depth.

Assumption

We are able to assume some upper bound for the stack depth fluctuations across supported architectures/implementations/etc.

Idea

Let's assume the above upper bound for fluctuation is a factor of 2. Then we can have the backers commit to some bound X. Approval checkers are allowed a much larger limit, like 6*X. In the honest case, approval checkers will never exceed that limit, hence no indeterminism.

For the dishonest case, we do the same as in time disputes: we start charging the backers once the approval checkers report a stack depth larger than 2*X, because we can then be sure, even with implementation differences, that the backers are faulty and are the ones to punish. If backers push it further, they could still trigger a dispute, but given the data we would not punish the validators raising that dispute and would likely slash the backers heavily instead.
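To make the thresholds concrete, here is a minimal sketch of how an observed stack depth could be classified against the backers' committed bound X. The factors 2 and 6 come from the example above; the type and function names (`StackVerdict`, `classify`) are hypothetical and do not correspond to any existing Polkadot code.

```rust
/// Hypothetical verdict for an approval checker's observed stack depth,
/// relative to the bound X that the backers committed to.
#[derive(Debug, PartialEq, Eq)]
pub enum StackVerdict {
    /// depth <= X: the backers' commitment holds as stated.
    WithinBound,
    /// X < depth <= 2*X: explainable by implementation differences.
    Tolerated,
    /// 2*X < depth <= 6*X: execution still succeeds, but backers get charged.
    BackersAtFault,
    /// depth > 6*X: the checker's own limit is hit, so a dispute can follow.
    LimitExceeded,
}

/// Assumed upper bound on stack depth fluctuation across implementations.
pub const FLUCTUATION_FACTOR: u64 = 2;
/// Assumed, much larger limit granted to approval checkers.
pub const CHECKER_LIMIT_FACTOR: u64 = 6;

/// Classify an observed depth against the committed bound.
/// Saturating multiplication avoids overflow for very large bounds.
pub fn classify(committed_bound: u64, observed_depth: u64) -> StackVerdict {
    let tolerated = committed_bound.saturating_mul(FLUCTUATION_FACTOR);
    let hard_limit = committed_bound.saturating_mul(CHECKER_LIMIT_FACTOR);

    if observed_depth <= committed_bound {
        StackVerdict::WithinBound
    } else if observed_depth <= tolerated {
        StackVerdict::Tolerated
    } else if observed_depth <= hard_limit {
        StackVerdict::BackersAtFault
    } else {
        StackVerdict::LimitExceeded
    }
}
```

With X = 100, a reported depth of 150 would be tolerated, 201 would already charge the backers, and only a depth above 600 would make the checker's execution fail.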

Possible Extensions

We might be able to extend the "time dispute" mechanism to address all not otherwise solvable indeterminism sources.

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-implications-of-pvf-executor-environment-versioning/2519/25

@mrcnski
Contributor

mrcnski commented Apr 17, 2023

Here's a couple to start the list:

| Indeterminism source | Explanation | Mitigations | Status |
|---|---|---|---|
| Differences in preparation/execution time on different machines | Can lead to jobs timing out on some machines and not others. | Mitigated somewhat by counting CPU time instead of wall clock time, the former being more independent of system load. | implemented |
| Differences in available memory on different machines | Can lead to jobs hitting OOM on some machines and not others. | Some mitigations are being researched, such as #745 and #767. | future |

@tomaka
Contributor

tomaka commented Apr 18, 2023

Not exactly indeterminism, but closely related:

  • The NaN representation of floating-point numbers. Wasmtime has an option for that, but other implementations might use a different representation.

  • The allocator algorithm. At the moment, every implementation has to copy the exact behavior of the Substrate allocator down to the smallest detail. I would personally be strongly in favor of removing this allocator altogether (it's in general a poor design for several reasons) and moving the memory allocation to the runtime, but that needs a refactor of many host functions.
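On the NaN point, the sketch below illustrates why the representation matters and what canonicalization does. IEEE 754 allows many bit patterns for NaN, so two executors can produce results that are both "NaN" yet differ bit for bit. The helper names and the choice of `0x7fc0_0000` as the canonical pattern are assumptions made for illustration; this is not Wasmtime's implementation.

```rust
/// Canonical quiet NaN pattern assumed for this sketch
/// (positive sign, only the top mantissa bit set).
pub const CANONICAL_NAN_F32: u32 = 0x7fc0_0000;

/// True if the bit pattern encodes any f32 NaN:
/// all exponent bits set and a non-zero mantissa.
pub fn is_nan_bits(bits: u32) -> bool {
    const EXPONENT_MASK: u32 = 0x7f80_0000;
    const MANTISSA_MASK: u32 = 0x007f_ffff;
    (bits & EXPONENT_MASK) == EXPONENT_MASK && (bits & MANTISSA_MASK) != 0
}

/// Map every NaN pattern to the single canonical one,
/// leaving all other values (including infinities) untouched.
pub fn canonicalize_f32_bits(bits: u32) -> u32 {
    if is_nan_bits(bits) {
        CANONICAL_NAN_F32
    } else {
        bits
    }
}
```

Working on raw `u32` bit patterns rather than `f32` values keeps the example independent of how the host platform treats NaN payloads. If every executor applied such a step after each floating-point operation, results written to memory would no longer depend on the hardware's NaN payload.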

@mrcnski
Contributor

mrcnski commented Aug 8, 2023

I found an old issue listing sources of indeterminism: #990. I haven't gone through it in depth, but I see some have already been mentioned here.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023