This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Commit

Merge branch 'master' into ao-glossary-update
* master:
  ig: Fix description of execution retry delay (#6342)
  Added Amforc bootnodes for Polkadot and Kusama (#6077)
  [ci] fix build-implementers-guide (#6335)
  Rate limit improvements (#6315)
  Add PVF module documentation (#6293)
  Update async-trait version to v0.1.58 (#6319)
ordian committed Nov 25, 2022
2 parents d1edc0b + 795b20c commit 5aa3fca
Showing 19 changed files with 300 additions and 245 deletions.
365 changes: 183 additions & 182 deletions Cargo.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion node/core/pvf/src/executor_intf.rs
@@ -96,7 +96,7 @@ pub fn prevalidate(code: &[u8]) -> Result<RuntimeBlob, sc_executor_common::error
}

/// Runs preparation on the given runtime blob. If successful, it returns a serialized compiled
/// artifact which can then be used to pass into [`execute`] after writing it to the disk.
/// artifact which can then be used to pass into `Executor::execute` after writing it to the disk.
pub fn prepare(blob: RuntimeBlob) -> Result<Vec<u8>, sc_executor_common::error::WasmError> {
sc_executor_wasmtime::prepare_runtime_artifact(blob, &CONFIG.semantics)
}
37 changes: 26 additions & 11 deletions node/core/pvf/src/lib.rs
@@ -16,18 +16,27 @@

#![warn(missing_docs)]

//! A crate that implements PVF validation host.
//! A crate that implements the PVF validation host.
//!
//! For more background, refer to the Implementer's Guide: [PVF
//! Pre-checking](https://paritytech.github.io/polkadot/book/pvf-prechecking.html) and [Candidate
//! Validation](https://paritytech.github.io/polkadot/book/node/utility/candidate-validation.html#pvf-host).
//!
//! # Entrypoint
//!
//! This crate provides a simple API. You first [`start`] the validation host, which gives you the
//! [handle][`ValidationHost`] and the future you need to poll.
//!
//! Then using the handle the client can send two types of requests:
//! Then using the handle the client can send three types of requests:
//!
//! (a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it (verify and
//! compile) in order to pre-check its validity.
//!
//! (a) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`]
//! (b) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`]
//! and the PVF [code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF
//! with the `params`.
//!
//! (b) Heads up. This request allows to signal that the given PVF may be needed soon and that it
//! (c) Heads up. This request signals that the given PVF may be needed soon and that it
//! should be prepared for execution.
//!
//! The preparation results are cached for some time after they were either used or signaled in a heads up.
@@ -39,7 +48,7 @@
//! PVF execution requests can specify the [priority][`Priority`] with which the given request should
//! be handled. Different priority levels have different effects. This is discussed below.
//!
//! Preparation started by a heads up signal always starts in with the background priority. If there
//! Preparation started by a heads up signal always starts with the background priority. If there
//! is already a request for that PVF preparation under way the priority is inherited. If after heads
//! up, a new PVF execution request comes in with a higher priority, then the original task's priority
//! will be adjusted to match the new one if it's larger.
@@ -48,18 +57,22 @@
//!
//! # Under the hood
//!
//! ## The flow
//!
//! Under the hood, the validation host is built using a bunch of communicating processes, not
//! dissimilar to actors. Each such "process" is a future task that contains an event loop that
//! processes incoming messages, potentially delegating sub-tasks to other "processes".
//!
//! Two of these processes are queues. The first one is for preparation jobs and the second one is for
//! execution. Both of the queues are backed by separate pools of workers of different kinds.
//!
//! Preparation workers handle preparation requests by preverifying and instrumenting PVF wasm code,
//! Preparation workers handle preparation requests by prevalidating and instrumenting PVF wasm code,
//! and then passing it into the compiler, to prepare the artifact.
//!
//! Artifact is a final product of preparation. If the preparation succeeded, then the artifact will
//! contain the compiled code usable for quick execution by a worker later on.
//! ## Artifacts
//!
//! An artifact is the final product of preparation. If the preparation succeeded, then the artifact
//! will contain the compiled code usable for quick execution by a worker later on.
//!
//! If the preparation failed, then the worker will still write the artifact with the error message.
//! We save the artifact with the error so that we don't try to prepare the artifacts that are broken
@@ -68,12 +81,14 @@
//! The artifact is saved on disk and is also tracked by an in memory table. This in memory table
//! doesn't contain the artifact contents though, only a flag that the given artifact is compiled.
//!
//! A pruning task will run at a fixed interval of time. This task will remove all artifacts that
//! weren't used or received a heads up signal for a while.
//!
//! ## Execution
//!
//! The execute workers will be fed by the requests from the execution queue, which is basically a
//! combination of a path to the compiled artifact and the
//! [`params`][`polkadot_parachain::primitives::ValidationParams`].
//!
//! Each fixed interval of time a pruning task will run. This task will remove all artifacts that
//! weren't used or received a heads up signal for a while.

mod artifacts;
mod error;
2 changes: 1 addition & 1 deletion node/core/pvf/src/priority.rs
@@ -24,7 +24,7 @@ pub enum Priority {
Normal,
/// This priority is used for requests that are required to be processed as soon as possible.
///
/// For example, backing is on critical path and require execution as soon as possible.
/// For example, backing is on a critical path and requires execution as soon as possible.
Critical,
}

6 changes: 6 additions & 0 deletions node/network/dispute-distribution/src/receiver/mod.rs
@@ -302,6 +302,12 @@ where

// Queue request:
if let Err((authority_id, req)) = self.peer_queues.push_req(authority_id, req) {
gum::debug!(
target: LOG_TARGET,
?authority_id,
?peer,
"Peer hit the rate limit - dropping message."
);
req.send_outgoing_response(OutgoingResponse {
result: Err(()),
reputation_changes: vec![COST_APPARENT_FLOOD],
18 changes: 12 additions & 6 deletions node/network/dispute-distribution/src/sender/mod.rs
@@ -108,8 +108,6 @@ impl DisputeSender {
runtime: &mut RuntimeInfo,
msg: DisputeMessage,
) -> Result<()> {
self.rate_limit.limit().await;

let req: DisputeRequest = msg.into();
let candidate_hash = req.0.candidate_receipt.hash();
match self.disputes.entry(candidate_hash) {
@@ -118,6 +116,8 @@ impl DisputeSender {
return Ok(())
},
Entry::Vacant(vacant) => {
self.rate_limit.limit("in start_sender", candidate_hash).await;

let send_task = SendTask::new(
ctx,
runtime,
@@ -169,10 +169,12 @@ impl DisputeSender {

// Iterates in order of insertion:
let mut should_rate_limit = true;
for dispute in self.disputes.values_mut() {
for (candidate_hash, dispute) in self.disputes.iter_mut() {
if have_new_sessions || dispute.has_failed_sends() {
if should_rate_limit {
self.rate_limit.limit().await;
self.rate_limit
.limit("while going through new sessions/failed sends", *candidate_hash)
.await;
}
let sends_happened = dispute
.refresh_sends(ctx, runtime, &self.active_sessions, &self.metrics)
@@ -193,7 +195,7 @@ impl DisputeSender {
// recovered at startup will be relatively "old" anyway and we assume that no more than a
// third of the validators will go offline at any point in time anyway.
for dispute in unknown_disputes {
self.rate_limit.limit().await;
self.rate_limit.limit("while going through unknown disputes", dispute.1).await;
self.start_send_for_dispute(ctx, runtime, dispute).await?;
}
Ok(())
@@ -383,14 +385,18 @@ impl RateLimit {
}

/// Wait until ready and prepare for next call.
async fn limit(&mut self) {
///
/// The string given as `occasion` and the candidate hash are logged in case the rate limit is hit.
async fn limit(&mut self, occasion: &'static str, candidate_hash: CandidateHash) {
// Wait for rate limit and add some logging:
poll_fn(|cx| {
let old_limit = Pin::new(&mut self.limit);
match old_limit.poll(cx) {
Poll::Pending => {
gum::debug!(
target: LOG_TARGET,
?occasion,
?candidate_hash,
"Sending rate limit hit, slowing down requests"
);
Poll::Pending
4 changes: 3 additions & 1 deletion node/service/chain-specs/kusama.json
@@ -23,7 +23,9 @@
"/dns/boot.stake.plus/tcp/31333/p2p/12D3KooWLa1UyG5xLPds2GbiRBCTJjpsVwRWHWN7Dff14yiNJRpR",
"/dns/boot.stake.plus/tcp/31334/wss/p2p/12D3KooWLa1UyG5xLPds2GbiRBCTJjpsVwRWHWN7Dff14yiNJRpR",
"/dns/boot-node.helikon.io/tcp/7060/p2p/12D3KooWL4KPqfAsPE2aY1g5Zo1CxsDwcdJ7mmAghK7cg6M2fdbD",
"/dns/boot-node.helikon.io/tcp/7062/wss/p2p/12D3KooWL4KPqfAsPE2aY1g5Zo1CxsDwcdJ7mmAghK7cg6M2fdbD"
"/dns/boot-node.helikon.io/tcp/7062/wss/p2p/12D3KooWL4KPqfAsPE2aY1g5Zo1CxsDwcdJ7mmAghK7cg6M2fdbD",
"/dns/kusama.bootnode.amforc.com/tcp/30333/p2p/12D3KooWLx6nsj6Fpd8biP1VDyuCUjazvRiGWyBam8PsqRJkbUb9",
"/dns/kusama.bootnode.amforc.com/tcp/30334/wss/p2p/12D3KooWLx6nsj6Fpd8biP1VDyuCUjazvRiGWyBam8PsqRJkbUb9"
],
"telemetryEndpoints": [
[
4 changes: 3 additions & 1 deletion node/service/chain-specs/polkadot.json
@@ -23,7 +23,9 @@
"/dns/boot.stake.plus/tcp/30333/p2p/12D3KooWKT4ZHNxXH4icMjdrv7EwWBkfbz5duxE5sdJKKeWFYi5n",
"/dns/boot.stake.plus/tcp/30334/wss/p2p/12D3KooWKT4ZHNxXH4icMjdrv7EwWBkfbz5duxE5sdJKKeWFYi5n",
"/dns/boot-node.helikon.io/tcp/7070/p2p/12D3KooWS9ZcvRxyzrSf6p63QfTCWs12nLoNKhGux865crgxVA4H",
"/dns/boot-node.helikon.io/tcp/7072/wss/p2p/12D3KooWS9ZcvRxyzrSf6p63QfTCWs12nLoNKhGux865crgxVA4H"
"/dns/boot-node.helikon.io/tcp/7072/wss/p2p/12D3KooWS9ZcvRxyzrSf6p63QfTCWs12nLoNKhGux865crgxVA4H",
"/dns/polkadot.bootnode.amforc.com/tcp/30333/p2p/12D3KooWAsuCEVCzUVUrtib8W82Yne3jgVGhQZN3hizko5FTnDg3",
"/dns/polkadot.bootnode.amforc.com/tcp/30334/wss/p2p/12D3KooWAsuCEVCzUVUrtib8W82Yne3jgVGhQZN3hizko5FTnDg3"
],
"telemetryEndpoints": [
[
5 changes: 5 additions & 0 deletions roadmap/implementers-guide/README.md
@@ -22,6 +22,11 @@ Then install and build the book:
```sh
cargo install mdbook mdbook-linkcheck mdbook-graphviz mdbook-mermaid mdbook-last-changed
mdbook serve roadmap/implementers-guide
```

and in a second terminal window run:

```sh
open http://localhost:3000
```

1 change: 0 additions & 1 deletion roadmap/implementers-guide/src/SUMMARY.md
@@ -75,7 +75,6 @@
- [Availability](types/availability.md)
- [Overseer and Subsystem Protocol](types/overseer-protocol.md)
- [Runtime](types/runtime.md)
- [Chain](types/chain.md)
- [Messages](types/messages.md)
- [Network](types/network.md)
- [Approvals](types/approval.md)
1 change: 0 additions & 1 deletion roadmap/implementers-guide/src/glossary.md
@@ -47,4 +47,3 @@ exactly one downward message queue.
Also of use is the [Substrate Glossary](https://substrate.dev/docs/en/knowledgebase/getting-started/glossary).

[0]: https://wiki.polkadot.network/docs/learn-consensus
[1]: #pvf
@@ -48,4 +48,39 @@ Once we have all parameters, we can spin up a background task to perform the val

If we can assume the presence of the relay-chain state (that is, while processing [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`), we can run all the checks that the relay-chain would run at inclusion time, thus confirming that the candidate will be accepted.

### PVF Host

The PVF host is responsible for handling requests to prepare and execute PVF
code blobs.

One high-level goal is to make PVF operations as deterministic as possible, to
reduce the rate of disputes. Disputes can happen due to e.g. a job timing out on
one machine, but not another. While we do not yet have full determinism, there
are some dispute reduction mechanisms in place right now.

#### Retrying execution requests

If the execution request fails during **preparation**, we will retry if it is
possible that the preparation error was transient (e.g. if the error was a panic
or a timeout). We will only retry preparation if another request comes in after
15 minutes, to ensure any potential transient conditions had time to be
resolved. We will retry up to 5 times.

If the actual **execution** of the artifact fails with an ambiguous error, we
will retry once after a brief delay, to allow any potential transient
conditions to clear.
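
The preparation retry rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `PrepareFailure`, `PREPARE_RETRY_COOLDOWN`, `MAX_PREPARE_RETRIES` and `should_retry_preparation` are hypothetical and are not taken from the PVF host code; only the 15 minute cooldown and the limit of 5 retries come from the text.

```rust
use std::time::Duration;

/// Hypothetical constant: minimum time since the last failed attempt (15 minutes, per the text).
const PREPARE_RETRY_COOLDOWN: Duration = Duration::from_secs(15 * 60);
/// Hypothetical constant: maximum number of preparation retries (5, per the text).
const MAX_PREPARE_RETRIES: u32 = 5;

/// Hypothetical classification of a preparation failure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrepareFailure {
    /// Possibly transient, e.g. a panic or a timeout.
    Transient,
    /// Deterministic, e.g. the code itself is invalid. Retrying would not help.
    Deterministic,
}

/// Decides whether a newly arrived request should trigger another preparation attempt.
///
/// `since_last_failure` is the time elapsed since the previous failed attempt and
/// `retries_so_far` is the number of retries already performed.
fn should_retry_preparation(
    failure: PrepareFailure,
    since_last_failure: Duration,
    retries_so_far: u32,
) -> bool {
    failure == PrepareFailure::Transient
        && since_last_failure >= PREPARE_RETRY_COOLDOWN
        && retries_so_far < MAX_PREPARE_RETRIES
}
```

Note that in this sketch nothing is retried on a timer: the check only runs when another request for the same PVF arrives, which matches the "if another request comes in" wording.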

#### Preparation timeouts

We use timeouts for both preparation and execution jobs to limit the amount of
time they can take. As the time for a job can vary depending on the machine and
load on the machine, this can potentially lead to disputes where some validators
successfully execute a PVF and others don't.

One mitigation we have in place is a more lenient timeout for preparation during
execution than during pre-checking. The rationale is that the PVF has already
passed pre-checking, so we know it should be valid, and we allow it to take
longer than expected, as this is likely due to an issue with the machine and not
the PVF.
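
As a rough sketch of this mitigation, the timeout can be chosen from the reason preparation is being run. The type `PrepareKind`, the function `preparation_timeout` and both durations are assumptions made for illustration; the text does not state the actual values or names.

```rust
use std::time::Duration;

/// Hypothetical: why preparation is being run.
#[derive(Debug, Clone, Copy)]
enum PrepareKind {
    /// Preparation as part of PVF pre-checking.
    Precheck,
    /// Preparation as part of an execution request, after pre-checking already passed.
    ForExecution,
}

/// Illustrative value only; the real timeout is not given in the text.
const PRECHECK_TIMEOUT: Duration = Duration::from_secs(60);
/// Illustrative value only; it just needs to be more lenient than the pre-check timeout.
const LENIENT_TIMEOUT: Duration = Duration::from_secs(360);

/// Picks the preparation timeout: strict for pre-checking, lenient before execution.
fn preparation_timeout(kind: PrepareKind) -> Duration {
    match kind {
        PrepareKind::Precheck => PRECHECK_TIMEOUT,
        PrepareKind::ForExecution => LENIENT_TIMEOUT,
    }
}
```

The only property the design relies on is that the execution-time timeout is strictly longer than the pre-checking one, so a PVF that passed pre-checking is unlikely to time out later on a slower or busier machine.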

[CVM]: ../../types/overseer-protocol.md#validationrequesttype
6 changes: 3 additions & 3 deletions roadmap/implementers-guide/src/node/utility/pvf-prechecker.md
@@ -12,11 +12,11 @@ This subsytem does not produce any output messages either. The subsystem will, h

If the node is running in a collator mode, this subsystem will be disabled. The PVF pre-checker subsystem keeps track of the PVFs that are relevant for the subsystem.

To be relevant for the subsystem, a PVF must be returned by `pvfs_require_precheck` [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant.
To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant.

When a PVF just becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking for the pre-check.

Upon receiving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. If a judgement is received for a PVF that is no longer in view, it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements.
Upon receiving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. If a judgement is received for a PVF that is no longer in view, it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements.

Since a vote is only valid during [one session][overview], the subsystem will have to re-sign and submit the statements for the new session. A new session is assumed to have started if at least one of the leaves has a greater session index than was previously observed in any of the leaves.
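
The new-session check in the paragraph above can be sketched as follows. This is a minimal illustration, not the subsystem's code: the `SessionIndex` alias and the function `detect_new_session` are hypothetical, and treating the very first observation as a new session is an assumption.

```rust
/// Hypothetical alias for a session index.
type SessionIndex = u32;

/// Returns the new session index if any active leaf reports a session greater
/// than the highest one observed so far, and `None` otherwise.
///
/// Assumption: when nothing has been observed yet (`latest_seen` is `None`),
/// the highest session among the leaves counts as new.
fn detect_new_session(
    latest_seen: Option<SessionIndex>,
    leaf_sessions: &[SessionIndex],
) -> Option<SessionIndex> {
    // No active leaves means there is nothing to compare against.
    let max_leaf = leaf_sessions.iter().copied().max()?;
    match latest_seen {
        Some(seen) if max_leaf <= seen => None,
        _ => Some(max_leaf),
    }
}
```

When the function returns `Some(index)`, the caller would re-sign and resubmit its statements for that session and remember `index` as the latest one seen.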

@@ -28,4 +28,4 @@ If the node is not in the active validator set, it will still perform all the ch
[Runtime API]: runtime-api.md
[PVF pre-checking runtime API]: ../../runtime-api/pvf-prechecking.md
[Candidate Validation]: candidate-validation.md
[`PvfCheckStatement`]: ../../types/pvf-prechecking.md
[PvfCheckStatement]: ../../types/pvf-prechecking.md#pvfcheckstatement
16 changes: 16 additions & 0 deletions roadmap/implementers-guide/src/types/candidate.md
@@ -92,6 +92,22 @@ struct CandidateDescriptor {
}
```

## `ValidationParams`

```rust
/// Validation parameters for evaluating the parachain validity function.
pub struct ValidationParams {
/// Previous head-data.
pub parent_head: HeadData,
/// The collation body.
pub block_data: BlockData,
/// The current relay-chain block number.
pub relay_parent_number: RelayChainBlockNumber,
/// The relay-chain block's storage root.
pub relay_parent_storage_root: Hash,
}
```

## `PersistedValidationData`

The validation data provides information about how to create the inputs for validation of a candidate. This information is derived from the chain state and will vary from para to para, although some of the fields may be the same for every para.
30 changes: 0 additions & 30 deletions roadmap/implementers-guide/src/types/chain.md

This file was deleted.

4 changes: 1 addition & 3 deletions roadmap/implementers-guide/src/types/overseer-protocol.md
@@ -681,9 +681,7 @@ enum ProvisionerMessage {

The Runtime API subsystem is responsible for providing an interface to the state of the chain's runtime.

This is fueled by an auxiliary type encapsulating all request types defined in the Runtime API section of the guide.

> To do: link to the Runtime API section. Not possible currently because of https://github.com/Michael-F-Bryan/mdbook-linkcheck/issues/25. Once v0.7.1 is released it will work.
This is fueled by an auxiliary type encapsulating all request types defined in the [Runtime API section](../runtime-api) of the guide.

```rust
enum RuntimeApiRequest {
2 changes: 2 additions & 0 deletions roadmap/implementers-guide/src/types/pvf-prechecking.md
@@ -1,5 +1,7 @@
# PVF Pre-checking types

## `PvfCheckStatement`

> ⚠️ This type was added in v2.
One of the main units of information on which PVF pre-checking voting is built is the `PvfCheckStatement`.
1 change: 1 addition & 0 deletions scripts/ci/gitlab/lingua.dic
@@ -209,6 +209,7 @@ preconfigured
preimage/MS
preopen
prepend/G
prevalidating
prevalidation
preverify/G
programmatically
6 changes: 2 additions & 4 deletions scripts/ci/gitlab/pipeline/build.yml
@@ -171,14 +171,12 @@ build-implementers-guide:
# git depth is set on purpose: https://github.com/paritytech/polkadot/issues/6284
variables:
GIT_DEPTH: 0
CI_IMAGE: paritytech/mdbook-utils:e14aae4a-20221123
script:
- apt-get -y update; apt-get install -y graphviz
- cargo install mdbook mdbook-mermaid mdbook-linkcheck mdbook-graphviz mdbook-last-changed
- mdbook build ./roadmap/implementers-guide
- mkdir -p artifacts
- mv roadmap/implementers-guide/book artifacts/
# FIXME: remove me after CI image gets nonroot
- chown -R nonroot:nonroot artifacts/
- ls -la artifacts/

build-short-benchmark:
stage: build