Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consensus Observer] Add block payload verification. #13930

Closed
wants to merge 6 commits into from

Conversation

JoshLind
Copy link
Contributor

@JoshLind JoshLind commented Jul 8, 2024

Note:

Description

This PR makes several improvements to consensus observer. Specifically, it offers the following commits:

  1. Add a reset() method to the execution client trait. This method is already provided internally by sync_to and we want to expose it so that we can call it directly whenever consensus observer needs to reset execution state (e.g., on subscription changes).
  2. Add methods to the missing and pending block stores to clear all data. This will also be used to reset internal state (e.g., on subscription changes).
  3. Update consensus observer to perform message verification for block payloads. Specifically, we: (i) verify the payload batch digests (i.e., to make sure all transactions are valid and in the correct order); (ii) verify the payload signatures over the batches (if the payload is for the current epoch); (iii) if the payload signatures can't be verified in the current epoch, we store them and verify them once we transition epochs; (iv) when the payloads are retrieved (for an ordered block), we make sure the batches and transactions match the expected values in the block (this ties the payload to a verified block); and (v) add a configurable maximum number of pending payloads (to avoid OOM attacks).

Testing Plan

New and existing test infrastructure.

Copy link

trunk-io bot commented Jul 8, 2024

⏱️ 12h 46m total CI duration on this PR
Job Cumulative Duration Recent Runs
test-fuzzers 6h 45m 🟩🟩🟩🟩🟩 (+7 more)
rust-images / rust-all 1h 54m 🟩🟩 (+6 more)
forge-e2e-test / forge 1h 39m 🟩🟩🟩🟥🟥 (+2 more)
test-target-determinator 49m 🟩🟩🟩 (+6 more)
general-lints 20m 🟩🟩🟩🟩🟩 (+6 more)
check-dynamic-deps 12m 🟩🟩🟩🟩🟩 (+7 more)
rust-cargo-deny 11m 🟩🟩🟩🟩🟩 (+2 more)
rust-move-tests 7m 🟩
rust-move-tests 6m 🟩
rust-move-tests 6m 🟩
rust-move-tests 5m
semgrep/ci 5m 🟩🟩🟩🟩🟩 (+7 more)
rust-move-tests 3m 🟩
rust-move-tests 3m 🟩
rust-move-tests 3m 🟩
rust-move-tests 3m 🟥
rust-move-tests 3m 🟥
rust-move-tests 2m
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+6 more)
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+6 more)
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+7 more)
permission-check 45s 🟩🟩🟩🟩🟩 (+7 more)
permission-check 40s 🟩🟩🟩🟩🟩 (+6 more)
permission-check 40s 🟩🟩🟩🟩🟩 (+7 more)
permission-check 34s 🟩🟩🟩🟩🟩 (+7 more)
permission-check 33s 🟩🟩🟩🟩🟩 (+7 more)
determine-docker-build-metadata 26s 🟩🟩🟩🟩🟩 (+6 more)
rust-move-tests 22s
rust-move-tests 1s

settingsfeedbackdocs ⋅ learn more about trunk.io

@JoshLind JoshLind added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Jul 8, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@JoshLind JoshLind marked this pull request as ready for review July 8, 2024 21:39
@JoshLind JoshLind requested review from brianolson and bchocho July 8, 2024 21:39

This comment has been minimized.

This comment has been minimized.

@JoshLind JoshLind force-pushed the co_peer_ref_4 branch 2 times, most recently from 9374b6c to 56ccca1 Compare July 11, 2024 23:35
@JoshLind JoshLind force-pushed the co_peer_ref_6 branch 2 times, most recently from 8e833c4 to 450357d Compare July 12, 2024 19:56

This comment has been minimized.

This comment has been minimized.

@JoshLind JoshLind force-pushed the co_peer_ref_4 branch 2 times, most recently from 3086374 to 71ecc09 Compare July 15, 2024 21:02

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

❌ Forge suite realistic_env_max_load failure on 56ccd175cfee52aaec0603b9c74f197c34dfc69e

two traffics test: inner traffic : committed: 9229.536690177732 txn/s, latency: 4339.606890911474 ms, (p50: 4200 ms, p90: 5100 ms, p99: 19900 ms), latency samples: 3509260
two traffics test : committed: 100.10194692846773 txn/s, latency: 3917.576404494382 ms, (p50: 1400 ms, p90: 16200 ms, p99: 25800 ms), latency samples: 1780
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.273, avg: 0.232", "QsPosToProposal: max: 2.520, avg: 1.977", "ConsensusProposalToOrdered: max: 0.326, avg: 0.295", "ConsensusOrderedToCommit: max: 0.420, avg: 0.406", "ConsensusProposalToCommit: max: 0.712, avg: 0.700"]
Test Failed: check for success

Caused by:
    Failed latency check, for ["P90 latency is 16.2s and exceeds limit of 4.5s"]

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_latency
             at ./testsuite/forge/src/success_criteria.rs:574:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_for_success::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:298:9
   3: aptos_forge::interface::network::NetworkContext::check_for_success::{{closure}}
             at ./testsuite/forge/src/interface/network.rs:112:10
   4: <dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:321:14
   5: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   6: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:76:47
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   8: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:617:37
   9: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  10: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
  11: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  12: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  13: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
  14: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
  15: tokio::runtime::handle::Handle::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:310:22
  16: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  17: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:309:9
  18: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:611:49
  19: aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:684:11
  20: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:611:30
  21: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
  22: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
  23: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  24: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  25: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  26: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  27: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  28: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  29: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  30: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  31: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  32: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  33: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  34: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  35: main
  36: __libc_start_main
  37: _start
Trailing Log Lines:
  32: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  33: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  34: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  35: main
  36: __libc_start_main
  37: _start


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-13930-1721166009-56ccd175cfee52aaec0603b9c74f197c3","timestamp":"2024-07-16T21:52:10.957259Z","message":"Deleting namespace forge-e2e-pr-13930: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-e2e-pr-13930-1721166009-56ccd175cfee52aaec0603b9c74f197c3","timestamp":"2024-07-16T21:52:10.957277Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-13930"}
Failed to run tests:
Tests Failed

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:636:13
   2: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  15: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  16: main
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge188-0       1/1     Running     0          11m
aptos-node-0-validator-0                1/1     Running     0          11m
aptos-node-1-fullnode-eforge188-0       1/1     Running     0          11m
aptos-node-1-validator-0                1/1     Running     0          11m
aptos-node-2-fullnode-eforge188-0       1/1     Running     0          11m
aptos-node-2-validator-0                1/1     Running     0          11m
aptos-node-3-fullnode-eforge188-0       1/1     Running     0          11m
aptos-node-3-validator-0                1/1     Running     0          11m
aptos-node-4-fullnode-eforge188-0       1/1     Running     0          11m
aptos-node-4-validator-0                1/1     Running     0          11m
aptos-node-5-validator-0                1/1     Running     0          11m
aptos-node-6-validator-0                1/1     Running     0          11m
genesis-aptos-genesis-eforge188-wggl5   0/1     Completed   0          11m

@JoshLind JoshLind marked this pull request as draft July 17, 2024 14:34
Base automatically changed from co_peer_ref_4 to main July 17, 2024 20:04
@JoshLind JoshLind closed this Jul 17, 2024
@zekun000
Copy link
Contributor

is this closed for a new pr?

@JoshLind JoshLind deleted the co_peer_ref_6 branch July 23, 2024 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-forge-e2e-perf Run the e2e perf forge only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants