[Executor] CapturedReads component, Metadata and exists support #10170

Merged
merged 2 commits into from
Sep 26, 2023

gelash
Contributor

@gelash gelash commented Sep 21, 2023

Create a component that deals exclusively with capturing reads from transaction execution and validating them.

  • Incorporate new functionality to distinguish between reading a full value, reading just metadata, and only checking for existence. For now, the callers don't take full advantage of it, but resource groups will (for metadata; a resource group will also have its own existence check for the resources in the group, functionality that is also implemented in CapturedReads).
  • Test the component extensively.

Besides being important for getting the performance improvements for resource groups (otherwise we would get R/W conflicts on metadata), this greatly simplifies validation and code extensibility, and gives us infrastructure to avoid many R/W conflicts in the future.

CapturedReads has a comprehensive unit test suite. Metadata and exists reading will be integrated into proptests in a follow-up PR.
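
To illustrate the distinction between the three kinds of read described above, here is a minimal sketch. It assumes a hypothetical `ReadKind` enum ordered by how much information a read reveals, and a hypothetical helper `can_serve`; the actual types and names in `captured_reads.rs` may differ.

```rust
/// Hypothetical read kinds, ordered from least to most information observed.
/// The derived `Ord` follows declaration order: Exists < Metadata < Value.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ReadKind {
    /// Only whether the resource exists was observed.
    Exists,
    /// The metadata was observed (which also reveals existence).
    Metadata,
    /// The full value was observed (which also reveals metadata and existence).
    Value,
}

/// A captured read of kind `captured` can serve a request of kind `requested`
/// only if it observed at least as much information.
pub fn can_serve(captured: ReadKind, requested: ReadKind) -> bool {
    captured >= requested
}
```

Under this ordering, a transaction that only checked existence would, in principle, not need to be invalidated by a write that changes the value but leaves existence unchanged, which is the kind of R/W conflict the description aims to avoid.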

Contributor

@igor-aptos igor-aptos left a comment

looks quite clean - it should be very easy to integrate with aggregators as well.

some questions inline:

aptos-move/block-executor/src/captured_reads.rs (outdated)
DataReadComparison::Insufficient => {
// New read is of a lower kind than existing read, but while
// reading it must have been derived from the existing read.
debug_assert!(
Contributor

Why is this a debug assert (and not invariant_violation)?

Contributor Author

@gelash gelash Sep 22, 2023

UPDATED: We aren't inside the Move-VM, so we can't directly throw INVARIANT_VIOLATION. Since we are in the read path, we could return a fake STORAGE_ERROR to the transaction and cause it to abort, but then the question is whether we need to replace that error with a more appropriate INVARIANT_VIOLATION before it reaches the user. @zekun000, @runtian-zhou - you would know best? Maybe a storage error with an appropriate message is sufficient?

If the caller first checks whether the result can be derived from an existing read (which we do, and which makes sense to do), then no inconsistency should be possible (a lower contained kind is possible, yes). However, I now think asserting may not be a good idea: if the code isn't used the intended way, now or in the future (even because of a bug), whether the assert actually fires would be highly unpredictable and dependent on luck.

One other option is to log an error and fall back to sequential execution. This isn't hard to do and would be useful for other cases we had in mind.
@igor-aptos @zekun000 what do you guys think?

Contributor

In general, I find it reasonable to assert invariants that are localized and guaranteed to be true (i.e. where, looking at the local code, we can be convinced the assert will never be triggered).

But here it is not really local, as it goes all the way to the Unsync and Sync implementations, etc.

For invariants like this, which should hold but are hard to guarantee, I would throw a special CODE_INVARIANT_BROKEN error that can then trigger an alert as well as a fallback to sequential execution. I am trying to see how to do that right.

But here, even more importantly: is this even an invariant? Could it be perfectly possible that a read of one kind returns to the caller, then another, earlier transaction completes and writes its output, and then a read of a different kind is issued that doesn't match up here? If so, bailing is more appropriate (does bailing here require transaction re-execution?).

One stupid question: if a transaction issues two reads, is there any caching to make sure the transaction sees the same thing, or will we end up in the Inconsistent case above?

Contributor Author

@gelash gelash Sep 22, 2023

The transaction session has its own cache, which is fully local to it, but having multiple sessions has caused weird invariant behaviors in the past (mainly speculative violations that re-execution was fixing).

The intended use (and it should be the actual use after this change) is that every read first comes to CapturedReads and is served from there if possible (i.e. a captured read exists with >= kind); if not, we do the full read logic and also capture the result. A read served from CapturedReads must not be re-captured, so we should never capture with lower kinds.
It seems, then, that there is one legitimate case where a capture can be inconsistent due to normal speculation: as you mentioned, when a strictly lower kind has been captured and the higher-kind read is not consistent with it.

I like your methodology a lot; let's see how to apply it here. There seem to be two distinct behaviors:

  • Given that an inconsistency with a strictly higher kind of read can be speculative, in that case we can mark a speculative failure (which guarantees re-execution) and forward a storage error (which will be cleared after re-execution).
  • With an inconsistency at the same kind of read, or when capturing a read of a lower kind, let me alert/log an error and fall back to sequential execution. Sounds good?
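
The serve-then-capture flow and the two failure behaviors proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DataRead`, `CaptureOutcome`, `downcast`, `get`, and `capture` are hypothetical names, keys are plain strings, metadata is a bare `u64`, and treating an identical same-kind re-capture as a no-op is an assumption of this sketch rather than something stated in the thread.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ReadKind {
    Exists,
    Metadata,
    Value,
}

/// Simplified stand-in for a captured read. `None` means the key did not exist.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DataRead {
    Exists(bool),
    Metadata(Option<u64>),
    Value(Option<(u64, Vec<u8>)>),
}

impl DataRead {
    pub fn kind(&self) -> ReadKind {
        match self {
            DataRead::Exists(_) => ReadKind::Exists,
            DataRead::Metadata(_) => ReadKind::Metadata,
            DataRead::Value(_) => ReadKind::Value,
        }
    }

    /// Project this read down to `kind`; `None` if this read is of a lower kind.
    pub fn downcast(&self, kind: ReadKind) -> Option<DataRead> {
        if self.kind() < kind {
            return None;
        }
        Some(match (self, kind) {
            (DataRead::Value(v), ReadKind::Metadata) => {
                DataRead::Metadata(v.as_ref().map(|(m, _)| *m))
            }
            (DataRead::Value(v), ReadKind::Exists) => DataRead::Exists(v.is_some()),
            (DataRead::Metadata(m), ReadKind::Exists) => DataRead::Exists(m.is_some()),
            // Same kind: nothing to project.
            _ => self.clone(),
        })
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum CaptureOutcome {
    Captured,
    /// Possibly caused by speculation; the transaction gets re-executed.
    SpeculativeFailure,
    /// Should not happen under the intended use; alert and run sequentially.
    FallbackToSequential,
}

#[derive(Default)]
pub struct CapturedReads {
    reads: HashMap<String, DataRead>,
}

impl CapturedReads {
    /// Serve a read from the captured set if one of sufficient kind exists.
    pub fn get(&self, key: &str, kind: ReadKind) -> Option<DataRead> {
        self.reads.get(key).and_then(|r| r.downcast(kind))
    }

    /// Capture a read that could not be served by `get`.
    pub fn capture(&mut self, key: &str, new: DataRead) -> CaptureOutcome {
        let existing = match self.reads.get(key).cloned() {
            Some(r) => r,
            None => {
                self.reads.insert(key.to_string(), new);
                return CaptureOutcome::Captured;
            }
        };
        if new.kind() > existing.kind() {
            // Strictly higher kind: it must agree with what was captured before.
            if new.downcast(existing.kind()).as_ref() == Some(&existing) {
                self.reads.insert(key.to_string(), new);
                CaptureOutcome::Captured
            } else {
                CaptureOutcome::SpeculativeFailure
            }
        } else if new == existing {
            // Assumption of this sketch: an identical re-capture is harmless.
            CaptureOutcome::Captured
        } else {
            // Same kind but different, or a lower kind than already captured.
            CaptureOutcome::FallbackToSequential
        }
    }
}
```

A caller would first try `get(key, kind)` and only perform the full read, followed by `capture`, when `get` returns `None`.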

@gelash gelash force-pushed the gelash/metaandexists branch 2 times, most recently from 32dce24 to f877892 Compare September 22, 2023 13:04
@@ -22,31 +22,31 @@ pub trait TResourceView {
/// - Err(...) otherwise (e.g. storage error).
fn get_resource_state_value(
&self,
-    key: &Self::Key,
+    state_key: &Self::Key,
Contributor

I called it key per @zekun000's suggestion previously😂 Fine with any personally

Contributor Author

and at some places Zekun said key is too ambiguous, pretty sure we don't understand the nuance :D

Contributor

lol I said Self::Key is ambiguous with Self::Tag or something

pub(crate) fn with_len_and_metadata(len: usize, metadata: StateValueMetadataKind) -> Self {
Self {
bytes: (len > 0).then_some(vec![100_u8; len].into()),
metadata,
Contributor

Curious, can we even have metadata when bytes are empty?

Contributor

I think not, looking at the code.

Contributor Author

No, and this is an ugly test type; eventually we should consolidate all the test types across the different tests and crates (mvhashmap, etc.).

Contributor Author

@gelash gelash Oct 1, 2023

So for real WriteOps, deletions may contain metadata, but a read following a deletion will not observe it.
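
A small sketch of the convention discussed in this thread, loosely modeled on the `with_len_and_metadata` test helper quoted above. The `TestValue` struct, the use of `Option<u64>` for metadata, and the `observed_metadata` method are assumptions made for illustration; they are not the actual test types.

```rust
/// Hypothetical, simplified test value: `bytes` is `None` when `len == 0`,
/// which this sketch treats as a deletion.
#[derive(Debug, PartialEq)]
pub struct TestValue {
    bytes: Option<Vec<u8>>,
    metadata: Option<u64>,
}

impl TestValue {
    pub fn with_len_and_metadata(len: usize, metadata: Option<u64>) -> Self {
        Self {
            // `bool::then_some` yields `Some(..)` only when the condition holds.
            bytes: (len > 0).then_some(vec![100_u8; len]),
            metadata,
        }
    }

    /// What a later read would observe: a deleted (byte-less) value exposes
    /// no metadata, even if the write op itself carried some.
    pub fn observed_metadata(&self) -> Option<u64> {
        self.bytes.as_ref().and(self.metadata)
    }
}
```

With this shape, a value built with `len == 0` and some metadata still reports no metadata to a subsequent read, matching the remark that a read following a deletion will not observe it.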

@gelash gelash added the CICD:run-e2e-tests label Sep 26, 2023
@gelash gelash enabled auto-merge (squash) September 26, 2023 16:54

@github-actions
Contributor

✅ Forge suite realistic_env_max_load success on 60dd90f369c4d5d2080b987f60920b14280fafc7

two traffics test: inner traffic : committed: 6783 txn/s, latency: 5791 ms, (p50: 5700 ms, p90: 7100 ms, p99: 13800 ms), latency samples: 2923880
two traffics test : committed: 100 txn/s, latency: 2405 ms, (p50: 2300 ms, p90: 2800 ms, p99: 4500 ms), latency samples: 1820
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.226, avg: 0.213", "QsPosToProposal: max: 0.315, avg: 0.167", "ConsensusProposalToOrdered: max: 0.645, avg: 0.588", "ConsensusOrderedToCommit: max: 0.461, avg: 0.441", "ConsensusProposalToCommit: max: 1.105, avg: 1.030"]
Max round gap was 1 [limit 4] at version 1544809. Max no progress secs was 3.770627 [limit 10] at version 1544809.
Test Ok

@github-actions
Contributor

✅ Forge suite compat success on aptos-node-v1.6.2 ==> 60dd90f369c4d5d2080b987f60920b14280fafc7

Compatibility test results for aptos-node-v1.6.2 ==> 60dd90f369c4d5d2080b987f60920b14280fafc7 (PR)
1. Check liveness of validators at old version: aptos-node-v1.6.2
compatibility::simple-validator-upgrade::liveness-check : committed: 4600 txn/s, latency: 7095 ms, (p50: 6600 ms, p90: 10700 ms, p99: 14000 ms), latency samples: 179420
2. Upgrading first Validator to new version: 60dd90f369c4d5d2080b987f60920b14280fafc7
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1815 txn/s, latency: 16177 ms, (p50: 18900 ms, p90: 22200 ms, p99: 22400 ms), latency samples: 94380
3. Upgrading rest of first batch to new version: 60dd90f369c4d5d2080b987f60920b14280fafc7
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1564 txn/s, latency: 17741 ms, (p50: 18700 ms, p90: 22800 ms, p99: 31900 ms), latency samples: 81340
4. upgrading second batch to new version: 60dd90f369c4d5d2080b987f60920b14280fafc7
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3629 txn/s, latency: 8636 ms, (p50: 9900 ms, p90: 11700 ms, p99: 12200 ms), latency samples: 145160
5. check swarm health
Compatibility test for aptos-node-v1.6.2 ==> 60dd90f369c4d5d2080b987f60920b14280fafc7 passed
Test Ok

@gelash gelash merged commit 5116848 into main Sep 26, 2023
41 of 44 checks passed
@gelash gelash deleted the gelash/metaandexists branch September 26, 2023 19:06
@github-actions
Contributor

❌ Forge suite framework_upgrade failure on aptos-node-v1.5.1 ==> 60dd90f369c4d5d2080b987f60920b14280fafc7

Compatibility test results for aptos-node-v1.5.1 ==> 60dd90f369c4d5d2080b987f60920b14280fafc7 (PR)
Upgrade the nodes to version: 60dd90f369c4d5d2080b987f60920b14280fafc7
Test Failed: Unknown error error sending request for url (http://aptos-node-3-validator.forge-framework-upgrade-pr-10170.svc:8080/v1/accounts/0000000000000000000000000000000000000000000000000000000000000001/resource/0x1::features::Features): error trying to connect: tcp connect error: Connection refused (os error 111)

Stack backtrace:
   0: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/result.rs:1961:27
      aptos_release_builder::components::ReleaseEntry::validate_upgrade
             at ./aptos-move/aptos-release-builder/src/components/mod.rs:301:41
   1: aptos_release_builder::components::ReleaseConfig::validate_upgrade
             at ./aptos-move/aptos-release-builder/src/components/mod.rs:517:13
   2: aptos_release_builder::validate::execute_release::{{closure}}
             at ./aptos-move/aptos-release-builder/src/validate.rs:436:13
      aptos_release_builder::validate::validate_config_and_generate_release::{{closure}}
             at ./aptos-move/aptos-release-builder/src/validate.rs:460:6
      aptos_release_builder::validate::validate_config::{{closure}}
             at ./aptos-move/aptos-release-builder/src/validate.rs:446:80
      tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/park.rs:283:63
      tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:107:5
      tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:73:5
      tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/park.rs:283:31
   3: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context/blocking.rs:66:9
      tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
      tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context/runtime.rs:65:16
   4: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
      tokio::runtime::runtime::Runtime::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/runtime.rs:313:50
   5: <aptos_testcases::framework_upgrade::FrameworkUpgrade as aptos_forge::interface::network::NetworkTest>::run
             at ./testsuite/testcases/src/framework_upgrade.rs:97:9
   6: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:598:42
      aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:666:11
      aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:598:30
   7: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:410:11
      forge::main
             at ./testsuite/forge-cli/src/main.rs:336:21
   8: core::ops::function::FnOnce::call_once
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135:18
   9: std::rt::lang_start::{{closure}}
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:166:18
  10: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:284:13
      std::panicking::try::do_call
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20
  11: main
  12: __libc_start_main
  13: _start
Trailing Log Lines:
      std::panicking::try
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20
  11: main
  12: __libc_start_main
  13: _start


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-10170-1695754089-aptos-node-v1-5-1","timestamp":"2023-09-26T19:15:08.151195Z","message":"Deleting namespace forge-framework-upgrade-pr-10170: Some(NamespaceStatus { conditions: Some([NamespaceCondition { last_transition_time: Some(Time(2023-09-26T19:15:07Z)), message: Some(\"All resources successfully discovered\"), reason: Some(\"ResourcesDiscovered\"), status: \"False\", type_: \"NamespaceDeletionDiscoveryFailure\" }, NamespaceCondition { last_transition_time: Some(Time(2023-09-26T19:15:07Z)), message: Some(\"All legacy kube types successfully parsed\"), reason: Some(\"ParsedGroupVersions\"), status: \"False\", type_: \"NamespaceDeletionGroupVersionParsingFailure\" }, NamespaceCondition { last_transition_time: Some(Time(2023-09-26T19:15:07Z)), message: Some(\"All content successfully deleted, may be waiting on finalization\"), reason: Some(\"ContentDeleted\"), status: \"False\", type_: \"NamespaceDeletionContentFailure\" }, NamespaceCondition { last_transition_time: Some(Time(2023-09-26T19:15:07Z)), message: Some(\"Some resources are remaining: persistentvolumeclaims. has 4 resource instances\"), reason: Some(\"SomeResourcesRemain\"), status: \"True\", type_: \"NamespaceContentRemaining\" }, NamespaceCondition { last_transition_time: Some(Time(2023-09-26T19:15:07Z)), message: Some(\"Some content in the namespace has finalizers remaining: kubernetes.io/pvc-protection in 4 resource instances\"), reason: Some(\"SomeFinalizersRemain\"), status: \"True\", type_: \"NamespaceFinalizersRemaining\" }]), phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-10170-1695754089-aptos-node-v1-5-1","timestamp":"2023-09-26T19:15:08.151246Z","message":"aptos-node resources for Forge removed in namespace: forge-framework-upgrade-pr-10170"}

failures:
    framework_upgrade::framework-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed

Stack backtrace:
   0: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:618:13
   1: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:410:11
      forge::main
             at ./testsuite/forge-cli/src/main.rs:336:21
   2: core::ops::function::FnOnce::call_once
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135:18
   3: std::rt::lang_start::{{closure}}
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:166:18
   4: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:284:13
      std::panicking::try::do_call
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal
             at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20
   5: main
   6: __libc_start_main
   7: _start
Debugging output:

Poytr1 pushed a commit to sentioxyz/aptos-core that referenced this pull request Oct 4, 2023
Poytr1 added a commit to sentioxyz/aptos-core that referenced this pull request Oct 4, 2023
* Update system-integrators-guide.md (aptos-labs#10087)

The old url file doesn't exist anymore. must be changed to the new one.

* [dashboards] sync grafana dashboards

* [typo*] (aptos-labs#9912)

* [typo*]Update delegation-pool-operations.md

* [typo*]Update run-a-fullnode-on-gcp.md

* [typo*]Update run-a-fullnode-on-gcp.md

* [typo*]Update glossary.md

* [Spec] Ensures for stake.move (aptos-labs#9700)

* 1

* cannot finish hp

* remove some wrong statements

* hp1-3

* rewrite hp2

* rewrite hp2 again

* hp1-3

* init

* fix

* fix ensures

* fix ensure in the update_sat

* fix comment

* fix md

* fix new comment

* update comment

* fix indent

* fix timeout

* fix timeout

* fix timeout

---------

Co-authored-by: chan-bing <zzywullr@gmail.com>

* Update fullnode-source-code-or-docker.md (aptos-labs#9914)

* [typo*]Update index.md (aptos-labs#9913)

* [typo*]Update index.md

* Update index.md

---------

Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>

* [link*Update fullnode-source-code-or-docker.md (aptos-labs#9978)

* Update aptos-bitvec/src/lib.rs (aptos-labs#10045)

There is a mistake in the comment

* Update delegation-pool-operations.md (aptos-labs#10171)

remove deprecated function

* trivial: fix rocksdb property reporter

* forge: ability to override resource ask in Rust

* [CLI] Add txn stream to local testnet (aptos-labs#10101)

* [dashboards] sync grafana dashboards

* [TS SDK V2] Add `General` api class (aptos-labs#10185)

* add general api queries

* add general api queries

* [move] make vm clonable

* Pass down resolver to MoveVmExt::new()

In preparation for caching a warm VM

* Move VM warm up into adapter

* WarmVmCache

* Fix circular dependency

* [object] refactor burn/unburn (aptos-labs#9785)

* Fix Issue 9717 by breaking critical edges in CFG in move-compiler (v1) (aptos-labs#10064)

Fix issue aptos-labs#9717 by breaking critical edges in the CFG in move-compiler (v1).

While issue 9717 mentions "inline", this is a red herring as inlining a vector loop just makes it more likely to have a reference parameter on the stack, which leads to a failure to correctly drop dead references in move-compiler/src/cfgir/liveness/mod.rs. I've left the initial test case bug_9717.move and added bug_9717_looponly.move that just has a loop illustrating the problem, along with several variants on the loop (one of them, break2, also exhibiting the problem).

Anyway, the failure to properly place the reference drop happens when the constructed CFG has critical edges (edges from a node with multiple outgoing edges to a node with multiple ingoing edges). Such a situation is well-known in compiler literature to make it difficult to place instructions precisely based on certain analyses.

Fortunately, this seems to be easily fixed by adding a small pass to add a node on each such critical edge, so that the drop can be placed properly even in the case of a break.
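
As a rough illustration of the edge-splitting idea described in this commit message (not the move-compiler implementation), the sketch below represents a CFG as successor lists and inserts a fresh node on every edge that goes from a node with multiple successors to a node with multiple predecessors. The function name `split_critical_edges` and the index-based representation are assumptions made for this example.

```rust
/// Split every critical edge of a CFG given as successor lists.
/// Returns the new successor lists; inserted nodes get fresh indices at the end.
pub fn split_critical_edges(succs: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = succs.len();

    // Count the predecessors of each node.
    let mut pred_count = vec![0usize; n];
    for outs in succs {
        for &target in outs {
            pred_count[target] += 1;
        }
    }

    let mut result: Vec<Vec<usize>> = succs.to_vec();
    for from in 0..n {
        // Only nodes with multiple outgoing edges can start a critical edge.
        if succs[from].len() < 2 {
            continue;
        }
        for i in 0..succs[from].len() {
            let to = succs[from][i];
            // The edge is critical if the target also has multiple incoming edges.
            if pred_count[to] >= 2 {
                let fresh = result.len();
                result.push(vec![to]); // the new node jumps straight to `to`
                result[from][i] = fresh; // redirect the original edge
            }
        }
    }
    result
}
```

For the graph `0 -> {1, 2}`, `1 -> {2}`, only the edge `0 -> 2` is critical, so a new node `3` is placed on it, giving an analysis a place to put instructions (such as a reference drop) that belong to that edge alone.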

Unfortunately, later passes can't deal with the resulting deep expression trees and chains of direct jumps that result in some cases, so I had to fix hlir/translate.rs to reduce stack depth for a given expression, and cfgir/optimize/inline_blocks.rs to properly remove the unneeded jumps.

New tests also reveal misplaced warnings/errors about unused variables in the presence of inlining, so that was also fixed.

* [Doc] Update build e2e dapp docs (aptos-labs#10165)

* Update build e2e dapp docs

* update js -> jsx

* update

* Update developer-docs-site/docs/tutorials/build-e2e-dapp/4-fetch-data-from-chain.md

* f

* f

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>

* implement memoize high order function (aptos-labs#10110)

* [Executor] Metadata and exists support (in Block-STM/executor) (aptos-labs#10170)

* [move-vm] Refactor Loader (aptos-labs#9320)

* Rename variant

* [move-vm] Store ability in runtime type

* fixup! [move-vm] Store ability in runtime type

* fixup! fixup! [move-vm] Store ability in runtime type

* fixup! fixup! fixup! [move-vm] Store ability in runtime type

* [move_vm] Split loader into multiple files

* [move_vm] Cache struct type for modules

* [move_vm] Move depth cache to type cache

* [move-vm] Use name to replace type index

* [move-vm] Remove function index

* [move-vm] Inline functions

* [move-vm] Remove cached index from definition

* [move-vm] Check cross module linking before loading

* [move-vm] Remove global struct cache

* [move-vm] Inline struct name to avoid excessive memory allocation

* [move-vm] Split function in loader

* [move-vm] Split out module cache

* [e2e-test] Add randomized test for loader

* Fix Lint

* [move-vm] Cache signature resolution

* [move-vm] Removed unneeded signature token

* [move-vm] Arc-ed type argument

* Fix Zekun's comments

* fixup! [e2e-test] Add randomized test for loader

* add comments for the test strategy

* Rename struct name

* More renaming

* Addressing more comments

* [consensus] Dedicated channel for proposal buffering and batch process

* revert 9ed1da8 (aptos-labs#10256)

* update move run to include more params (aptos-labs#9932)

* [CLI] Update CLI binary doc site to include openssl3 instruction (aptos-labs#9964)

* Update CLI binary doc site to include openssl3 instruction

* Update developer-docs-site/docs/tools/aptos-cli/install-cli/download-cli-binaries.md

Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>

* Update developer-docs-site/docs/tools/aptos-cli/install-cli/download-cli-binaries.md

Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>

* [dag] split notifier into Order and Proof Notifier

[dag] additional ledger info verification checks

[dag] separate out highest committed round provider

[dag] introduce a ledger info provider trait

* [CLI] Fix faucet component of local testnet without --force-restart (aptos-labs#10214)

* Update Docker images (aptos-labs#10208)

Co-authored-by: gedigi <gedigi@users.noreply.github.com>

* [ts-sdk-v2] Add .npmrc to ensure pnpm lockfile is consistent (aptos-labs#10251)

* Update dependabot.yml to disable version update PRs (aptos-labs#10249)

* CLI version bump to 2.1.1 (aptos-labs#10272)

* Get Drand Move example to compile (aptos-labs#10196)

* change names of drand and veiledcoin

* drand compiles

* drand tests compile

* remove diff.txt

* add retrieve lottery winner function

* Update CLI dry run text (aptos-labs#10273)

* add partial transaction queries (aptos-labs#10275)

* [TSSDKv2][1/n] add Ed25519 classes (aptos-labs#10157)

* add PrivateKey, PublicKey, and Signature classes

* Add helper.ts and fixes based on comments

* add asymmetric_crypto file which includes all crypto base classes as abstract. And have all concrete crypto classes to extend from based abstract classes

* Update crypto classes name to include Ed25519 prefix

* [TS SDK V2] Add Indexer account queries (aptos-labs#10216)

* implement account indexer queries

* add indexer account api queries

* address feedback

* [TSSDKv2][3/n] Account and AuthenticationKey Classes (aptos-labs#10210)

* Add helper.ts and fixes based on comments

* add asymmetric_crypto file which includes all crypto base classes as abstract. And have all concrete crypto classes to extend from based abstract classes

* Update crypto classes name to include Ed25519 prefix

* fixes based on comments

* Add account and auth key classes

* update Account class to accept abstract PublicKey and PrivateKey

* update methods' comment

* [ts-sdk-v2] Add getTransactions to the sdk-v2 transaction API (aptos-labs#10288)

* Add getTransactions to the sdkv2 transaction API

* Update ecosystem/typescript/sdk_v2/src/internal/transaction.ts

Co-authored-by: Maayan <maayan@aptoslabs.com>

* update after merge

---------

Co-authored-by: Greg Nazario <greg@gnazar.io>
Co-authored-by: Maayan <maayan@aptoslabs.com>

* [aptos-stdlib] Cleanup error codes for divide by 0

* [marketplace-example] Fix royalties edge conditions

There were two bugs with royalties and listings.  One was that in
v1, royalties could have a denominator 0, or be greater than 100%
in legacy NFTs.  This means that it's possible that payouts don't
work correctly, or take more than expected.  Now, royalties are
bounded to 0-100%.

The second issue was that commission was taken after royalties,
but didn't consider that royalties could be 100%.  Now, royalties
are taken first, and commission is taken out of the remainder.
This does mean the marketplace may not have any commission if the
royalties are set to 100%.
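
The payout order described in this commit message can be sketched as below. This is an assumption-laden illustration, not the Move code of the marketplace example: the function name `split_payout`, passing the commission as an already computed amount, and treating a zero denominator as "no royalty" are all choices made for this example.

```rust
/// Returns `(royalty, commission, seller_proceeds)` for a sale at `price`.
pub fn split_payout(
    price: u64,
    royalty_numerator: u64,
    royalty_denominator: u64,
    commission: u64,
) -> (u64, u64, u64) {
    // Assumption: a zero denominator means no royalty. Otherwise cap at 100%.
    let royalty = if royalty_denominator == 0 {
        0
    } else {
        let raw = (price as u128 * royalty_numerator as u128) / royalty_denominator as u128;
        raw.min(price as u128) as u64
    };

    // Royalties are taken first; commission comes out of what is left.
    let remainder = price - royalty;
    let commission = commission.min(remainder);
    (royalty, commission, remainder - commission)
}
```

With a 100% royalty the remainder is zero, so the commission is zero as well, which matches the note that the marketplace may receive no commission in that case.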

* [Spec] Fix spec (aptos-labs#10215)

* fix spec

* staking_contract spec

* [Rust] Upgrade to Rust version 1.72.1

* [Forge][Chaos] remove jitter, make inter-region BW 300 Mbps (aptos-labs#10277)

### Description

The changes are based on observations while measuring network performance and reading more into the dataset used.
* Jitter: My understanding is that re-ordering of packets should be pretty rare in the real world, while the previous jitter configs would introduce re-ordering quite frequently. Unless we have a strong belief that jitter is present in our networks, we shouldn't mess with this.
* Inter-region bitrate: The numbers this was based on are iperf with a single TCP stream. The results correlate strongly with RTT, which suggests that RTT is the limiting factor, so there's no real reason to constrain BW itself. For now, 300 Mbps is as fast as our network stack will go for 100+ ms RTT.

* [Forge][PFN] remove epoch changes from some tests (aptos-labs#10278)

### Description

This is in an effort to reduce noise in the PFN tests. Epoch changes are an obvious source of noise, although it's unclear how much it contributes to noise difference between runs.

To still have coverage of epoch changes, the tests without chaos will still do 5 minute epoch changes.

### Test Plan

Run ad-hoc forge run, observe no epoch changes.

* [Forge][netbench] split into large and small messages for two region test (aptos-labs#10283)

### Description

We found that the throughput for large and small messages is very different with large latencies. We update the test to run with large messages and then split out a small messages test. The large messages are useful for sanity checking the network setup; the small messages are something that we can hopefully improve upon.

* [crypto] add secp256k1 support to aptos-crypto

Pretty straightforward, but I think our crypto APIs are really not
great. There are many traits that should really be compressed down into a
single trait for PublicKey, PrivateKey, and Signature. This would make
maintaining and adding new libraries so much easier.

* [types] Add support for secp256k1 authenticator

with this we can now send secp256k1 signed transactions to the
blockchain...

I'm going to do some code refactoring in authenticators and transactions
before resuming the end-to-end testing and the feature gating of this feature.

* [openapi] update openapi with secp256k1

* [types] remove AuthenticationKeyPreimage

This was adding extra code and adding complexity to what is already a
complex space. If we aren't going to use the preimage, we have no need
to write the code.

We need to be more diligent about removing this type of unnecessary
code from the codebase, because it really impairs our ability to move
fast.

* [types] remove prefix and rename derived_address on AuthenticationKey

* AuthenticationKey and Address are 1:1, so all this code is legacy
  based upon some weird goal of trying to compress the account address
  into an insecure size back in Libra. Even the authors of this code
  have since moved to 32-bytes in their own blockchain.
* prefix is never used and removed.
* derived_address -> account_address because there's no derivation it is
  literally 1:1

* [api] end to end test for secp256k1 ecdsa

* [features] add secp256k1 ecdsa

* [Aptos Data Poller] Improve peer polling logic.

* [Aptos Data Client] Update tests for new poller.

* [exp] make it work for any upstream repo (aptos-labs#10016)

* [indexer-grpc] k6 loadtest (aptos-labs#9493)

* [Aptos Data Client] Improve selection for optimistic fetch and
subscriptions.

* [Aptos Data Client] Add new tests for peer selection logic.

* Make InMemoryStateCalculatorV2 work with state sync chunks. (aptos-labs#10263)

* [release-builder] Increase lockup before executing proposals

Increase the lockup before executing proposals

Currently the release flow started consistently failing because we need to increase the lockup

This ensures that the lockup is sufficient before executing transactions

Test Plan: not sure how to test other than by running against testnet???

Please advise

* [Network] Replace RwLock with ArcSwap for trusted peers.

* [Network] Reduce lock contention for peer metadata using cache.

* replay-verify.yaml not reference workflows by @main

* replay-verify: not cancel other sub-jobs on first failure

* recalibrate single node benchmark for perf regression aptos-labs#10298

* [move-model] fix internal assertion violation in definition analysis (aptos-labs#10292)

Co-authored-by: Aalok Thakkar <aalok@Aaloks-MacBook-Pro.local>

* [ts-sdk-v2] Rename endpoint to path in client

* [ts-sdk-v2] Rename originMethod to name for API requests

* [ts-sdk-v2] Simplify fullnode get requests

* [ts-sdk-v2] Make Post requests simpler

* [ts-sdk-v2] Update format and lint for SDK

This fixes most of the SDK lints except for the pieces around
the static functions for deserialize.

* [ts-sdk-v2] Fix broken tests

* [ts-sdk-v2] Replace "Generic error"

* [ts-sdk-v2] Add derivation path invalid test

* [ts-sdk-v2] Cleanup Account and crypto for code reuse

1. Allows deriving public key from private key
2. Detaches accounts from Single Ed25519
3. Adds some consistency to input naming

* [ts-sdk-v2] Unify authentication key creation

* [ts-sdk-v2] Add Authentication Key scheme enum

* [ts-sdk-v2] Add ability to derive authkeys from non-key schemes

* [ts-sdk-v2] Add docs to AuthenticationKey

* [ts-sdk-v2] Cleanup documentation on Ed25519

* [ts-sdk-v2] Add documentation to asymmetric crypto

* [ts-sdk-v2] Add docs, cleanup multi-ed25519

* [ts-sdk-v2] Move paginate with cursor to client

* [ts-sdk-v2] Cleanup unused lint ignores

* [ts-sdk-v2] Authentication key testing and message improvements

* [ts-sdk-v2] Rename some client inputs and add documentation

* [dag] epoch manager integration; dag is here

* [dashboards] sync grafana dashboards

* fix WarmVmCache

Natives are stateful so must be covered by the WarmVmId.

1. add TimedFeaturesBuilder and convert TimedFeatures to an array of
   booleans.
2. add SafeNativeBuilder::id_bytes() to fix the bug
3. rebuild vm (and hence all the natives) only when the builder id_bytes() change.
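
The third step can be sketched as a cache keyed by the builder's id bytes. This is a simplified illustration, not the actual `WarmVmCache`: the `Vm` struct, its `generation` field, and the `get` method are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for a VM that is expensive to construct.
struct Vm {
    /// Which rebuild produced this VM (1 for the first build).
    generation: usize,
}

/// Hypothetical cache that keeps one warm VM together with the
/// id bytes of the configuration it was built from.
struct WarmVmCache {
    cached: Option<(Vec<u8>, Vm)>,
    builds: usize,
}

impl WarmVmCache {
    fn new() -> Self {
        WarmVmCache { cached: None, builds: 0 }
    }

    /// Return the warm VM, rebuilding it only when `id_bytes` differs
    /// from the bytes the cached VM was built with.
    fn get(&mut self, id_bytes: &[u8]) -> &Vm {
        let stale = match &self.cached {
            Some((id, _)) => id.as_slice() != id_bytes,
            None => true,
        };
        if stale {
            self.builds += 1;
            let vm = Vm { generation: self.builds };
            self.cached = Some((id_bytes.to_vec(), vm));
        }
        &self.cached.as_ref().expect("cache was just populated").1
    }
}
```

Because natives are stateful, anything that changes their configuration has to be reflected in the id bytes; otherwise a stale VM would be reused.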

* use s5cmd for downloading files in replay-verify

official awscli experiencing random crashes

* [e2e-tests] Fix compilation error (aptos-labs#10318)

* Drop frozen root after make checkpoint. (aptos-labs#10327)

* redistribute mainnet replay sub-job ranges

* [TS-SDK v2] Updating the `Deserializable` interface and making `Serializable` an abstract class (aptos-labs#10307)

* Removing export from Deserializable to facilitate using a static `deserialize` method, fixing error messages in unit tests for multi_ed25519, and changing Serializable to an abstract class that implements `bcsToBytes()`. Removed abstract deserialize from public/private key classes.

* Re-adding doc comments

* Adding `serialize` and `deserialize` and corresponding unit tests to the AccountAddress class

* Fixing multi_ed25519 error messages

* Updating doc comments for Deserializable to clarify what its purpose is.

* [move unit tests] Run extended checker as part of unit tests (aptos-labs#10309)

* [move unit tests] Run extended checker as part of unit tests

Closes aptos-labs#9251

This runs the extended checker as part of Aptos unit tests (either our own Rust integrated tests or from the CLI). It uses the same technique as we already used for native extensions specific to Aptos: a hook is defined where additional, move-model based validations can be run. This hook is then connected to the extended checker when running Aptos tests.

The implementation also optimizes the construction of the move model: if that one is already needed by abi generation (which is the default), it is not constructed a 2nd time for the extended checker -- both  for the existing build step and the new test step. This should avoid one full additional compilation (source -> bytecode -> model run).

* Extended checks until now excluded test code, leading to wrong usage of entry functions and attributes marked as test-only. Because fixing this is a breaking change, this commit adds the behavior to check test code via a new CLI option `--check-test-code`. This flag should eventually become default behavior.

Also fixes some reviewer comments.

* implement forge links to axiom (aptos-labs#10330)

* [gas-calibration][simple] ignore terms with 0-coefficients (aptos-labs#9742)

* [indexer][api] update the metrics for api gateway consumption. (aptos-labs#10322)

* [dag] add various counters

* [TS-SDK v2] Adding `serializeVector` and `deserializeVector` to SDK v2 (aptos-labs#10347)

* Adding `serializeVector` and `deserializeVector` to serializer.ts and deserializer.ts as well as a unit test for each

* Adding documentation for each function

* Removing redundant line in doc comment

* Removing redundant line in deserializer doc comment too
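
For context, BCS encodes a vector as a ULEB128 length prefix followed by each element in order. The sketch below shows that layout in Rust for a vector of `u64` values; it is not the TypeScript SDK code, and the function names are hypothetical.

```rust
/// Append `value` as a ULEB128-encoded integer.
fn write_uleb128(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        // Set the continuation bit when more bytes follow.
        out.push(byte | 0x80);
    }
}

/// Read a ULEB128-encoded u32 starting at `*pos`, advancing `*pos`.
fn read_uleb128(input: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u64 = 0;
    let mut shift = 0;
    loop {
        let byte = *input.get(*pos)?;
        *pos += 1;
        result |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift >= 35 {
            return None; // more than five bytes cannot fit in a u32
        }
    }
    if result > u32::MAX as u64 {
        return None;
    }
    Some(result as u32)
}

/// Length prefix, then each element as 8 little-endian bytes.
fn serialize_vector(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    write_uleb128(values.len() as u32, &mut out);
    for value in values {
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Inverse of `serialize_vector`; returns None on truncated input.
fn deserialize_vector(input: &[u8]) -> Option<Vec<u64>> {
    let mut pos = 0;
    let len = read_uleb128(input, &mut pos)? as usize;
    let mut values = Vec::new();
    for _ in 0..len {
        let chunk = input.get(pos..pos + 8)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        values.push(u64::from_le_bytes(buf));
        pos += 8;
    }
    Some(values)
}
```

The SDK versions operate on any serializable element type; fixing the element type to `u64` here only keeps the sketch short.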

* Allow using skip_index_and_usage on state sync code path. (aptos-labs#10303)

* [dag] add a few structured logging

* jin_fix_ed25519_derive_publickey (aptos-labs#10357)

* [ts-sdk-v2] Add MIME types, and convert input types from string to Enum (aptos-labs#10308)

* Remove expensive counters (aptos-labs#10188)

* [TS SDK V2] Port over transaction types (aptos-labs#10364)

* transaction types

* address comments

* [CLI] Restructure local testnet code (aptos-labs#10252)

* [release-builder] Fix increase lockup

The previous PR was converting the private key arg to string incorrectly...

ED25519 key type has a very silly to_string that returns debug output

I suppose this is to prevent dumping the key, but really I think it should just
not have a to_string and instead only have a debug method that does the same
thing, and a very explicit to hex method so that you don't accidentally dump these.

Test Plan: framework upgrade test succeeds, or at least doesn't fail on this
error (there is still some underlying flakiness in the forge test itself)
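
The design suggested in this commit message can be sketched as follows. This is an illustration only, not the `aptos-crypto` implementation: the `PrivateKey` newtype, the redaction text, and the `to_hex_exposing_secret` name are hypothetical.

```rust
use std::fmt;

/// Hypothetical 32-byte secret key wrapper.
struct PrivateKey([u8; 32]);

/// Debug output never reveals key material.
impl fmt::Debug for PrivateKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "<redacted PrivateKey>")
    }
}

impl PrivateKey {
    /// The only way to get the key as text; the name makes the
    /// exposure explicit at every call site.
    fn to_hex_exposing_secret(&self) -> String {
        self.0.iter().map(|byte| format!("{:02x}", byte)).collect()
    }
}
```

Since the type deliberately has no `Display` implementation, calling `to_string()` on it does not compile, so a caller cannot silently pass redacted debug output where a real key is expected.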

* [Sharded-Execution] Fix a race condition while fetching the state values on a shard from a remote stateview (aptos-labs#10320)

[Sharded-Execution] Fix a race condition while fetching the state values on a shard from a remote stateview

* [SDKv2] derive publickey unit test (aptos-labs#10358)

* jin_fix_ed25519_derive_publickey

* Add unit test for publicKey() derivation method

* [framework] Turn on test-only checking and fix errors (aptos-labs#10368)

This turns on running extended checks also on test-only code in the framework unit tests. A few errors discovered this way are fixed.

Changes to function declarations in this PR do not affect compatibility because only test-only functions are affected, which are stripped before deployment.

Related to aptos-labs#10335, but more needs to be done to make this behavior the default. This cannot happen before the next framework release.

* Add CI to check for banned CLI deps (aptos-labs#10338)

* refactor: replace multiply_then_divide using math64::mul_div (aptos-labs#10047)

Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
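
The motivation for a dedicated multiply-then-divide helper is that the intermediate product can overflow 64 bits even when the final quotient fits. A minimal Rust sketch of the idea follows; it is not the Move `math64::mul_div` source, and returning `Option` instead of aborting is an assumption made for illustration.

```rust
/// Compute `a * b / c` using a 128-bit intermediate product.
/// Returns None if `c` is zero or the result does not fit in a u64.
fn mul_div(a: u64, b: u64, c: u64) -> Option<u64> {
    if c == 0 {
        return None;
    }
    let wide = (a as u128) * (b as u128) / (c as u128);
    if wide > u64::MAX as u128 {
        None
    } else {
        Some(wide as u64)
    }
}
```

For example, `u64::MAX * 2` overflows a `u64`, but `mul_div(u64::MAX, 2, 4)` still yields the correct quotient.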

* [dag] hardening message verification

* [TS SDK V2] Port over account and transaction authenticator, signed transaction type (aptos-labs#10367)

* transaction types

* address comments

* authenticators

* previewnet flow in the benchmark (aptos-labs#10305)

* [ts sdk-v2] Mini get/post request refactor (aptos-labs#10380)

* [Sharded-Execution-GRPC] Add GRPC communication for sharded execution. (aptos-labs#10274)

[Sharded-Execution-GRPC] Add GRPC communication for sharded execution

In this commit we replace the existing socket based communication (that is message send and message recv) with GRPC.
Here we get the basic GRPC reliability.

More reliability and better performance to come in subsequent commits

* [forge] Fix fullnode override in forge (aptos-labs#10382)

### Description

Fixing a mistake made in a previous PR, that made fullnode configs not apply (and overwrite validator configs)

### Test Plan

Ran a fullnode test and manually checked the config that is logged.

* Update delegation-pool-operations.md (aptos-labs#10299)

reorganize page

* [indexer grpc] update the metrics for usage analysis. (aptos-labs#10383)

* Update staking-pool-operations.md (aptos-labs#10300)

* Update staking-pool-operations.md

reorganize page

* Update staking-pool-operations.md

add step for owner account

* Update staking-pool-operations.md

* Update staking-pool-operations.md

remove heading

---------

Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>

* [Storage][Pruner] Set min_readable_version and metrics. (aptos-labs#10381)

* rename enum variants and transaction arguments (aptos-labs#10384)

* [indexer grpc] More fields to short connection metric (aptos-labs#10385)

* [Tutorials] Replacing the old "Your First NFT" tutorial entry function calls to use aptos_token.move and new indexer queries (aptos-labs#9424)

* Overwriting the old your-first-nft tutorial in the docs to replace the 0x3 contract calls with the aptos_token contract calls. Also added corresponding documentation tags to referenced indexer queries and the new token client calls.

* Changing simple-aptos-token.py to simple_aptos_token.py and updating the Makefile to include it as an example

* Changing `create_token` to `mint_token` in python tutorial and `createToken` in typescript to `mint`

* Updating typescript example to work correctly if indexer/fullnode chainId aren't in sync

* [Python SDK] Use AccountAddress.from_str_relaxed when parsing addresses from the node API

---------

Co-authored-by: rtmtree <36580295+rtmtree@users.noreply.github.com>
Co-authored-by: rustielin <rustielin@users.noreply.github.com>
Co-authored-by: Vladislav ~ cryptomolot <88001005+cryptomolot@users.noreply.github.com>
Co-authored-by: Zorrot Chen <UI_Zorrot@163.com>
Co-authored-by: chan-bing <zzywullr@gmail.com>
Co-authored-by: Freesson ~ cryptomolot <semenikhinas@gmail.com>
Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com>
Co-authored-by: Jiege <stevekeol.x@gmail.com>
Co-authored-by: michelle-aptos <120680608+michelle-aptos@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Daniel Porteous (dport) <daniel@dport.me>
Co-authored-by: Maayan <maayan@aptoslabs.com>
Co-authored-by: Zekun Li <li.zekun@gmail.com>
Co-authored-by: Aaron <lightmark@gmail.com>
Co-authored-by: Brian R. Murphy <132495859+brmataptos@users.noreply.github.com>
Co-authored-by: Oliver He <heliuchuan@gmail.com>
Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: Rati Gelashvili <gelash@users.noreply.github.com>
Co-authored-by: runtianz <runtian@aptoslabs.com>
Co-authored-by: Jin <128556004+0xjinn@users.noreply.github.com>
Co-authored-by: Balaji Arun <balajia@vt.edu>
Co-authored-by: Gerardo Di Giacomo <gerardo@aptoslabs.com>
Co-authored-by: gedigi <gedigi@users.noreply.github.com>
Co-authored-by: Greg Nazario <greg@gnazar.io>
Co-authored-by: Michael Straka <mstraka100@gmail.com>
Co-authored-by: Teng Zhang <rahxephon89@163.com>
Co-authored-by: Josh Lind <josh.lind@hotmail.com>
Co-authored-by: Brian (Sunghoon) Cho <brian@aptoslabs.com>
Co-authored-by: Rustie Lin <rustie117@gmail.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
Co-authored-by: Perry Randall <perryjrandall@gmail.com>
Co-authored-by: igor-aptos <110557261+igor-aptos@users.noreply.github.com>
Co-authored-by: aalok-t <140445856+aalok-t@users.noreply.github.com>
Co-authored-by: Aalok Thakkar <aalok@Aaloks-MacBook-Pro.local>
Co-authored-by: Matt <90358481+xbtmatt@users.noreply.github.com>
Co-authored-by: Wolfgang Grieskamp <wg@aptoslabs.com>
Co-authored-by: Christian Theilemann <christian@aptoslabs.com>
Co-authored-by: Victor Gao <10379359+vgao1996@users.noreply.github.com>
Co-authored-by: larry-aptos <112209412+larry-aptos@users.noreply.github.com>
Co-authored-by: Sital Kedia <sitalkedia@users.noreply.github.com>
Co-authored-by: Manu Dhundi <manudhundi@gmail.com>
Co-authored-by: 0xbe1 <0xbetrue@gmail.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
fEst1ck pushed a commit to fEst1ck/aptos-core that referenced this pull request Oct 13, 2023
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
4 participants