Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a few counters for BlockSTM size #12581

Merged
merged 1 commit into from
Mar 19, 2024
Merged

Conversation

igor-aptos
Copy link
Contributor

@igor-aptos igor-aptos commented Mar 18, 2024

Description

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Move/Aptos Virtual Machine

How Has This Been Tested?

run forge tests, and looked at the counters

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Mar 18, 2024

⏱️ 17h 31m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-coverage 4h 22m 🟩
rust-smoke-coverage 3h 46m 🟩
rust-unit-tests 2h 14m 🟩🟩🟩🟩🟩
windows-build 1h 48m 🟩🟩🟩🟩🟩
execution-performance / single-node-performance 1h 27m 🟩🟩🟩🟩
rust-images / rust-all 56m 🟩🟩🟩🟩
rust-lints 44m 🟩🟩🟩🟩🟩
rust-smoke-tests 32m 🟩
check 21m 🟩🟩🟩🟩🟩
run-tests-main-branch 19m 🟥🟥🟥🟥🟥
forge-e2e-test / forge 15m 🟩
forge-compat-test / forge 13m 🟩
general-lints 12m 🟩🟩🟩🟩🟩
check-dynamic-deps 7m 🟩🟩🟩🟩🟩
cli-e2e-tests / run-cli-tests 6m 🟩
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 1m 🟩🟩🟩🟩🟩
file_change_determinator 59s 🟩🟩🟩🟩🟩
node-api-compatibility-tests / node-api-compatibility-tests 53s 🟩
file_change_determinator 48s 🟩🟩🟩🟩
execution-performance / file_change_determinator 42s 🟩🟩🟩🟩
permission-check 25s 🟩🟩🟩🟩🟩
permission-check 25s 🟩🟩🟩🟩🟩
permission-check 19s 🟩🟩🟩🟩🟩
permission-check 18s 🟩🟩🟩🟩🟩
permission-check 15s 🟩🟩🟩🟩
upload-to-codecov 14s 🟩
determine-docker-build-metadata 12s 🟩🟩🟩🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
windows-build 24m 20m +21%

settingsfeedbackdocs ⋅ learn more about trunk.io

@@ -244,6 +244,13 @@ impl DelayedFieldValue {
},
})
}

pub fn get_approx_memory_size(&self) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not spell fully, get_approximate_memory_size?


pub fn get_approx_memory_size(&self) -> usize {
match &self {
DelayedFieldValue::Aggregator(_) | DelayedFieldValue::Snapshot(_) => 1 + 16,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add an explantation here? If I see this code for the first time, comment would be super helpful to explain what this even mean.

pub static BLOCK_VIEW_BASE_VALUES_MEMORY_USAGE: Lazy<HistogramVec> = Lazy::new(|| {
register_avg_counter_vec(
"aptos_execution_block_view_base_values_memory_usage",
"Size (number of keys) ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is wrong? Should be values?

name,
desc,
labels,
// We need to have at least one bucket in histogram, otherwise default buckets are used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: . in the end of a comment

next_idx_to_commit: AtomicTxnIndex,
next_idx_to_commit: CachePadded<AtomicTxnIndex>,

total_base_value_size: CachePadded<AtomicU64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make this cache padded to ensure so that both counters are not colocated on the same cache line, and we avoid invalidations? But what about values and next_idx_to_commit. Is the effect actually observable here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sync up with @gelash when he is back. for now this makes new counter not conflict, doesn't improve on existing conflicts

.entry(id)
.or_insert(VersionedValue::new(Some(base_value)));
use dashmap::mapref::entry::Entry::*;
match self.values.entry(id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if let Vacant(...) = ... { ... }? Or even or_insert_with actually

@@ -1272,6 +1275,8 @@ where

ret.resize_with(num_txns, E::Output::skip_output);

counters::update_state_counters(unsync_map.stats(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value should be false, right? Because it is sequential?

@igor-aptos igor-aptos added the CICD:run-execution-performance-test Run execution performance test label Mar 19, 2024
@igor-aptos igor-aptos force-pushed the igor/block_stm_mem_counters branch from 6203413 to b9bf67b Compare March 19, 2024 17:31
@igor-aptos igor-aptos requested a review from vusirikala March 19, 2024 17:34
@igor-aptos igor-aptos force-pushed the igor/block_stm_mem_counters branch from b9bf67b to c807df6 Compare March 19, 2024 17:58

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.9.5 ==> c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52

Compatibility test results for aptos-node-v1.9.5 ==> c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52 (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 5766 txn/s, latency: 5156 ms, (p50: 4800 ms, p90: 8100 ms, p99: 16000 ms), latency samples: 224900
2. Upgrading first Validator to new version: c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 713 txn/s, latency: 34520 ms, (p50: 37800 ms, p90: 51900 ms, p99: 54800 ms), latency samples: 58520
3. Upgrading rest of first batch to new version: c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 416 txn/s, submitted: 609 txn/s, expired: 193 txn/s, latency: 37000 ms, (p50: 45100 ms, p90: 56400 ms, p99: 57800 ms), latency samples: 37039
4. upgrading second batch to new version: c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 1477 txn/s, latency: 19912 ms, (p50: 21600 ms, p90: 27600 ms, p99: 28000 ms), latency samples: 66500
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on c807df6dfa3d1a4b91dbcc61a0ba6ddb635dbb52

two traffics test: inner traffic : committed: 7397 txn/s, latency: 5303 ms, (p50: 5100 ms, p90: 6000 ms, p99: 11700 ms), latency samples: 3195580
two traffics test : committed: 100 txn/s, latency: 1911 ms, (p50: 1800 ms, p90: 2100 ms, p99: 6500 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.231, avg: 0.203", "QsPosToProposal: max: 0.337, avg: 0.289", "ConsensusProposalToOrdered: max: 0.478, avg: 0.441", "ConsensusOrderedToCommit: max: 0.329, avg: 0.314", "ConsensusProposalToCommit: max: 0.783, avg: 0.755"]
Max round gap was 1 [limit 4] at version 1445471. Max no progress secs was 4.373785 [limit 15] at version 1445471.
Test Ok

@igor-aptos igor-aptos merged commit bb3950a into main Mar 19, 2024
83 checks passed
@igor-aptos igor-aptos deleted the igor/block_stm_mem_counters branch March 19, 2024 20:29
igor-aptos added a commit that referenced this pull request Mar 21, 2024
#12581) (#12594)

* Optimize BlockSTM memory usage for delayed fields (#12580)

This reduces memory footprint of amortized usage of delayed field with a single version from 3000 bytes to 600 bytes per delayed field.
Basically BTreeMap stores things in vectors of 5, and size of the value is large. This invalidates BTreeMap's cache optimization of storing them all together.

* Limit number of DelayedFields (aggregators) per resource

temporarily, until gas charges for aggregator loading are implemented, limit number of aggregators per resource

* Add a few counters for BlockSTM size (#12581)

* update TOO_MANY_DELAYED_FIELDS error code (#12603)

---------
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-execution-performance-test Run execution performance test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants