Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

short-term fix for para inherent weight overestimation #5082

Merged
merged 32 commits into from
Aug 29, 2024

Conversation

ordian
Copy link
Member

@ordian ordian commented Jul 19, 2024

closes #849

Context

For the background on this and the long-term fix, see #849 (comment).

Changes

  • The weigh files are renamed from runtime_(parachains|common).* to polkadot_runtime_(parachains|common).*. The reason for it is the renaming introduced in Unify dependency aliases #4633. The new weight command and files are generated now include polkadot_ prefix.
  • The WeightInfo for paras_inherent now includes enter_empty which calculates the cost of processing an empty parachains inherent. This cost is subtracted dynamically when calculating other weights (so the other weights remain the same)

Benefits

See #849 (comment), but TL;DR is that we are not blocked on weights for scaling the number of validators and cores further.

Resolved questions:

TODOs:

  • Rerun benchmarks for Rococo and Westend
  • PRDoc

@ordian ordian added T8-polkadot This PR/Issue is related to/affects the Polkadot network. T12-benchmarks This PR/Issue is related to benchmarking and weights. labels Jul 19, 2024
@ordian
Copy link
Member Author

ordian commented Jul 22, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot command-bot bot deleted a comment from github-actions bot Jul 22, 2024
command-bot and others added 6 commits July 22, 2024 14:58
…=rococo --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent
…=westend --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent
* master:
  Bump slotmap from 1.0.6 to 1.0.7 (#5096)
  feat: introduce pallet-parameters to Westend to parameterize inflation (#4938)
  Bump openssl from 0.10.64 to 0.10.66 (#5107)
  Bump lycheeverse/lychee-action from 1.9.1 to 1.10.0 (#5094)
  docs: remove the duplicate word (#5095)
  Prepare PVFs if node is a validator in the next session (#4791)
  Update parity publish (#5105)
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6778934

@kianenigma kianenigma added the T2-pallets This PR/Issue is related to a particular pallet. label Jul 24, 2024
@kianenigma
Copy link
Contributor

What is the long term fix?

* master: (27 commits)
  Bridges improved tests and nits (#5128)
  Fix misleading comment about RewardHandler in epm config (#3095)
  Introduce a workflow updating the wishlist leaderboards (#5085)
  membership: Restructure pallet into separate files (#4536)
  Fix after ring-proof api change (#5126)
  Bump paritytech/review-bot from 2.4.0 to 2.5.0 (#5057)
  Bump docker/login-action from 3.0.0 to 3.3.0 (#5109)
  Bump docker/build-push-action from 5.1.0 to 6.5.0 (#5108)
  Bump peter-evans/create-pull-request from 5.0.0 to 6.1.0 (#5093)
  Tx Payment: drop ED requirements for tx payments with exchangeable asset  (#4488)
  Remove `pallet-getter` usage from pallet-transaction-payment (#4970)
  pallet macro: do not generate try-runtime related code when frame-support doesn't have try-runtime. (#5099)
  fix(chain-spec): ChainSpecBuilder with object as default genesis (#4345)
  Migrate BEEFY BLS crypto to  bls12-381 curve (#4931)
  Bump clap from 4.5.9 to 4.5.10 in the known_good_semver group (#5120)
  Use jobserver in wasm-builder to limit concurrency of spawned cargo processes (#4946)
  include events for voting (#4613)
  [subsystem-bench] Add mocks for own assignments triggering (#5042)
  Remove not-audited warning (#5114)
  hotfix: blockchain/backend: Skip genesis leaf to unblock syncing (#5103)
  ...
Comment on lines +35 to +37
benchmark.bitfields.clear();
benchmark.backed_candidates.clear();
benchmark.disputes.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these really needed if we just called build() on a default BenchBuilder instance?

* master: (51 commits)
  Remove unused feature gated code from the minimal template (#5237)
  make polkadot-parachain startup errors pretty (#5214)
  Coretime auto-renew (#4424)
  network/strategy: Backoff and ban overloaded peers to avoid submitting the same request multiple times (#5029)
  Fix frame crate usage doc (#5222)
  beefy: Tolerate pruned state on runtime API call (#5197)
  rpc: Enable ChainSpec for polkadot-parachain (#5205)
  Add an adapter for configuring AssetExchanger (#5130)
  Replace env_logger with sp_tracing (#5065)
  Adjust sync templates flow to use new release branch (#5182)
  litep2p/discovery: Publish authority records with external addresses only (#5176)
  Run UI tests in CI for some other crates (#5167)
  Remove `pallet::getter` usage from the pallet-balances (#4967)
  pallet-timestamp: `UnixTime::now` implementation logs error only if called at genesis (#5055)
  [CI] Cache try-runtime check (#5179)
  [Backport] version bumps and the prdocs reordering from stable2407 (#5178)
  [subsystem-benchmark] Update availability-distribution-regression-bench baseline after recent subsystem changes (#5180)
  Remove pallet::getter usage from proxy (#4963)
  Remove pallet::getter macro usage from pallet-election-provider-multi-phase (#4487)
  Review-bot@2.6.0 (#5177)
  ...
@ordian
Copy link
Member Author

ordian commented Aug 7, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot command-bot bot deleted a comment from github-actions bot Aug 7, 2024
* master:
  Umbrella crate: exclude chain-specific crates (#5173)
  Bring reference_hardware.json inline with machine used for weights (#5196)
  Snowbridge on Westend (#5074)
  Run semver check even when no prdoc (#5189)
  Export more from sc-service (#5250)
  Update the wishlist leaderboard script to handle PRs (#5256)
@ordian ordian marked this pull request as ready for review August 7, 2024 16:15
Copy link
Contributor

@alexggh alexggh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but there is something I don't clearly understand, as far as I can tell we are using fallback_max_validators() in a lot of places, don't we need to adjust those to get realistic results for 1000 validators ?

* master: (35 commits)
  Fix OurViewChange small race (#5356)
  Make ticket non-optional and add ensure_successful method to Consideration trait (#5359)
  [tests] dedup test code, add more tests, improve naming and docs (#5338)
  Stop running the wishlist workflow on forks (#5297)
  Migrate foreign assets v3::Location to v4::Location (#4129)
  Minor clean up (#5284)
  [Pools] Ensure members can always exit the pool gracefully (#4998)
  StorageWeightReclaim: set to node pov size if higher (#5281)
  [Bot] Add prdoc generation (#5331)
  Small nits found accidentally along the way (#5341)
  Create subsystem-benchmarks.yml (#5325)
  Bump libp2p-identity from 0.2.8 to 0.2.9 (#5232)
  Bump authoring duration for async backing to 2s. (#5195)
  Fix spelling issues (#5206)
  Bump the known_good_semver group across 1 directory with 3 updates (#5315)
  `polkadot-node-core-pvf-common`: Fix test compilation error (#5310)
  ci: Paused `cmd-action` commenter (#5287)
  Remove unnecessary mut (#5318)
  chain-spec: minor clarification on the genesis config patch (#5324)
  fix av-distribution Jaeger spans mem leak (#5321)
  ...
* master:
  Remove redundant minimal template workspace (#5330)
  approval-distribution: Fix handling of conclude (#5375)
  More logs in `is_potential_spam` from `dispute-coordinator` (#5252)
  Fix zombienet bridges test (#5373)
  Update Readme of the `polkadot` crate (#5326)
  allow for `u8` to be used as hold/freeze reason (#5348)
  Moving cargo check for runtimes to GHA (#5340)
  Update links in the documentation (#5175)
  fix visibility for `pallet_nfts` types used as call arguments (#3634)
  Correct some typos in crates' descriptions (#5262)
  Aura: Ensure we are building on each relay chain fork (#5352)
  Update Identity pallet README.md (#5183)
  Bump trie-db from 0.29.0 to 0.29.1 (#5231)
  [Coretime] Always include UnpaidExecution, not just when revenue is > 0 (#5369)
  [Pools] fix derivation of pool account (#4999)
  Upgrade accidentally downgraded deps (#5365)
  [Pools] Fix issues with member migration to `DelegateStake` (#4822)
  Unify `no_genesis` check (#5360)
  [CI] Fix prdoc command (#5358)
  Beefy: add benchmarks for `report_fork_voting()` (#5188)
Copy link
Contributor

@tdimitrov tdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for a short term fix this is good. @alexggh do have a point about the 1000 validators but it can be a follow up or part of the proper fix.

@ordian
Copy link
Member Author

ordian commented Aug 22, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot
Copy link

command-bot bot commented Aug 22, 2024

@ordian Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent has finished. Result:

HttpError: Not Found
HttpError: Not Found
    at /app/node_modules/@octokit/request/dist-node/index.js:86:21
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async sendRequestWithRetries (/app/node_modules/octokit-auth-probot/node_modules/@octokit/auth-app/dist-node/index.js:466:12)
    at async Job.doExecute (/app/node_modules/bottleneck/light.js:405:18)

command-bot added 2 commits August 22, 2024 22:32
…=westend --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent
@command-bot
Copy link

command-bot bot commented Aug 23, 2024

@ordian Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7109801 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7109801/artifacts/download.

Copy link
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirically, how much is the weight consumed by paras inherent in Polkadot and Kusama RC now, vs. how much will it be when this is deployed? (and which one is closer to reality -- although this is hard to measure 🙈)

@ordian ordian added this pull request to the merge queue Aug 29, 2024
@ordian
Copy link
Member Author

ordian commented Aug 29, 2024

Empirically, how much is the weight consumed by paras inherent in Polkadot and Kusama RC now, vs. how much will it be when this is deployed? (and which one is closer to reality -- although this is hard to measure 🙈)

I've provided some back-of-the-envelope calculations in this comment: #849 (comment), but TL;DR is that the dominant cost are from enter_bitfields (which we multiply by the number of para validators, which is 500 on Kusama at the moment and will grow to 1k). This number was already reduced in polkadot-fellows/runtimes#433 by 55% and the reasoning is explained in background section of #5270. This PR will further reduce it cost by more than a half (need to rerun the benches on Kusama, but you can see the cost for westend for comparison). Please note, that the cost of enter_bitfields by itself will remain the same, but subtract dynamically the cost of enter_empty. These and other annoyances will be addressed with a long-term refactoring mentioned in the first comment.
However, #5270 will bring some (a lot) of the costs back by properly accounting for e.g. processed messages in enacted candidates.

Merged via the queue into master with commit cc7ebe0 Aug 29, 2024
189 of 191 checks passed
@ordian ordian deleted the ao-fix-parainclusion-weight-overestimation branch August 29, 2024 09:16
ordian added a commit that referenced this pull request Aug 29, 2024
* master: (39 commits)
  short-term fix for para inherent weight overestimation (#5082)
  CI: Add backporting bot (#4795)
  Fix benchmark failures when using `insecure_zero_ed` flag (#5354)
  Command bot GHA v2 - /cmd <cmd> (#5457)
  Remove pallet::getter usage from treasury (#4962)
  Bump blake2b_simd from 1.0.1 to 1.0.2 (#5404)
  Bump rustversion from 1.0.14 to 1.0.17 (#5405)
  Bridge zombienet tests: remove old command (#5434)
  polkadot-parachain: Add omni-node variant with u64 block number (#5269)
  Refactor verbose test (#5506)
  Use umbrella crate for minimal template (#5155)
  IBP Coretime Polkadot bootnodes (#5499)
  rpc server: listen to `ipv6 socket` if available and `--experimental-rpc-endpoint` CLI option (#4792)
  Update approval-voting-regression-bench (#5504)
  change try-runtime rpc domains (#5443)
  polkadot-parachain-bin: Remove contracts parachain (#5471)
  Add feature to allow Aura collator to use full PoV size (#5393)
  Adding stkd bootnodes (#5470)
  Make `PendingConfigs` storage item public (#5467)
  frame-omni-bencher maintenance (#5466)
  ...
github-merge-queue bot pushed a commit that referenced this pull request Aug 29, 2024
On top of #5082.

## Background

Previously, before #3479, we would
[include](https://github.com/paritytech/polkadot-sdk/blame/75074952a859f90213ea25257b71ec2189dbcfc1/polkadot/runtime/parachains/src/builder.rs#L508C12-L508C44)
the cost enacting the candidate into the cost of processing a single
bitfield.
[Now](https://github.com/paritytech/polkadot-sdk/blame/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/builder.rs#L529)
it is different, although the benchmarks seems to be not-up-to date.
Including the cost of enacting a candidate into a processing a single
bitfield cost was incorrect, since we multiple that by the number of
bitfields we have. Instead, we should separate calculate the cost of
processing a single bitfield without enactment, and multiple the cost of
enactment by the actual number of processed candidates (which is limited
by the number cores, not validators).

## Bench

Previously, the weight of `enact_candidate` was calculated manually
(without a benchmark) and then neglected:
https://github.com/paritytech/polkadot-sdk/blob/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/inclusion/mod.rs#L584

In this PR, we have a benchmark for it and it's based on the number of
ump and sent hrmp messages as well as whether the candidate has a
runtime upgrade (new_validation_code).
The differences from the previous attempt
paritytech/polkadot#6929 are that
* we don't include the cost of enactment into the cost of processing a
backed candidate.
The reason for it is that enactment happens not in the same block as
backing (typically the next one), since we process bitfields before
backing votes.
* we don't take into account the size of the runtime upgrade, the
benchmark weight doesn't seem to depend much on it, but rather whether
there was one or not.

Similarly to the previous attempt, we don't account for dmp messages
(fixed cost). Also we don't account properly for received hrmp messages
(hrmp_watermark) because the cost of it depends on the runtime state and
can't be statically deduced in the benchmark (unless we pass the
information about channels as benchmark u32 arguments).

The total weight cost of processing a parainherent now includes the cost
of enactment of each candidate, but we don't do filtering based on that
(because we enact after processing bitfields and making other changes to
the storage).

## Numbers

```
Reads = 7 + (0 * u) + (3 * h) + (8 * c)
Writes = 10 + (1 * u) + (3 * h) + (7 * c)
```
In addition, there is a fixed cost of a few of ms (!) per candidate. 

This might result a full block slightly overflowing its weight with 200
enacted candidates, which in turn could prevent non-mandatory
transactions from being included in a block.

Given our modest limits on max ump and hrmp messages:
```
  maxUpwardMessageNumPerCandidate: 16
  hrmpMaxMessageNumPerCandidate: 10
```
and the fact that runtime upgrades are can't happen very frequently
(`validation_upgrade_cooldown`), we might only go over the limits in
case of many disputes.

TODOs:
- [x] Fix the overweight test
- [x] Generate the weights for Westend and Rococo
- [x] PRDoc

---------

Co-authored-by: command-bot <>
Co-authored-by: Alin Dima <alin@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T2-pallets This PR/Issue is related to a particular pallet. T8-polkadot This PR/Issue is related to/affects the Polkadot network. T12-benchmarks This PR/Issue is related to benchmarking and weights.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

paraInclusion is massively overestimating its weight
7 participants