Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move PVF code and PoV decompression to PVF host workers #5142

Merged
merged 16 commits into from
Aug 9, 2024

Conversation

s0me0ne-unkn0wn
Copy link
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn commented Jul 25, 2024

Closes #5071

This PR aims to

  • Move all the blocking decompression from the candidate validation subsystem to the PVF host workers;
  • Run the candidate validation subsystem on the non-blocking pool again.

Upsides: no blocking operations in the subsystem's main loop. PVF throughput is not limited by the ability of the subsystem to decompress a lot of stuff. Correctness and homogeneity improve, as the artifact used to be identified by the hash of decompressed code, and now they are identified by the hash of compressed code, which coincides with the on-chain ValidationCodeHash.

Downsides: the PVF code decompression is now accounted for in the PVF preparation timeout (be it pre-checking or actual preparation). Taking into account that the decompression duration is on the order of milliseconds, and the preparation timeout is on the order of seconds, I believe it is negligible.

@s0me0ne-unkn0wn
Copy link
Contributor Author

@sandreim Before this one gets even fatter, I have to ask. I can make it simpler by moving some metrics (namely, PVF size and compressed/uncompressed PoV size) from candidate validation subsys to the PVF host. This will most probably break some Grafana dashboards, and I'm not sure what else. Is it worth the code simplification?

@sandreim
Copy link
Contributor

@sandreim Before this one gets even fatter, I have to ask. I can make it simpler by moving some metrics (namely, PVF size and compressed/uncompressed PoV size) from candidate validation subsys to the PVF host. This will most probably break some Grafana dashboards, and I'm not sure what else. Is it worth the code simplification?

Yes, moving the code dealing with the blobs to pvf host simplifies things in candidate validation. We could keep the old metric names so nothing is broken.

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6885379

@s0me0ne-unkn0wn s0me0ne-unkn0wn added the T0-node This PR/Issue is related to the topic “node”. label Aug 2, 2024
@s0me0ne-unkn0wn s0me0ne-unkn0wn marked this pull request as ready for review August 2, 2024 08:50
@s0me0ne-unkn0wn s0me0ne-unkn0wn changed the title [WIP] Move PVF code and PoV decompression to PVF host workers Move PVF code and PoV decompression to PVF host workers Aug 2, 2024
Copy link
Contributor

@AndreiEres AndreiEres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that there is an issue with passing the same data through so many functions. However, this concern is not related to the current PR, as it simply follows the chosen approach.

polkadot/node/core/pvf/src/host.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @s0me0ne-unkn0wn !

polkadot/node/core/pvf/src/metrics.rs Show resolved Hide resolved
prdoc/pr_5142.prdoc Outdated Show resolved Hide resolved
- name: polkadot-overseer
bump: patch
- name: polkadot-node-core-pvf
bump: major
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could have done better to just require a minor bump if we added a new API that takes in the uncompressed blob along with some types for compressed and uncompressed blobs. But since these crates are just used internally by the polkadot node I think it doesn't worth investing in.

@s0me0ne-unkn0wn s0me0ne-unkn0wn added this pull request to the merge queue Aug 9, 2024
Merged via the queue into master with commit 47c1b4c Aug 9, 2024
166 of 172 checks passed
@s0me0ne-unkn0wn s0me0ne-unkn0wn deleted the s0me0ne/decompress-in-workers branch August 9, 2024 16:29
dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Aug 28, 2024
)

Closes paritytech#5071 

This PR aims to
* Move all the blocking decompression from the candidate validation
subsystem to the PVF host workers;
* Run the candidate validation subsystem on the non-blocking pool again.

Upsides: no blocking operations in the subsystem's main loop. PVF
throughput is not limited by the ability of the subsystem to decompress
a lot of stuff. Correctness and homogeneity improve, as the artifact
used to be identified by the hash of decompressed code, and now they are
identified by the hash of compressed code, which coincides with the
on-chain `ValidationCodeHash`.

Downsides: the PVF code decompression is now accounted for in the PVF
preparation timeout (be it pre-checking or actual preparation). Taking
into account that the decompression duration is on the order of
milliseconds, and the preparation timeout is on the order of seconds, I
believe it is negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
5 participants