PVF worker: look into removing unneeded dependencies #649
Auditing the dependencies, it looks like we can almost get rid of them. And/or we can have simple text logs specific to the workers. cc @sandreim
1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
3. Since the `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
4. The order of thread selection was flipped to make (3) sound (see note in code).

I left some TODOs related to panics which I'm going to address soon as part of #7045.

[^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
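Spawning a worker thread with a custom stack size via `std::thread` (point 2 above) can be sketched as follows. This is an illustrative sketch, not the PR's actual code: `WORKER_STACK_SIZE` and `spawn_with_stack` are hypothetical names.

```rust
use std::thread;

// Hypothetical value; the real stack size lives in the worker configuration.
const WORKER_STACK_SIZE: usize = 2 * 1024 * 1024;

/// Spawn a named thread with an explicit stack size, replacing what the
/// `rayon` threadpool configuration used to provide.
fn spawn_with_stack<F, T>(name: &str, f: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name(name.into())
        .stack_size(WORKER_STACK_SIZE)
        .spawn(f)
        .expect("failed to spawn worker thread")
}

fn main() {
    // Run a trivial job on the dedicated thread and join on it.
    let handle = spawn_with_stack("execute", || 2 + 2);
    assert_eq!(handle.join().unwrap(), 4);
}
```

Unlike the `tokio` builder linked above, `std::thread::Builder` sets the stack size per thread rather than per runtime, which is exactly the granularity the workers need.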
How much of a win is getting rid of it?
Don't have exact numbers, but
If we send logs over a socket, then the cost is just another socket, I suppose. Alternatively, we could reuse the same socket and use a prefix to detect whether a message from the worker is a log or a response.
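The prefix idea above amounts to a small framing scheme on the shared socket. A minimal sketch, with illustrative tag values and helper names (nothing here is from the actual codebase), framing each message as a tag byte, a little-endian length, and the payload:

```rust
// Hypothetical tag bytes distinguishing log lines from job responses.
const TAG_LOG: u8 = 0;
const TAG_RESPONSE: u8 = 1;

/// Encode a message as [tag][u32 little-endian length][payload].
fn frame(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 4 + payload.len());
    buf.push(tag);
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Decode one framed message, returning the tag and payload, or `None`
/// if the buffer is truncated.
fn parse(buf: &[u8]) -> Option<(u8, &[u8])> {
    let (&tag, rest) = buf.split_first()?;
    if rest.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes(rest[..4].try_into().ok()?) as usize;
    rest.get(4..4 + len).map(|payload| (tag, payload))
}

fn main() {
    let msg = frame(TAG_LOG, b"preparing artifact");
    let (tag, payload) = parse(&msg).unwrap();
    assert_eq!(tag, TAG_LOG);
    assert_eq!(payload, b"preparing artifact");
}
```

The host side would read the tag first and route the payload either to the logger or to the response handler, so a single socket serves both purposes.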
tracing-gum depends on jaeger, and it's quite heavy: https://github.com/paritytech/polkadot/blob/master/node/jaeger/Cargo.toml
The dependency on jaeger is just one function, which could simply be copied.
* PVF: Remove `rayon` and some uses of `tokio`

  1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
  2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
  3. Since the `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
  4. The order of thread selection was flipped to make (3) sound (see note in code).

  I left some TODOs related to panics which I'm going to address soon as part of #7045.

  [^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
* PVF: Vote invalid on panics in execution thread (after a retry)

  Also make sure we kill the worker process on panic errors and internal errors to potentially clear any error states independent of the candidate.
* Address a couple of TODOs

  Addresses a couple of follow-up TODOs from #7153.
* Add some documentation to implementer's guide
* Fix compile error
* Fix compile errors
* Fix compile error
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md

  Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Address comments + couple other changes (see message)

  - Measure the CPU time in the prepare thread, so the observed time is not affected by any delays in joining on the thread.
  - Measure the full CPU time in the execute thread.
* Implement proper thread synchronization

  Use condvars, i.e. `Arc::new((Mutex::new(true), Condvar::new()))`, as per the std docs. Considered also using a condvar to signal the CPU thread to end, in place of an mpsc channel. This was not done because `Condvar::wait_timeout_while` is documented as being imprecise, and `mpsc::Receiver::recv_timeout` is not documented as such. Also, we would need a separate condvar to avoid this case: the worker thread finishes its job, notifies the condvar, the CPU thread returns first, and we join on it and not the worker thread. So it was simpler to leave this part as is.
* Catch panics in threads so we always notify condvar
* Use `WaitOutcome` enum instead of bool condition variable
* Fix retry timeouts to depend on exec timeout kind
* Address review comments
* Make the API for condvars in workers nicer
* Add a doc
* Use condvar for memory stats thread
* Small refactor
* Enumerate internal validation errors in an enum
* Fix comment
* Add a log
* Fix test
* Update variant naming
* Address a missed TODO

---------

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
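The condvar pattern described above, with a `WaitOutcome` enum instead of a bool, can be sketched roughly as follows. This is a simplified illustration under assumed names; the enum variants and helper functions here are not the PR's actual API:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Illustrative outcome enum; the real one in the PR has more variants.
#[derive(Clone, Copy, PartialEq, Debug)]
enum WaitOutcome {
    Pending,
    Finished,
    TimedOut,
}

type Cond = Arc<(Mutex<WaitOutcome>, Condvar)>;

/// Record an outcome and wake the waiter. First writer wins, so a panic
/// notifier cannot overwrite a real result.
fn notify(cond: &Cond, outcome: WaitOutcome) {
    let (lock, cvar) = &**cond;
    let mut guard = lock.lock().unwrap();
    if *guard == WaitOutcome::Pending {
        *guard = outcome;
    }
    cvar.notify_all();
}

/// Block until some thread has notified an outcome, or the timeout fires.
fn wait(cond: &Cond, timeout: Duration) -> WaitOutcome {
    let (lock, cvar) = &**cond;
    let guard = lock.lock().unwrap();
    let (guard, result) = cvar
        .wait_timeout_while(guard, timeout, |o| *o == WaitOutcome::Pending)
        .unwrap();
    if result.timed_out() {
        WaitOutcome::TimedOut
    } else {
        *guard
    }
}

fn main() {
    let cond: Cond = Arc::new((Mutex::new(WaitOutcome::Pending), Condvar::new()));
    let worker_cond = Arc::clone(&cond);
    // The worker thread signals completion through the shared condvar.
    let handle = thread::spawn(move || notify(&worker_cond, WaitOutcome::Finished));
    let outcome = wait(&cond, Duration::from_secs(5));
    handle.join().unwrap();
    assert_eq!(outcome, WaitOutcome::Finished);
}
```

Note that `Condvar::wait_timeout_while` handles spurious wakeups internally, which is why the waiting side needs no explicit loop; the "catch panics so we always notify" commit above corresponds to wrapping the worker body so `notify` runs on every exit path.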
Starting the tokio runtime was calling `socketpair` and triggering the new seccomp filter. Removed tokio since we wanted to do it soon anyway as part of #649.
* do not refetch completion data instantly
* clippy
ISSUE
Overview
As part of #882, we want to block unneeded syscalls in the PVF workers. After #650 we will have separate worker binaries, whose object files we can analyze for included syscalls. However, right now a lot of unnecessary syscalls are present in the object files.
This issue is about removing some big dependencies[^1] and seeing whether that has an impact on syscalls or binary size, or whether Rust is smart enough that it's already optimizing as much as it can. (Will make sure to try this in combination with enabling LTO.)
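Enabling LTO, as mentioned above, is a Cargo release-profile setting. A sketch of what that could look like in the workspace `Cargo.toml` (the specific values are assumptions for illustration, not the project's actual profile):

```toml
[profile.release]
# "Fat" LTO optimizes across all crates at link time, which helps the
# linker drop unreferenced code paths (and with them, unused syscalls).
lto = true
# A single codegen unit gives the optimizer the widest view, at the
# cost of longer compile times.
codegen-units = 1
```

Comparing the worker binaries' size and syscall set with and without these settings would answer the "is Rust already optimizing enough" question above.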
Footnotes

[^1]: i.e. we can remove the dependency on `polkadot-node-core-pvf` by moving that into e.g. `host` and common objects into `common`, so we have the following structure: `pvf/host`, `pvf/worker`, `pvf/common`. Then see how that affects binary size.