-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature request] Support alternative future executor handling for compute-bounded futures #94
Comments
I'll just note that |
Being able to configure this from a higher level crate (e.g. Would the static need to be re-exported from the higher level crate, or can I just add |
You should be able to just add |
@arielb1 currently in the sample code, it accepts async and blocks on it inside of
My understanding is that you would need an explicit dependency on the experimental crate to customize the behavior of that global constant if eg tokio-rustls doesn't re-export. That said, we might as well re-export for convenience. |
Yeah that's what I was worried about. If versions are the same it should be the same compiled lib and therefore the same static, but wasn't sure if that would hold with different versions or different features etc.
If |
The API of block_in_place is sync, so you mean it adds another async layer on top of it? |
I tried to sketch this out in the issue via sample code, please pick it apart if not feasible:
|
The rule with Cargo is that every major version can only exist once in a given binary (e.g., you can have separate 0.4.0, 0.5.0, 1.0.0 and 2.0.0 versions of a crate, but not 2.0.0 and 2.1.0). Features don't matter.
I don't see why we would be using |
I was trying to avoid requiring packages like tokio-rustls to explicitly initialize in case the caller didn't set custom behavior, but yeah I can just do that too. |
I spoke to @arielb1 out of band and updated the description of the issue to contain sample code that actually compiles, along with some tweaks based on Ariel's feedback: Object safetyNo more trait object containing generic methods, which can't be sized. Instead we now have a more opinionated set of handling for With that change, I didn't actually see much value in exposing a trait at all. Instead we just call a different function per Would welcome feedback on this approach. Certainly I could do something similar via trait - do people prefer that to closures? OnceLock handlingAriel pointed out that it could be very confusing to callers if they have a code path that executes cpu-heavy code prior to initializing the OnceLock containing the executor config, and it is implicitly initialized by this library. Instead, since we aren't allocating anything to construct |
Well that solves my Using (I'm not sure I can think of a realistic scenario where that would actually happen, other than the caller accidentally initialising the
Yeah so it could happen if, for example, I've not updated my I guess it's the sort of issue that can be offset with some warning in the docs reminding people to make sure they're using the same major versions if they're interacting with it directly rather than through an re-export from |
Maybe we should have a rustls-heavy-futures-executor crate with 1 major version that rustls depends on?On 6 Dec 2024, at 12:49, jclmnop ***@***.***> wrote:
Instead, since we aren't allocating anything to construct CpuHeavyFutureExecutor::BlockInPlace or CpuHeavyFutureExecutor::SpawnBlocking, and runtime_flavor() is quite cheap, we can just call OnceCell::get() and, if None, decide which executor to use each time spawn_compute_heavy_future() is called.
Well that solves my .get_or_init() concerns.
Using .get().unwrap_or() also means there won't be any conflicts between the caller using .set() and the library using .get_or_init() if the caller decides to set it later on for some reason.
(I'm not sure I can think of a realistic scenario where that would actually happen, other than the caller accidentally initialising the OnceCell later than they should)
The rule with Cargo is that every major version can only exist once in a given binary (e.g., you can have separate 0.4.0, 0.5.0, 1.0.0 and 2.0.0 versions of a crate, but not 2.0.0 and 2.1.0). Features don't matter.
Yeah so it could happen if, for example, I've not updated my reqwest dependency and it's still using v0.25.x of tokio-rustls, then I have v0.26.x declared separately in my Cargo.toml with this feature to set the OnceLock and I'm sat there wondering why it isn't using my custom strategy in reqwest.
I guess it's the sort of issue that can be offset with some warning in the docs reminding people to make sure they're using the same major versions if they're interacting with it directly rather than through an re-export from reqwest though, so not really a big deal.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
How does that solve this problem, since anyway they will need the version of tokio-rustls that uses this crate in reqwest's dependency tree? I do see some benefit to this, namely that we could then offer the same approach to s2n-tls-tokio and others. I just touched in with @djc on discord, and he was fine with depending on such a crate provided it wasn't compiled unless the experimental config was enabled. In any case, I didn't anticipate reqwest directly interacting with this config or re-exporting anything. The re-export I was referring to was from tokio-rustls, in the event we do depend on an experimental crate. It just strikes me as an explosion of tendrils to expect any calling library that uses tokio-rustls to add explicit references to this feature. Particularly if it is intended more of a POC to try to get this moved directly into the tokio runtime config. Open to alternatives, that is just my initial thinking. |
Spoke with @jswrenn and he pointed out that if we're willing to do a little black magic with Any, potentially we could keep generic bounds out of the Will poke around with it a bit. |
Just to capture my thoughts - I also think I want to add the API to offer a 'usually blocking' versus 'sometimes/briefly blocking' config. Since, callers might want to send the former to a threadpool and/or spawn_blocking, but not the latter. Context here is that @ctz mentioned that the initial tls handshake is much more expensive than subsequent crypto operations once the connection is open. So for most users, they probably care about the cpu-boundedness of only the initial handshake. But for very high-throughput applications with a lot of i/o tasks, perhaps they still want to handle ANY blocking crypto specially. And then, the default for 'sometimes/briefly' blocking would just be to await the future in the current worker. I would add |
I am willing to contribute work on this. Sharing for broader feedback after a conversation with djc@ and ctz@. There is also a thread in the rustls discord: https://discord.com/channels/976380008299917365/1313647498061025411
Overview
Today, this crate wraps a heterogenous set of futures, some of which are I/O bounded (eg, network hops), some of which are compute bounded (eg, crypto operations). In the case of initial handshake, those compute operations can impact the performance of the tokio executor by starving worker threads.
I've encountered some pathological cases of this where applications are needing to establish many connections very quickly, and the executor grinds to a halt. It also highly impacts
current-thread
runtime. The impact on multi-threaded runtimes without pathological amounts of connections is lessened, but still non-zero.Meanwhile, approaches like delegating the tls handshake to a secondary threadpool with lower priority threads, dramatically improves throughput, both for the tls handshake work as well as other tasks.
Given that we know the composition of the work inside these futures, it would be great to enable more sophisticated handling of the futures that might block the executor.
Task::spawn_blocking
as it stands today isn't quite right as is can spawn many more threads than cores, and also is fairly opinionated in that can force expensive thread-local initializations and cause other issues.Task::block_in_place
has its own limitations.Ultimately I think the goal state here would be:
tokio-rustls
is able to signal to the tokio executor that these futures are likely to block / be long running (at least for initial handshake)tokio-rustls
->hyper-rustls
->hyper-util
->reqwest
There is a conversation in progress about this sort of possibility in the #tokio-internals channel of the tokio discord.
As that conversation progresses, I think that there is an intermediate step available here under an experimental feature flag, that would be a good POC for such functionality:
spawn_compute_heavy_future
, and applies special handling to ittokio-rustls
that checks an a cfg flag to decide whether to delegate to the experimental crate, otherwise it just awaits the provided futureWe could also embed this directly into tokio-rustls rather than an experimental crate, depending on maintainer preference.
Suggested approach
A couple of guidelines:
We'll have a lot more options if support for this lands in tokio, since then you can configure via runtime config, do clever things with unwinding the poll fn call for 'sometimes blocking' futures that might prefer thread locality, pre-emptively kick tasks to other worker queues, overloading
spawn_blocking
, etc.In the meantime I suggest that we just use a public static constant
OnceLock
which can be initialized with a strategy forspawn_compute_heavy_future
. This allows the outermost caller to inject config, without propagating it through intermediate crates, which better simulates what it would be like to embed it in tokio. Or we could fall back to defaults.If the caller doesn't specify anything, we can implicitly set it to one of two options:
block_in_place
to call received futuresspawn_blocking
I think that the above is a decent middle ground for 'somewhat better' than the current behavior for the caller establishing occasional connections, and it's behind an experimental flag so we could document this opinionated behavior and guide users to evaluate it and consider overriding it. We'll hopefully come up with some cooler things to do natively on tokio, before we ever lift this out of the experimental flag.
Alternately, the caller CAN specify arbitrary behavior, such as injecting their own secondary executor backed by a threadpool. We could provide a sample usage showing a simple version of that, but I don't think it belongs as a default in the crate since anyway calling applications using threadpools probably have other logic they want to incorporate.
Rough example:
Experimental crate contents:
And then a sample usage with a custom closure backed by a threadpool:
Questions
The text was updated successfully, but these errors were encountered: