Fix race condition between 0-RTT and Incoming #1821
Great catch, thanks! This was very likely to be an issue in practice because 0-RTT packets are likely to be sent, and hence received, immediately after the first Initial, and might hence routinely be processed by the Endpoint before the application has even had a chance to see the Incoming.
I think the strategy here is on the right track, but we should consider carefully what the default buffering limit should be. While it's true that the default receive_window is unlimited, the default stream receive window is not, and neither is the default stream concurrency limit, so there are effective limits by default, and we should preserve that pattern here.
Indeed, I discovered this not by thinking really hard about the code but through empirical testing. I was a bit embarrassed when I realized that I had caused the problem; I initially thought it was pre-existing.
Good catch. And solving this is kind of tricky. But here's an approach I've implemented now that I think works, let me know what you think:
Firstly, we can now move the
However, that is not sufficient to prevent memory exhaustion via filling many incoming with buffered early data, because these limits multiply to too high of a number. With the default transport config:
(100 max concurrent bidi streams + 100 max concurrent uni streams) × 1.25 MB stream receive window = 250 MB
250 MB × 2^16 max incoming ≈ 16.4 TB
Which I'm pretty sure is a lot of RAM. So to avoid that, we implement another
So to summarize:
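The worst-case arithmetic in this comment can be checked with a short sketch. The constants below are simply the figures quoted in the comment (100 bidirectional streams, 100 unidirectional streams, a 1.25 MB stream receive window, 2^16 pending connection attempts); the constant and function names are made up for illustration and are not read from the library.

```rust
// Figures quoted in the comment above; names are hypothetical.
const MAX_BIDI_STREAMS: u64 = 100;
const MAX_UNI_STREAMS: u64 = 100;
const STREAM_RECEIVE_WINDOW: u64 = 1_250_000; // 1.25 MB, in bytes
const MAX_INCOMING: u64 = 1 << 16;

/// Worst-case early data buffered for one pending connection attempt, in bytes.
fn per_incoming_bound() -> u64 {
    (MAX_BIDI_STREAMS + MAX_UNI_STREAMS) * STREAM_RECEIVE_WINDOW
}

/// Worst case summed over every pending connection attempt, in bytes.
fn total_bound() -> u64 {
    per_incoming_bound() * MAX_INCOMING
}
```

With these figures, `per_incoming_bound()` is 250,000,000 bytes (250 MB) and `total_bound()` is 16,384,000,000,000 bytes, roughly 16.4 TB, which is why the stream limits alone are not a usable bound.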
I'm not sure how to debug the FreeBSD CI failure; I don't see logs for it.
That's weird, normally it logs just fine. GitHub infra flake, perhaps? I've restarted it manually.
Thanks. It passed this time, so I guess it was just a spurious CI failure.
Should we expose the new limits on ServerConfig? 100 MiB makes sense for a lot of applications, but someone might want to use Quinn in a memory-constrained environment (e.g. a router).
Good idea. Now that this is all happening in proto, there's no friction to us doing that. Added three new settings to
I made it so the default
I believe these should all have been fixed in this change.
Thanks for the feedback. My last round of changes was a bit sloppier than I would've liked it to be, especially with me missing the
Some minorish feedback and a more existential question.
This all seems like substantial complexity. How do we feel about alternative solutions where we force callers to address one Incoming at a time? Do we really need the flexibility of keeping multiple incoming packets in flight at the same time? It feels like a larger change that is sort of flying in here under the radar as a very flexible solution to a problem (the race condition identified in #1820), with the flexibility also causing an increase in fragility.
(The answer might well be yes, but wanted to make this more explicit anyway.)
Minor nit: with the frequent force-pushes, keeping an issue reference in the commit message is a bit of a pain because it adds a whole bunch of backreferences in the issue. Consider leaving the issue reference out of the commit message in the future, in favor of keeping it in the PR description.
It's a worthwhile question. Merely limiting there to being only one
To avoid having to buffer early datagrams not associated with a connection handle, we would need to be able to make a decision whether to accept/refuse/retry/ignore an incoming connection attempt immediately and synchronously when the endpoint receives the connection-creating packet and before it tries to process any further packets (as the decision affects which subsequently received packets get routed to a connection and which get discarded). This could be achieved by removing the
One example of a situation where allowing multiple
while let Some(incoming) = endpoint.accept().await {
    let ip = incoming.remote_address().ip();
    if !block_list_bloom_filter.maybe_contains(ip) {
        // Fast path: the Bloom filter rules the address out of the block list.
        task::spawn(async move {
            handle_incoming(incoming.accept().expect("TODO put real error handling here")).await;
        });
    } else {
        // Possible match: confirm against the database without stalling the accept loop.
        task::spawn(async move {
            if !block_list_database.contains(ip).await {
                handle_incoming(incoming.accept().expect("TODO put real error handling here")).await;
            }
        });
    }
}
This would be a situation where allowing multiple
I was thinking we might have
Let's see what @Ralith thinks.
This would complicate the async layer considerably. We would need to move the
More importantly, it wouldn't solve anything: between GRO and recvmmsg, we may receive many datagrams instantaneously. One batch might include both the first datagram for an incoming connection and follow-up initial or 0-RTT packets for that same connection. These must be either buffered or (as in the status quo) dropped, and it's most convenient to do so at the proto layer, where we at least have the option to correlate them, discard non-QUIC datagrams, and respond directly for stateless cases.
Finally, if we could suspend receipt of datagrams immediately after receiving a connection's first datagram, that would be at odds with future work to parallelize datagram receipt and other endpoint driver work, which is our major remaining milestone for intra-endpoint scaling.
In sum, the current form of this PR is strongly motivated.
Implementation LGTM. Remaining items:
- Some final nits on the docs
- Decide whether to discard per-Incoming buffer limits for simplicity
- Test coverage
Comments tweaked. Tests added. As suggested, new tests are based on
Closes quinn-rs#1820
The fix:
- Endpoint now maintains a slab with an entry for each pending Incoming to buffer received data.
- ConnectionIndex now maps initial DCID to that slab key immediately upon construction of Incoming.
- If Incoming is accepted, association is overridden with association with ConnectionHandle, and all buffered datagrams are fed to newly constructed Connection.
- If Incoming is refused/retried/ignored, or accepting errors, association and slab entry are cleaned up to prevent memory leak.
Additional considerations:
- The Incoming::ignore operation can no longer be implemented as just dropping it. To help prevent incorrect API usage, proto::Incoming is modified to log a warning if it is dropped without being passed to Endpoint::accept/refuse/retry/ignore.
- Three things protect against memory exhaustion attacks here:
  1. The MAX_INCOMING_CONNECTIONS limit is moved from quinn to proto, limiting the number of concurrent incoming connections for which datagrams will be buffered before the application decides what to do with them. Also, it is changed from a constant to a field of the server config, max_incoming.
  2. Per-incoming buffered data is limited to a new limit stored in the server config, incoming_buffer_size, beyond which subsequent packets are discarded if received in these conditions.
  3. The sum total of all incoming buffered data is limited to a new limit stored in the server config, incoming_buffer_size_total, beyond which subsequent packets are discarded if received in these conditions.
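The buffering scheme and its three limits can be modeled in a few lines. This is only a simplified sketch under stated assumptions, not the code from the PR: it uses a `HashMap` in place of the slab, the types `Limits`, `Pending`, and `IncomingBuffers` are hypothetical, and whether a datagram that would cross a limit is dropped before or after being counted is a guess. Only the three field names in `Limits` come from the commit message.

```rust
use std::collections::HashMap;

/// The three limits named in the commit message.
struct Limits {
    max_incoming: usize,
    incoming_buffer_size: u64,
    incoming_buffer_size_total: u64,
}

/// Datagrams buffered for one pending connection attempt (hypothetical type).
#[derive(Default)]
struct Pending {
    datagrams: Vec<Vec<u8>>,
    bytes: u64,
}

/// Hypothetical stand-in for the endpoint's buffering state.
struct IncomingBuffers {
    limits: Limits,
    next_key: usize,
    pending: HashMap<usize, Pending>,
    total_bytes: u64,
}

impl IncomingBuffers {
    fn new(limits: Limits) -> Self {
        Self { limits, next_key: 0, pending: HashMap::new(), total_bytes: 0 }
    }

    /// Register a new connection attempt; `None` once `max_incoming` are pending.
    fn open(&mut self) -> Option<usize> {
        if self.pending.len() >= self.limits.max_incoming {
            return None;
        }
        let key = self.next_key;
        self.next_key += 1;
        self.pending.insert(key, Pending::default());
        Some(key)
    }

    /// Buffer a datagram for a pending attempt; returns `false` if it was dropped.
    fn push(&mut self, key: usize, datagram: Vec<u8>) -> bool {
        let len = datagram.len() as u64;
        let Some(entry) = self.pending.get_mut(&key) else { return false };
        if entry.bytes + len > self.limits.incoming_buffer_size
            || self.total_bytes + len > self.limits.incoming_buffer_size_total
        {
            return false; // over the per-attempt or the global limit
        }
        entry.bytes += len;
        entry.datagrams.push(datagram);
        self.total_bytes += len;
        true
    }

    /// Remove the entry on accept/refuse/retry/ignore and return what was buffered,
    /// so an accepted connection can be fed the datagrams and nothing leaks.
    fn take(&mut self, key: usize) -> Vec<Vec<u8>> {
        match self.pending.remove(&key) {
            Some(entry) => {
                self.total_bytes -= entry.bytes;
                entry.datagrams
            }
            None => Vec::new(),
        }
    }
}
```

In this model, accepting a connection would call `take` and replay the returned datagrams into the new connection, while refusing, retrying, or ignoring would call `take` and discard the result; either way the per-attempt bytes are released from the global total.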
Thanks for all the effort here!
Closes #1820
The fix:
Additional considerations:
The Incoming::ignore operation can no longer be implemented as just dropping it. To help prevent incorrect API usage, proto::Incoming is modified to log a warning if it is dropped without being passed to Endpoint::accept/refuse/retry/ignore.
To help protect against memory exhaustion attacks, per-Incoming buffered data is limited to twice the receive window or 10 KB, whichever is larger. Excess packets are silently dropped.
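The drop warning mentioned in the considerations above follows a common Rust pattern: a flag set by the consuming method and checked in `Drop`. The sketch below is an assumption about the general shape only; `PendingAttempt`, `resolve`, and the `UNRESOLVED_DROPS` counter are hypothetical names, and the counter exists purely so the behavior can be observed in a test.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts attempts dropped without a decision (hypothetical, for observation only).
static UNRESOLVED_DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a pending connection attempt.
struct PendingAttempt {
    id: u64,
    resolved: bool,
}

impl PendingAttempt {
    fn new(id: u64) -> Self {
        Self { id, resolved: false }
    }

    /// Consume the attempt through an explicit decision (accept, refuse, retry, or ignore).
    fn resolve(mut self) -> u64 {
        self.resolved = true;
        self.id
    }
}

impl Drop for PendingAttempt {
    fn drop(&mut self) {
        // Runs for every value, so only warn when no decision was recorded.
        if !self.resolved {
            UNRESOLVED_DROPS.fetch_add(1, Ordering::SeqCst);
            eprintln!("warning: attempt {} dropped without an explicit decision", self.id);
        }
    }
}
```

A warning rather than a panic keeps a misuse from taking down the endpoint while still making the leak-prone path visible in logs.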
Does this introduce a new vulnerability, in which an attacker who observes a client attempting to initiate a 0-RTT connection to the server spams the server with 0-RTT packets bearing the same connection ID? I think so.
Is this a severe problem? Here are two reasons I don't think so:
Could this be avoided? Possibly, by adding state to the buffering logic that checks whether these packets are validly encrypted for the associated connection. However, that risks making these operations costly enough that they start to defeat the DDoS resistance of the Incoming API.