Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition between 0-RTT and Incoming #1821

Merged
merged 2 commits into from
May 1, 2024

Conversation

gretchenfrage
Copy link
Collaborator

Closes #1820

The fix:

  • Endpoint now maintains a slab with an entry for each pending Incoming to buffer received data.
  • ConnectionIndex now maps initial DCID to that slab key immediately upon construction of Incoming.
  • If Incoming is accepted, association is overridden with association with ConnectionHandle, and all buffered datagrams are fed to newly constructed Connection.
  • If Incoming is refused/retried/ignored, or accepting errors, association and slab entry are cleaned up to prevent memory leak.

Additional considerations:

  • The Incoming::ignore operation can no longer be implemented as just dropping it. To help prevent incorrect API usage, proto::Incoming is modified to log a warning if it is dropped without being passed to Endpoint::accept/refuse/retry/ignore.

  • To help protect against memory exhaustion attacks, per-Incoming buffered data is limited to twice the receive window or 10 KB, which- ever is larger. Excessive packets silently dropped.

    • Does this introduce a new vulnerability to an attack in which an attacker could spam a server with 0-RTT packets with the same connection ID as it observed a client attempting to initiate a 0-RTT connection to the server? I do think so.

      Is this a severe problem? Here's two reasons I don't think so:

      1. The default receive window is set to max value, so this won't actually kick in unless the user is already hardening against adverse conditions. 2. It is already possible for an on-path attacker to distrupt a connection handshake if 0.5-RTT data is being used, so this probably doesn't actually expand the set of situations in which it's vulnerable to this kind of vulnerability.

      Could this be avoided? Possibly by introducing additional state to the buffering state to validate whether these packets are validly encrypted for the associated connection? However, that may risk making these operations costly enough that they start to defeat the DDOS-resistance abilities of the Incoming API.

@gretchenfrage gretchenfrage marked this pull request as ready for review April 14, 2024 21:31
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thanks! This was very likely to be an issue in practice because 0-RTT packets are likely to sent, and hence received, immediately after the first Initial, and might hence routinely be processed by the Endpoint before the application has even had a chance to see the Incoming.

I think the strategy here is on the right track, but we should consider carefully what the default buffering limit should be. While it's true that the default receive_window is unlimited, the default stream receive window is not, and neither is the default stream concurrency limit, so there are effective limits by default, and we should preserve that pattern here.

quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
@gretchenfrage
Copy link
Collaborator Author

This was very likely to be an issue in practice because 0-RTT packets are likely to sent, and hence received, immediately after the first Initial, and might hence routinely be processed by the Endpoint before the application has even had a chance to see the Incoming.

Indeed, I discovered this not through thinking about the code really hard but through empirical testing. I was a bit embarrassed when I realized that the problem was caused by me, I initially thought it was pre-existing.

fd22d144b8b65d9b


How do we limit the number of these? Buffer size limits will not be an effective defense if the number of buffers is unbounded.

Good catch. And solving this is kind of tricky. But here's an approach I've implemented now that I think works, let me know what you think:

Firstly, we can now move the MAX_INCOMING_CONNECTIONS check from quinn to proto. (Also, rename it to just MAX_INCOMING, which we sort of forgot to do in the original PR).

However, that is not sufficient to prevent memory exhaustion via filling many incoming with buffered early data, because these limits multiply to too high of a number. With the default transport config:

(100 max concurrent bidi streams + 100 max concurrent uni streams) × 1.25 MB stream receive window = 250 MB

× 2^16 max incoming = 1.64 TB

Which I'm pretty sure is a lot of RAM. So to avoid that, we implement another MAX_ALL_INCOMING_BUFFERED limit set to 100 MiB which is a limit to the total number of bytes buffered for incomings across all incoming within the connection. If either the per-incoming limit or the MAX_ALL_INCOMING_BUFFERED limit are exceeded, the packet is dropped.

So to summarize:

  • Upon receipt of an incoming-creating packet, if the number of pending incoming exceeds MAX_INCOMING = 1^16, endpoint automatically responds with refuse packet.
  • Upon receipt of an incoming-associated packet, if the number of buffered packet bytes for that incoming exceeds the per-incoming limit calculated as min(receive_window, (max_concurrent_bidi_streams + max_concurrent_uni_streams) * stream_receive_window), endpoint drops the packet.
  • Upon receipt of an incoming-associated packet, if the number of buffered packet bytes for all incoming collectively exceeds MAX_ALL_INCOMING_BUFFERED = 100 MiB, endpoint drops the packet.

@gretchenfrage
Copy link
Collaborator Author

I'm not sure how to debug the freebsd CI failure, I don't see logs for it

@Ralith
Copy link
Collaborator

Ralith commented Apr 16, 2024

That's weird, normally it logs just fine. GitHub infra flake, perhaps? I've restarted it manually.

@gretchenfrage
Copy link
Collaborator Author

Thanks. It passed this time, so I guess it was just a spurious CI failure.

@gretchenfrage gretchenfrage mentioned this pull request Apr 17, 2024
7 tasks
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expose the new limits on ServerConfig? 100MiB makes sense for a lot of applications, but someone might want to use Quinn in a memory-constrained environment (e.g. a router).

quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
@gretchenfrage gretchenfrage force-pushed the 1661-bug-fix branch 4 times, most recently from cc4bd2e to 0f19a07 Compare April 20, 2024 21:52
@gretchenfrage
Copy link
Collaborator Author

Should we expose the new limits on ServerConfig? 100MiB makes sense for a lot of applications, but someone might want to use Quinn in a memory-constrained environment (e.g. a router).

Good idea. Now that this is all happening in proto, there's no friction to us doing that. Added three new settings to ServerConfig: max_incoming, max_buffer_bytes_per_incoming, and max_buffer_bytes_all_incoming.

What if a connection is configured for use with application datagrams (or some other future messaging primitive) only, and stream limits are hence set to zero? Might be simplest to only use a dedicated limit.

I made it so the default ServerConfig constructor calculates max_buffer_bytes_per_incoming as was previously calculated there. Although I did switch it to just hard-code the overhead to 1200 bytes--let me know if you want me to not do it like that, but it seems more straightforward than the trick with comparison. Anyways, if a user configures the stream limits to zero, but still wants 0-RTT datagrams in excess of 1200 bytes to work more-so than they otherwise would, they can manually set max_buffer_bytes_per_incoming to some appropriately high number. Let me know if you think this is a pitfall sufficiently likely to be worth documenting.

It's confusing that this clause checks after adding the next datagram whereas the first clause checks before.

Rounding up to the next packet isn't necessarily about allowing for overhead (which might vary dramatically according to sender discretion), just erring on the side of being permissive.

If we use < here, then 0 has a more intuitive effect.

I believe these should all have been fixed in this change.

quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
@gretchenfrage gretchenfrage force-pushed the 1661-bug-fix branch 2 times, most recently from 676c523 to 3dcc72a Compare April 24, 2024 05:16
@gretchenfrage
Copy link
Collaborator Author

Thanks for the feedback. My last round of changes was a bit sloppier than I would've liked it to be, especially with me missing the += bug. I think the feedback should be addressed now though.

  • Reworked the limit checking logic to be based on checked_add in a way I think is a lot cleaner.
  • Largely rewrote the added config docs.
  • Made other more minor changes.

Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minorish feedback and a more existential question.

This all seems like substantial complexity. How do we feel about alternative solutions where we force callers to address one Incoming at a time? Do we really need the flexibility of keeping multiple incoming packets in flight at the same time? It feels like a larger change that is sort of flying in here under the radar as a very flexible solution to a problem (the race condition identified in #1820), with the flexibility also causing an increase in fragility.

(The answer might well be yes, but wanted to make this more explicit anyway.)

Minor nit: with the frequent force-pushes, keeping an issue reference in the commit message is a bit of a pain because it adds a whole bunch of backreferences in the issue. Consider leaving the issue reference out of the commit message in the future, in favor of keeping it in the PR description.

quinn-proto/src/endpoint.rs Show resolved Hide resolved
quinn-proto/src/shared.rs Show resolved Hide resolved
quinn-proto/src/endpoint.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Show resolved Hide resolved
@gretchenfrage
Copy link
Collaborator Author

This all seems like substantial complexity. How do we feel about alternative solutions where we force callers to address one Incoming at a time? Do we really need the flexibility of keeping multiple incoming packets in flight at the same time? It feels like a larger change that is sort of flying in here under the radar as a very flexible solution to a problem (the race condition identified in #1820), with the flexibility also causing an increase in fragility.

It's a worthwhile question.

Merely limiting there to being only one Incoming at a time would not let us fully avoid buffering early datagrams not associated with a connection handle. It would allow us to indiscriminately put all early datagrams not associated with a connection handle into a single buffer rather than in per-Incoming buffers, but that doesn't seem like much of a simplification.

To avoid having to buffer early datagrams not associated with a connection handle, we would need to be able make a decision whether to accept/refuse/retry/ignore an incoming connection attempt immediately and synchronously when the endpoint receives the connection-creating packet and before it tries to process any further packets (as the decision affects which subsequently received packets get routed to a connection and which get discarded).

This could be achieved by removing the Incoming API as it exists now and instead just putting some sort of incoming_logic: Box<dyn FnMut(&IncomingConnectionInfo) -> IncomingConnectionDecision> field in the server config which the endpoint can call synchronously. I don't think we should do that.

One example of a situation where allowing multiple Incoming to be in motion at the same time would be useful is if there is an IP block list (or allow list) that's stored in a database but has an in-memory Bloom filter to accelerate it:

while let Some(incoming) = endpoint.accept().await {
    let ip = incoming.remote_address().ip();
    if !block_list_bloom_filter.maybe_contains(ip) {
        task::spawn(async move {
            handle_incoming(incoming.accept().expect("TODO put real error handling here")).await;
        });
    } else {
        task::spawn(async move {
            if !block_list_database.contains(ip).await {
                handle_incoming(incoming.accept().expect("TODO put real error handling here")).await;
            }
        });
    }
}

This would be a situation where allowing multiple Incoming in motion simultaneously, and even allowing the application to make decisions on them in a different order than they were produced, could improve performance and/or attack mitigation effectiveness.

@gretchenfrage gretchenfrage force-pushed the 1661-bug-fix branch 2 times, most recently from c63dc5d to 03294a8 Compare April 26, 2024 04:31
@djc
Copy link
Member

djc commented Apr 26, 2024

To avoid having to buffer early datagrams not associated with a connection handle, we would need to be able make a decision whether to accept/refuse/retry/ignore an incoming connection attempt immediately and synchronously when the endpoint receives the connection-creating packet and before it tries to process any further packets (as the decision affects which subsequently received packets get routed to a connection and which get discarded).

I was thinking we might have accept() take ownership of the Endpoint, storing the Endpoint in Incoming, and "giving it back" after accept/reject/ignore. This would still allow the caller to handle incoming connections asynchronously, but not in parallel (thus avoiding buffering issues). It's less flexible but simpler in the end.

Let's see what @Ralith thinks.

@Ralith
Copy link
Collaborator

Ralith commented Apr 26, 2024

have accept() take ownership of the Endpoint

This would complicate the async layer considerably. We would need to move the proto::Endpoint out of the quinn::endpoint::State each time a connection is incoming, then restore it and wake the endpoint driver after a decision is made. It's also difficult to allow such an accept to be used concurrently with any other endpoint access (e.g. connect, close, assorted getters). Because typical servers will always be waiting on accept, surfacing such a limitation to the public API is likely to degrade usability.

More importantly, it wouldn't solve anything: between GRO and recvmmsg, we may receive many datagrams instantaneously. One batch might include both the first datagram for an incoming connection and follow-up initial or 0-RTT packets for that same connection. These must be either buffered or (as in the status quo) dropped, and it's most convenient to do so at the proto layer, where we at least have the option to correlate them, discard non-QUIC datagrams, and respond directly for stateless cases.

Finally, if we could suspend receipt of datagrams immediately after receiving a connection's first datagram, that would be at odds with future work to parallelize datagram receipt and other endpoint driver work, which is our major remaining milestone for intra-endpoint scaling.

In sum, the current form of this PR is strongly motivated.

quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
quinn-proto/src/config.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation LGTM. Remaining items:

  • Some final nits on the docs
  • Decide whether to discard per-Incoming buffer limits for simplicity
  • Test coverage

@gretchenfrage gretchenfrage force-pushed the 1661-bug-fix branch 2 times, most recently from fc23130 to eba944e Compare April 30, 2024 23:18
@gretchenfrage
Copy link
Collaborator Author

Comments tweaked. Tests added. As suggested, new tests are based on zero_rtt_happypath test and validate the number of dropped packets. Added to TestEndpoint / IncomingConnectionBehavior the ability to have Incoming be pushed to a waiting_incoming vec to be dealt with later rather than dealt with immediately.

@gretchenfrage gretchenfrage requested a review from Ralith April 30, 2024 23:21
quinn-proto/src/tests/mod.rs Outdated Show resolved Hide resolved
Closes quinn-rs#1820

The fix:

- Endpoint now maintains a slab with an entry for each pending Incoming
  to buffer received data.
- ConnectionIndex now maps initial DCID to that slab key immediately
  upon construction of Incoming.
- If Incoming is accepted, association is overridden with association
  with ConnectionHandle, and all buffered datagrams are fed to newly
  constructed Connection.
- If Incoming is refused/retried/ignored, or accepting errors,
  association and slab entry are cleaned up to prevent memory leak.

Additional considerations:

- The Incoming::ignore operation can no longer be implemented as just
  dropping it. To help prevent incorrect API usage, proto::Incoming is
  modified to log a warning if it is dropped without being passed to
  Endpoint::accept/refuse/retry/ignore.
- Three things protect against memory exhaustion attacks here:

  1. The MAX_INCOMING_CONNECTIONS limit is moved from quinn to proto,
     limiting the number of concurrent incoming connections for which
     datagrams will be buffered before the application decides what to
     do with them. Also, it is changed from a constant to a field of the
     server config, max_incoming.
  2. Per-incoming buffered data is limited to a new limit stored in the
     server config, incoming_buffer_size, beyond which subsequent
     packets are discarded if received in these conditions.
  3. The sum total of all incoming buffered data is limited to a new
     limit stored in the server config, incoming_buffer_size_total,
     beyond which subsequent packets are discarded if received in these
     conditions.
@djc djc merged commit 30e1d6f into quinn-rs:main May 1, 2024
8 checks passed
@djc
Copy link
Member

djc commented May 1, 2024

Thanks for all the effort here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0-RTT packets can be lost due to race condition introduced along with Incoming
3 participants