
Limiting the number of poll loops to prevent DoS events #954

Merged 1 commit into smoltcp-rs:main from issue/848/infinite-poll on Jul 23, 2024

Conversation

@ryan-summers (Contributor) commented Jul 15, 2024

This PR fixes #848 by limiting the number of ingress/egress tokens processed per call to poll(), with the limit derived from the characteristics of the sockets contained in the polled socket set.


Edit: Further information on hardware testing can be found in quartiq/booster#423

@jordens commented Jul 15, 2024

Why not get rid of the loop and let the application decide on the strategy? Or, asking differently, why is 1 not the appropriate iteration count here? Or why is it not something like the number of rx/tx buffer slots of the device (i.e. either derived from some quantity known by smoltcp or to be chosen by the application)?

@ryan-summers (Contributor, Author) commented Jul 15, 2024

I like the idea of basing the poll iteration count on buffer sizing and the data available to smoltcp, so I've updated the PR to calculate a maximum iteration count from the sockets where this makes sense. For protocols that do not buffer data in a packetized format (i.e. TCP, DHCP, DNS), I've simply set the maximum iteration count to 1, since these protocols could theoretically exchange an infinite sequence of packets and the user application should decide how to handle further polling.
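For illustration, a minimal self-contained sketch of deriving a per-poll iteration bound from socket buffering. The `SocketKind` enum and its fields are hypothetical stand-ins, not smoltcp types, and this is not the PR's actual implementation:

```rust
/// Hypothetical description of a socket's buffering, used only to
/// illustrate how a per-poll iteration bound could be derived.
enum SocketKind {
    /// Packet-oriented socket with a known RX packet capacity (UDP-like).
    Packetized { rx_packet_capacity: usize },
    /// Stream- or exchange-oriented protocol (TCP/DHCP/DNS-like), bounded
    /// to a single pass and left to the application to re-poll.
    Unbounded,
}

/// Derive a per-poll iteration bound from the socket set.
fn max_poll_iterations(sockets: &[SocketKind]) -> usize {
    sockets
        .iter()
        .map(|s| match s {
            SocketKind::Packetized { rx_packet_capacity } => *rx_packet_capacity,
            SocketKind::Unbounded => 1,
        })
        .max()
        .unwrap_or(1)
}

fn main() {
    let sockets = [
        SocketKind::Packetized { rx_packet_capacity: 4 },
        SocketKind::Unbounded,
    ];
    // The bound is driven by the deepest packetized buffer in the set.
    assert_eq!(max_poll_iterations(&sockets), 4);
}
```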

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.

Project coverage is 79.18%. Comparing base (99ddd22) to head (6a3f926).
Report is 7 commits behind head on main.

Files                        Patch %   Lines
src/iface/interface/mod.rs   0.00%     23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
- Coverage   79.21%   79.18%   -0.03%     
==========================================
  Files          81       81              
  Lines       26820    26885      +65     
==========================================
+ Hits        21245    21289      +44     
- Misses       5575     5596      +21     


@@ -396,6 +396,9 @@ impl Interface {
/// This function returns a boolean value indicating whether any packets were
/// processed or emitted, and thus, whether the readiness of any socket might
/// have changed.
///
/// # Note
/// If this function returns true, there may still be packets to be transmitted or received.
@whitequark (Contributor) commented Jul 17, 2024

This note should explain why: as written, the obvious response to reading it is to call the function in a loop, but that just reintroduces the original DoS.

@ryan-summers (Contributor, Author):

I expanded the why in 6492c52, let me know if you have other verbiage you'd prefer and I'm happy to incorporate it :)

@whitequark (Contributor):

Done

@whitequark (Contributor) commented Jul 17, 2024

@jordens While I agree in principle with the idea of basing the amount of polling on some function of the total buffer size, I don't feel that smoltcp can reasonably calculate this on its own (the strategy in this PR doesn't seem reasonable to me). This should probably be left to the application, since the final application is the only entity aware of the performance requirements and the characteristics of the platform. If we do go with smoltcp calculating the iteration count from the buffer size, it should be the PHY buffer size rather than the stack buffer size.

An alternative approach, more principled but also requiring considerably more work, is to make an opaque polling iterator for the socket set (i.e. use external iteration instead of the internal iteration implicitly done by poll currently) which can be saved between polling attempts, so that the application knows it has done a full cycle of polling, ignoring the pauses caused by the TX buffer becoming full. I feel like we may run into issues with lifetimes with this approach, though.
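For illustration, a rough self-contained sketch of the external-iteration idea, using an index-based cursor rather than a borrow of the socket set to sidestep the lifetime concern; `PollCursor` and its methods are hypothetical names, not smoltcp API:

```rust
/// Hypothetical cursor recording how far through the socket set a polling
/// cycle has progressed; it holds no borrow, so it can be saved across
/// polling attempts.
#[derive(Default)]
struct PollCursor {
    next_socket: usize,
}

impl PollCursor {
    /// Perform one unit of work on the current socket and advance the cursor.
    /// Returns `true` once a full cycle over `socket_count` sockets completes.
    fn step(&mut self, socket_count: usize, mut work: impl FnMut(usize)) -> bool {
        if socket_count == 0 {
            return true;
        }
        work(self.next_socket);
        self.next_socket += 1;
        if self.next_socket == socket_count {
            self.next_socket = 0;
            true // one full pass over the socket set is done
        } else {
            false
        }
    }
}

fn main() {
    let mut cursor = PollCursor::default();
    let mut visited = Vec::new();
    // Drive the cursor until one full cycle over three sockets completes.
    while !cursor.step(3, |i| visited.push(i)) {}
    assert_eq!(visited, vec![0, 1, 2]);
}
```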

@jordens commented Jul 17, 2024

The iteration approach sounds neat!
But if that turns out to be difficult or only long-term, let's go for a single iteration now. Then the application can decide the strategy.
Always assuming that a low iteration count has no negative impact: I see there is some fragmentation handling and some ipv4/sixlowpan egress outside the loop.

@whitequark (Contributor) commented Jul 17, 2024

But if that turns out to be difficult or only long-term, let's go for a single iteration now. Then the application can decide the strategy.

Imagine that you have 1000 sockets and each of them has a single packet to send (and the TX buffer never becomes full, e.g. you're on a PC). You'll then have to iterate through the socket set 499500000 times to send all of the packets, am I wrong?

@jordens commented Jul 17, 2024

Maybe. But I may be misunderstanding you. I thought we were talking about removing the loop {} around this block here:

loop {
    let mut did_something = false;
    did_something |= self.socket_ingress(device, sockets);
    did_something |= self.socket_egress(device, sockets);
    #[cfg(feature = "proto-igmp")]
    {
        did_something |= self.igmp_egress(device);
    }
    if did_something {
        readiness_may_have_changed = true;
    } else {
        break;
    }
}

Ignoring the fragmentation and timestamp stuff before the loop, I can't see how allowing the application to do that loop over poll() (thus "deciding the strategy" as we both proposed) instead of having poll() loop over ingress+egress (as it is now) changes anything performance-wise.
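For illustration, a self-contained sketch of this application-side strategy, assuming a single-pass `poll()` that reports whether it did any work; `bounded_poll`, `poll_once`, and `max_passes` are hypothetical application-chosen names, not smoltcp API:

```rust
/// A sketch of an application-chosen polling strategy: call `poll_once`
/// (a stand-in for a single-pass `poll()`) at most `max_passes` times,
/// stopping early once it reports that no work was done.
fn bounded_poll(mut poll_once: impl FnMut() -> bool, max_passes: usize) -> bool {
    let mut did_work = false;
    for _ in 0..max_passes {
        if !poll_once() {
            break;
        }
        did_work = true;
    }
    did_work
}

fn main() {
    // Stand-in for the interface: pretend three packets are pending.
    let mut pending_packets = 3;
    let did_work = bounded_poll(
        || {
            if pending_packets > 0 {
                pending_packets -= 1;
                true
            } else {
                false
            }
        },
        8, // application-chosen bound
    );
    assert!(did_work);
}
```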

@whitequark (Contributor)

@jordens Oh, you are right. I was relying too much on my memory and wasn't paying enough attention. I agree that only running a single iteration of that loop would be fine.

@ryan-summers (Contributor, Author)

Maybe. But I may be misunderstanding you. I thought we were talking about removing the loop {} around this block here:

loop {
    let mut did_something = false;
    did_something |= self.socket_ingress(device, sockets);
    did_something |= self.socket_egress(device, sockets);
    #[cfg(feature = "proto-igmp")]
    {
        did_something |= self.igmp_egress(device);
    }
    if did_something {
        readiness_may_have_changed = true;
    } else {
        break;
    }
}

There's actually another loop in socket_ingress() that repeatedly polls the PHY for RX tokens, and it also has to be limited; otherwise, removing the outermost loop doesn't solve the problem.

@whitequark (Contributor) commented Jul 17, 2024

So if we change both loops to a single iteration, then the problem I described does arise.

But if you change the loop in socket_ingress() to poll up to the number of PHY RX buffers, it should be fine. I think we have the number of PHY TX buffers available in the PHY properties; do we have it for RX buffers? If not, we can add it.

(Also, if we do limit the inner loop, is the outer loop still an issue?)

@ryan-summers (Contributor, Author)

So if we change both loops to a single iteration then the problem I described does arise.

I suspect so, although I didn't completely follow your math :)

But if you change the loop in socket_ingress() to poll up to the number of PHY RX buffers, it should be fine. I think we have the number of PHY TX buffers available in PHY properties; do we have it for RX buffers? If not we can add it.

I didn't know whether this property was available on the PHY trait, but it's what I had in the back of my mind when drafting the changes. I'll take a look at this shortly; thanks for the feedback!

(Also, if we do limit the inner loop, is the outer loop still an issue?)

I believe so, yes. The problem we observe is essentially that the PHY generates RX tokens faster than they can be processed, so any kind of loop {} over RX token processing can block indefinitely.
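For illustration, a self-contained sketch contrasting an unbounded RX loop with a budgeted one; `next_rx_token`, `process`, and the budget are hypothetical stand-ins, not the actual socket_ingress() internals:

```rust
/// Shown only for contrast: on a flooded link `next_rx_token` may never
/// return `None`, so this form never terminates.
#[allow(dead_code)]
fn ingress_unbounded(mut next_rx_token: impl FnMut() -> Option<Vec<u8>>) {
    while let Some(frame) = next_rx_token() {
        process(frame);
    }
}

/// Bounded form: process at most `budget` frames per call, then return so
/// the caller can do other work and decide when to poll again.
fn ingress_bounded(mut next_rx_token: impl FnMut() -> Option<Vec<u8>>, budget: usize) -> bool {
    let mut processed_any = false;
    for _ in 0..budget {
        match next_rx_token() {
            Some(frame) => {
                process(frame);
                processed_any = true;
            }
            None => break,
        }
    }
    processed_any
}

fn process(_frame: Vec<u8>) {
    // Stand-in for handing the frame to the network stack.
}

fn main() {
    // Stand-in PHY on a flooded link: a frame is always available.
    let flooded_phy = || Some(vec![0u8; 64]);
    // The bounded pass still returns after 16 frames instead of spinning forever.
    assert!(ingress_bounded(flooded_phy, 16));
}
```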

@whitequark (Contributor)

@ryan-summers I'm happy to proceed with changing both loops to a single iteration; as Robert says, this should not have detrimental side effects, as far as I can tell.

@ryan-summers (Contributor, Author)

Yup, sorry, I meant to get to this on Thursday, but am now traveling until Monday. I'm planning on going with the single-loop approach, thanks for clarifying :)

@ryan-summers (Contributor, Author)

@whitequark Let me know if these updated changes look appropriate. I'll run this PR on hardware in my office in the next day or two to verify it as well.

/// Users should perform other tasks before calling this function again. If executed in
/// a loop, it's possible that packets can arrive faster than they can be processed, which could
/// cause an infinite loop.
/// This function performs a bounded amount of work per call to avoid exhausting
@whitequark (Contributor):

Oh, typo:

Suggested change
/// This function performs a bounded amount of work per call to avoid exhausting
/// This function performs a bounded amount of work per call to avoid

(I guess this also makes formatting uglier... sigh)

@ryan-summers (Contributor, Author):

Yeah, sorry, I saw this already and removed it, and indeed the formatting is ugly.

@ryan-summers (Contributor, Author)

Alright, I have now tested this on hardware and confirmed the problem is still fixed with these changes.

@whitequark (Contributor) left a comment

Looks good, please squash and I'll merge.

Updating process loop to be defined by socket buffers when possible

Further changes after hardware testing

Fixing test

Adding CHANGELOG entry

Expanding comment after review

Update src/iface/interface/mod.rs

Co-authored-by: Catherine <whitequark@whitequark.org>

Removing redundant phrasing in comment

Updating verbiage to remove references to unsafe and safe
@ryan-summers force-pushed the issue/848/infinite-poll branch from 2bcae4a to 8c2cef1 on July 23, 2024 10:28
@ryan-summers (Contributor, Author)

Looks good, please squash and I'll merge.

I've done the squash manually. However, I thought GitHub added the ability for contributors/admins to squash before merge? If it's not showing up in the merge options list, you can probably enable it in the GitHub repository settings as well :)

@whitequark (Contributor) commented Jul 23, 2024

However, I thought GitHub added the ability for contributors/admins to squash before merge? If it's not showing up in the merge options list, you can probably enable it in the GitHub repository settings as well :)

Nope, this is not available when the merge queue is used, which is annoying, but unfortunately the merge queue is a lot more important.

@whitequark whitequark added this pull request to the merge queue Jul 23, 2024
Merged via the queue into smoltcp-rs:main with commit 53caf70 Jul 23, 2024
9 checks passed
@ryan-summers ryan-summers deleted the issue/848/infinite-poll branch July 23, 2024 14:16
Dirbaio added a commit that referenced this pull request Sep 16, 2024
This makes `.poll()` behave the same as before #954. Users affected by DoS concerns can use the finer-grained egress-only and single-packet-ingress-only fns.
Successfully merging this pull request may close this issue: Packet flooding can cause interface.poll() to require excessive processing time (#848)
4 participants