
Limiting the number of poll loops to prevent DoS events #954

Merged 1 commit into smoltcp-rs:main from issue/848/infinite-poll on Jul 23, 2024

Conversation

@ryan-summers (Contributor) commented Jul 15, 2024

This PR fixes #848 by limiting the number of ingress/egress tokens processed per call to poll(), with the limit derived from the characteristics of the sockets contained in the polled socket set.


Edit: Further information on hardware testing can be found in quartiq/booster#423

@jordens commented Jul 15, 2024

Why not get rid of the loop and let the application decide on the strategy? Or, asking differently, why is 1 not the appropriate iteration count here? Or why is it not something like the number of rx/tx buffer slots of the device (i.e. either derived from some quantity known by smoltcp or to be chosen by the application)?

@ryan-summers (Contributor, Author) commented Jul 15, 2024

I like the idea of basing the poll iteration count on buffer sizing and the data available to smoltcp, so I've updated the PR to calculate a maximum iteration count from the sockets where this makes sense. For protocols that do not buffer data in a packetized format (i.e. TCP, DHCP, DNS), I've simply set the maximum iteration count to 1, since these protocols could theoretically exchange an infinite sequence of packets and the user application should decide how to handle further polling.
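For illustration, a minimal self-contained sketch of deriving a per-poll iteration bound from socket buffering. The `SocketKind` enum and its fields are hypothetical stand-ins, not smoltcp types, and this is not the PR's actual implementation:

```rust
/// Hypothetical description of a socket's buffering, used only to
/// illustrate how a per-poll iteration bound could be derived.
enum SocketKind {
    /// Packet-oriented socket with a known RX packet capacity (UDP-like).
    Packetized { rx_packet_capacity: usize },
    /// Stream- or exchange-oriented protocol (TCP/DHCP/DNS-like), bounded
    /// to a single pass and left to the application to re-poll.
    Unbounded,
}

/// Derive a per-poll iteration bound from the socket set.
fn max_poll_iterations(sockets: &[SocketKind]) -> usize {
    sockets
        .iter()
        .map(|s| match s {
            SocketKind::Packetized { rx_packet_capacity } => *rx_packet_capacity,
            SocketKind::Unbounded => 1,
        })
        .max()
        .unwrap_or(1)
}

fn main() {
    let sockets = [
        SocketKind::Packetized { rx_packet_capacity: 4 },
        SocketKind::Unbounded,
    ];
    // The bound is driven by the deepest packetized buffer in the set.
    assert_eq!(max_poll_iterations(&sockets), 4);
}
```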

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.

Project coverage is 79.18%. Comparing base (99ddd22) to head (6a3f926).
Report is 7 commits behind head on main.

Files                        Patch %   Lines
src/iface/interface/mod.rs   0.00%     23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
- Coverage   79.21%   79.18%   -0.03%     
==========================================
  Files          81       81              
  Lines       26820    26885      +65     
==========================================
+ Hits        21245    21289      +44     
- Misses       5575     5596      +21     


@@ -396,6 +396,9 @@ impl Interface {
/// This function returns a boolean value indicating whether any packets were
/// processed or emitted, and thus, whether the readiness of any socket might
/// have changed.
///
/// # Note
/// If this function returns true, there may still be packets to be transmitted or received.
@whitequark (Contributor) commented Jul 17, 2024

This note should explain why: as written, the obvious response to reading it is to call the function in a loop, but that just reintroduces the original DoS.

@ryan-summers (Contributor, Author):

I expanded the why in 6492c52, let me know if you have other verbiage you'd prefer and I'm happy to incorporate it :)

@whitequark (Contributor):

Done

@whitequark (Contributor) commented Jul 17, 2024

@jordens While I agree in principle with the idea of basing the amount of polling on some function of the total buffer size, I don't feel that smoltcp can reasonably calculate this on its own (the strategy in this PR doesn't seem reasonable to me). This should probably be left to the application, since the final application is the only entity aware of the performance requirements and the characteristics of the platform. If we do go with smoltcp calculating the iteration count from the buffer size, it should be the PHY buffer size rather than the stack buffer size.

An alternative approach, more principled but also requiring considerably more work, is to make an opaque polling iterator for the socket set (i.e. use external iteration instead of the internal iteration implicitly done by poll currently) which can be saved between polling attempts, so that the application knows it has done a full cycle of polling, ignoring the pauses caused by the TX buffer becoming full. I feel like we may run into issues with lifetimes with this approach, though.
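For illustration, a rough self-contained sketch of the external-iteration idea, using an index-based cursor rather than a borrow of the socket set to sidestep the lifetime concern; `PollCursor` and its methods are hypothetical names, not smoltcp API:

```rust
/// Hypothetical cursor recording how far through the socket set a polling
/// cycle has progressed; it holds no borrow, so it can be saved across
/// polling attempts.
#[derive(Default)]
struct PollCursor {
    next_socket: usize,
}

impl PollCursor {
    /// Perform one unit of work on the current socket and advance the cursor.
    /// Returns `true` once a full cycle over `socket_count` sockets completes.
    fn step(&mut self, socket_count: usize, mut work: impl FnMut(usize)) -> bool {
        if socket_count == 0 {
            return true;
        }
        work(self.next_socket);
        self.next_socket += 1;
        if self.next_socket == socket_count {
            self.next_socket = 0;
            true // one full pass over the socket set is done
        } else {
            false
        }
    }
}

fn main() {
    let mut cursor = PollCursor::default();
    let mut visited = Vec::new();
    // Drive the cursor until one full cycle over three sockets completes.
    while !cursor.step(3, |i| visited.push(i)) {}
    assert_eq!(visited, vec![0, 1, 2]);
}
```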

@jordens commented Jul 17, 2024

The iteration approach sounds neat!
But if that turns out to be difficult or only long-term, let's go for a single iteration now. Then the application can decide the strategy.
Always assuming that a low iteration count has no negative impact: I see there is some fragmentation handling and some ipv4/sixlowpan egress outside the loop.

@whitequark (Contributor) commented Jul 17, 2024

But if that turns out to be difficult or only long-term, let's go for a single iteration now. Then the application can decide the strategy.

Imagine that you have 1000 sockets and each of them has a single packet to send (and the TX buffer never becomes full, e.g. you're on a PC). You'll then have to iterate through the socket set 499500000 times to send all of the packets, am I wrong?

@jordens commented Jul 17, 2024

Maybe. But I may be misunderstanding you. I thought we were talking about removing the loop {} around this block here:

loop {
    let mut did_something = false;
    did_something |= self.socket_ingress(device, sockets);
    did_something |= self.socket_egress(device, sockets);
    #[cfg(feature = "proto-igmp")]
    {
        did_something |= self.igmp_egress(device);
    }
    if did_something {
        readiness_may_have_changed = true;
    } else {
        break;
    }
}

Ignoring the fragmentation and timestamp stuff before the loop, I can't see how allowing the application to do that loop over poll() (thus "deciding the strategy" as we both proposed) instead of having poll() loop over ingress+egress (as it is now) changes anything performance-wise.
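For illustration, a self-contained sketch of this application-side strategy, assuming a single-pass `poll()` that reports whether it did any work; `bounded_poll`, `poll_once`, and `max_passes` are hypothetical application-chosen names, not smoltcp API:

```rust
/// A sketch of an application-chosen polling strategy: call `poll_once`
/// (a stand-in for a single-pass `poll()`) at most `max_passes` times,
/// stopping early once it reports that no work was done.
fn bounded_poll(mut poll_once: impl FnMut() -> bool, max_passes: usize) -> bool {
    let mut did_work = false;
    for _ in 0..max_passes {
        if !poll_once() {
            break;
        }
        did_work = true;
    }
    did_work
}

fn main() {
    // Stand-in for the interface: pretend three packets are pending.
    let mut pending_packets = 3;
    let did_work = bounded_poll(
        || {
            if pending_packets > 0 {
                pending_packets -= 1;
                true
            } else {
                false
            }
        },
        8, // application-chosen bound
    );
    assert!(did_work);
}
```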

@whitequark (Contributor)

@jordens Oh, you are right. I was relying too much on my memory and wasn't paying enough attention. I agree that only running a single iteration of that loop would be fine.

@ryan-summers (Contributor, Author)

Maybe. But I may be misunderstanding you. I thought we were talking about removing the loop {} around this block here:

loop {
    let mut did_something = false;
    did_something |= self.socket_ingress(device, sockets);
    did_something |= self.socket_egress(device, sockets);
    #[cfg(feature = "proto-igmp")]
    {
        did_something |= self.igmp_egress(device);
    }
    if did_something {
        readiness_may_have_changed = true;
    } else {
        break;
    }
}

There's actually another loop in socket_ingress() that repeatedly polls the PHY for RX tokens, and it also has to be limited; otherwise, removing the outermost loop doesn't solve the problem.

@whitequark (Contributor) commented Jul 17, 2024

So if we change both loops to a single iteration, then the problem I described does arise.

But if you change the loop in socket_ingress() to poll up to the number of PHY RX buffers, it should be fine. I think we have the number of PHY TX buffers available in the PHY properties; do we have it for RX buffers? If not, we can add it.

(Also, if we do limit the inner loop, is the outer loop still an issue?)

@ryan-summers (Contributor, Author)

So if we change both loops to a single iteration then the problem I described does arise.

I suspect so, although I didn't completely follow your math :)

But if you change the loop in socket_ingress() to poll up to the number of PHY RX buffers, it should be fine. I think we have the number of PHY TX buffers available in PHY properties; do we have it for RX buffers? If not we can add it.

I didn't know whether this property was available on the PHY trait, but it's what I had in the back of my mind when drafting the changes. I'll take a look at this shortly; thanks for the feedback!

(Also, if we do limit the inner loop, is the outer loop still an issue?)

I believe so, yes. The problem we observe is essentially that the PHY generates RX tokens faster than they can be processed, so any kind of loop {} over RX token processing can block indefinitely.
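For illustration, a self-contained sketch contrasting an unbounded RX loop with a budgeted one; `next_rx_token`, `process`, and the budget are hypothetical stand-ins, not the actual socket_ingress() internals:

```rust
/// Shown only for contrast: on a flooded link `next_rx_token` may never
/// return `None`, so this form never terminates.
#[allow(dead_code)]
fn ingress_unbounded(mut next_rx_token: impl FnMut() -> Option<Vec<u8>>) {
    while let Some(frame) = next_rx_token() {
        process(frame);
    }
}

/// Bounded form: process at most `budget` frames per call, then return so
/// the caller can do other work and decide when to poll again.
fn ingress_bounded(mut next_rx_token: impl FnMut() -> Option<Vec<u8>>, budget: usize) -> bool {
    let mut processed_any = false;
    for _ in 0..budget {
        match next_rx_token() {
            Some(frame) => {
                process(frame);
                processed_any = true;
            }
            None => break,
        }
    }
    processed_any
}

fn process(_frame: Vec<u8>) {
    // Stand-in for handing the frame to the network stack.
}

fn main() {
    // Stand-in PHY on a flooded link: a frame is always available.
    let flooded_phy = || Some(vec![0u8; 64]);
    // The bounded pass still returns after 16 frames instead of spinning forever.
    assert!(ingress_bounded(flooded_phy, 16));
}
```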

@whitequark (Contributor)

@ryan-summers I'm happy to proceed with changing both loops to a single iteration; as Robert says, this should not have detrimental side effects, as far as I can tell.

@ryan-summers (Contributor, Author)

Yup, sorry, I meant to get to this on Thursday, but am now traveling until Monday. I'm planning on going with the single-loop approach, thanks for clarifying :)

@ryan-summers (Contributor, Author)

@whitequark Let me know if these updated changes look appropriate. I'll run this PR on hardware in my office in the next day or two to verify it as well.

/// Users should perform other tasks before calling this function again. If executed in
/// a loop, it's possible that packets can arrive faster than they can be processed, which could
/// cause an infinite loop.
/// This function performs a bounded amount of work per call to avoid exhausting
@whitequark (Contributor):

Oh, typo:

Suggested change
/// This function performs a bounded amount of work per call to avoid exhausting
/// This function performs a bounded amount of work per call to avoid

(I guess this also makes formatting uglier... sigh)

@ryan-summers (Contributor, Author):

Yeah, sorry, I saw this already and removed it, and indeed the formatting is ugly.

@ryan-summers (Contributor, Author)

Alright, I have now tested this on hardware and confirmed the problem is still fixed with these changes.

@whitequark (Contributor) left a comment

Looks good, please squash and I'll merge.

Updating process loop to be defined by socket buffers when possible

Further changes after hardware testing

Fixing test

Adding CHANGELOG entry

Expanding comment after review

Update src/iface/interface/mod.rs

Co-authored-by: Catherine <whitequark@whitequark.org>

Removing redundant phrasing in comment

Updating verbiage to remove references to unsafe and safe
@ryan-summers force-pushed the issue/848/infinite-poll branch from 2bcae4a to 8c2cef1 on July 23, 2024 10:28
@ryan-summers (Contributor, Author)

Looks good, please squash and I'll merge.

I've done the squash manually. However, I thought GitHub added the ability for contributors/admins to squash before merge? If it's not showing up in the merge options list, you can probably enable it in the GitHub repository settings as well :)

@whitequark (Contributor) commented Jul 23, 2024

However, I thought GitHub added the ability for contributors/admins to squash before merge? If it's not showing up in the merge options list, you can probably enable it in the GitHub repository settings as well :)

Nope, this is not available when the merge queue is used, which is annoying, but unfortunately the merge queue is a lot more important.

@whitequark whitequark added this pull request to the merge queue Jul 23, 2024
Merged via the queue into smoltcp-rs:main with commit 53caf70 Jul 23, 2024
9 checks passed
@ryan-summers ryan-summers deleted the issue/848/infinite-poll branch July 23, 2024 14:16
Dirbaio added a commit that referenced this pull request Sep 16, 2024
This makes `.poll()` behave the same as before #954. Users affected by DoS concerns can use the finer-grained egress-only and single-packet-ingress-only fns.
Successfully merging this pull request may close this issue: Packet flooding can cause interface.poll() to require excessive processing time (#848)
4 participants