
doc/netstack: initial proposal #98

Closed · wants to merge 2 commits

Conversation

@cbiffle (Collaborator) commented Nov 19, 2020:

No description provided.

@Nieuwejaar commented:

I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.

Each client can configure the precise notification bits they want to associate
with each event.

This is essentially a `select`/`epoll`-first architecture.

A contributor commented:

Have you thought at all about what an "io_uring-first architecture" might look like, in contrast?

@cbiffle (Collaborator Author) replied Nov 20, 2020:

io_uring assumes the existence of a "more powerful" party hosting the I/O system (the kernel) and a "less powerful" party doing I/O (a userland process). The more powerful side has access to the less powerful side's address space, and can mess with it directly. Because our network stack is not going to be in the kernel, this would require sharing the uring area between two tasks. We only share address space sections during a blocking call (with an explicit lease), and blocking calls are hairy in this context for the reasons I described elsewhere in the doc.

In particular, sharing memory during a blocking call doesn't permit the normal uring use case of "check the descriptor ring for new events that happened while I was doing other work."

Also, io_uring appears to let you issue operations in addition to detecting completion, which I'm not proposing here -- we'll use standard message sends to trigger events in the network stack.

It's worth noting that the asynchronous send mechanism, which was outlined in the original proposal and mirrors that in MINIX 3, and which I am currently going out of my way to avoid implementing, is quite similar to io_uring if io_uring let you queue up arbitrary calls instead of just I/O.

We could also implement an actual shared-memory I/O mechanism by explicitly sharing a region of memory, permanently, between tasks. It's an option if we hit a performance bottleneck here, but I have a mild allergy to shared memory -- and so does Rust.
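
For concreteness, here is a minimal sketch of the notification-then-message-send shape described above. The names (`wait_for_notifications`, `net_recv_packet`, `NET_PACKET_READY`) are placeholders invented for this example, not the actual Hubris API:

```rust
// Sketch only: these are stand-in names, not real Hubris primitives. The
// point is the shape of the interaction: a notification bit tells the client
// that something happened, and an ordinary message send then collects it.

const NET_PACKET_READY: u32 = 1 << 0; // notification bit this client registered

fn event_loop() -> ! {
    loop {
        // Block until the kernel posts a notification bit we asked the
        // netstack to use on our behalf (kqueue-style readiness).
        let bits = wait_for_notifications(NET_PACKET_READY);

        if (bits & NET_PACKET_READY) != 0 {
            // Readiness only says *something* happened; we then make a
            // short, bounded IPC call to the netstack task to copy the
            // packet out, lending it a buffer for the duration of the call.
            let mut buf = [0u8; 1024];
            let len = net_recv_packet(&mut buf);
            handle_packet(&buf[..len]);
        }
    }
}

// Stubs so the sketch stands alone; a real task would use kernel IPC here.
fn wait_for_notifications(mask: u32) -> u32 { mask }
fn net_recv_packet(_buf: &mut [u8]) -> usize { 0 }
fn handle_packet(_pkt: &[u8]) {}
```

The key property is that detecting an event and transferring data are separate steps, with memory lent only for the duration of the explicit call rather than shared in a long-lived ring.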

@cbiffle (Collaborator Author) added:

I think I'm going to rewrite this sentence to reference kqueue instead, which is a closer analog for what I'm imagining. I'm concerned that the original sentence may be taken to imply the warts of select/epoll, which was not what I meant.


### Buffers

So, this means the network stack is managing a pool of buffers. We have several

The contributor commented:

... coming back to my earlier comment, I think maybe this ends up being a bit more io_uring-like than I initially appreciated.
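
For illustration, here is a rough sketch of the kind of fixed-capacity pool the netstack task could own. The sizes, the bitmask free list, and the idea of handing out indices as handles are assumptions made for this example, not details from the proposal:

```rust
/// Sketch of a buffer pool owned entirely by the netstack task.
/// All numbers here are illustrative.
const BUF_SIZE: usize = 1536; // room for a full Ethernet frame
const BUF_COUNT: usize = 8;

struct BufferPool {
    storage: [[u8; BUF_SIZE]; BUF_COUNT],
    /// Bitmask of free buffers; bit i set means storage[i] is available.
    free: u32,
}

impl BufferPool {
    const fn new() -> Self {
        BufferPool {
            storage: [[0; BUF_SIZE]; BUF_COUNT],
            free: (1 << BUF_COUNT) - 1,
        }
    }

    /// Claim a free buffer, returning its index. Clients would be handed
    /// this small index across the IPC boundary, never a pointer.
    fn alloc(&mut self) -> Option<usize> {
        let idx = self.free.trailing_zeros() as usize;
        if idx >= BUF_COUNT {
            return None; // pool exhausted
        }
        self.free &= !(1 << idx);
        Some(idx)
    }

    /// Return a buffer to the pool once its contents have been consumed.
    fn release(&mut self, idx: usize) {
        assert!(idx < BUF_COUNT);
        self.free |= 1 << idx;
    }
}
```

Handing out indices rather than pointers keeps the pool's memory owned by a single task, which would fit the no-long-lived-shared-memory stance taken earlier in the thread.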

@cbiffle (Collaborator Author) commented Nov 20, 2020:

> I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.

I'm leaving the door open to such possibilities because we've had some disagreements here in the past, but I'm not personally going to try doing anything L3. My preference would be for UDP over IP if not something higher level than that, unless someone makes a really good argument why a custom L3 is better.

@arjenroodselaar (Contributor) commented:

> > I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.
>
> I'm leaving the door open to such possibilities because we've had some disagreements here in the past, but I'm not personally going to try doing anything L3. My preference would be for UDP over IP if not something higher level than that, unless someone makes a really good argument why a custom L3 is better.

A custom L3 is not necessarily better, but if communication between SPs and the control plane is more or less RPC over a point-to-point link (where requests may be routed to different applications running on the SP), then building a capable IP stack would not be required. That would, however, require us to limit ourselves to living within those constraints. It would alleviate the need for an additional layer of addresses, since we'd only use MAC addresses to switch packets over the wire; once a frame is received, its contents would be routed to tasks based on some kind of task identifier (this could be the equivalent of a port number, but it could be something different too, potentially as wild as a URL or whatever). Ethernet is just the vehicle to get buffers of serialized requests/responses between endpoints.

If, however, we want generic networking, then yes: let's not reinvent IP/UDP and instead just implement that. In this case I'll be leaning hard on us implementing IPv6, since autoconfiguration, address resolution, and neighbor discovery are much better defined than in IPv4. In addition, that would make the SP network routable/compatible with the host CPU underlay network, and we could lean on switch features provided by the Monorail (management network) switch ASIC for filtering, etc.

Do we want to implement generic networking, or can we get away with a more RPC-style thing between the control plane and the SP?
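
To make the "task identifier instead of an IP layer" option concrete, here is a hypothetical framing sketch for RPC over raw Ethernet. Every field name, size, and the EtherType value are invented for illustration and are not being proposed here:

```rust
/// Hypothetical: a small header carried directly in an Ethernet payload,
/// where MAC addresses do the switching and this header routes the
/// serialized request/response to a task on the SP.

/// Assumed for illustration: a value from the IEEE "local experimental"
/// EtherType range, distinguishing these frames from ordinary IP traffic.
const ETHERTYPE_SP_RPC: u16 = 0x88B5;

#[derive(Debug, Clone, Copy)]
struct SpRpcHeader {
    /// Routes the payload to a task, much like a UDP port would.
    task_id: u16,
    /// Lets the sender match responses back to requests.
    request_id: u32,
    /// Length of the serialized payload that follows this header.
    payload_len: u16,
}

impl SpRpcHeader {
    fn write(&self, out: &mut [u8; 8]) {
        out[0..2].copy_from_slice(&self.task_id.to_be_bytes());
        out[2..6].copy_from_slice(&self.request_id.to_be_bytes());
        out[6..8].copy_from_slice(&self.payload_len.to_be_bytes());
    }

    fn parse(buf: &[u8; 8]) -> Self {
        SpRpcHeader {
            task_id: u16::from_be_bytes([buf[0], buf[1]]),
            request_id: u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]),
            payload_len: u16::from_be_bytes([buf[6], buf[7]]),
        }
    }
}
```

Whether a header like this buys anything over just using UDP ports on top of IPv6 is exactly the question posed above.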

@cbiffle (Collaborator Author) commented Nov 21, 2020:

> A custom L3 is not necessarily better

Hi guys!

I mentioned the possibility of IP or Ethernet in an attempt to make this document independent of that choice. I appear to have failed; but regardless, this doc is not proposing an L3 protocol, or a lack of one, and is really intended to be about task interaction and memory management in the network stack, independent of the protocols used.

@arjenroodselaar (Contributor) replied:

> > A custom L3 is not necessarily better
>
> Hi guys!
>
> I mentioned the possibility of IP or Ethernet in an attempt to make this document independent of that choice. I appear to have failed; but regardless, this doc is not proposing an L3 protocol, or a lack of one, and is really intended to be about task interaction and memory management in the network stack, independent of the protocols used.

Heh, my bad, that was not clear. This bull only saw the bright red protocol bit. But yes, managing that state somewhere makes sense. Looking forward to seeing what this ends up looking like.

@cbiffle (Collaborator Author) commented Jan 21, 2022:

Obviated by later netstack work.

@cbiffle closed this on Jan 21, 2022.