doc/netstack: initial proposal #98
Conversation
I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.
doc/netstack.mkdn (outdated):
> Each client can configure the precise notification bits they want to associate
> with each event.
>
> This is essentially a `select`/`epoll`-first architecture.
Have you thought at all about what an "io_uring-first architecture" might look like, in contrast?
io_uring assumes the existence of a "more powerful" party hosting the I/O system (the kernel) and a "less powerful" party doing I/O (a userland process). The more powerful side has access to the less powerful side's address space, and can mess with it directly. Because our network stack is not going to be in the kernel, this would require sharing the uring area between two tasks. We only share address space sections during a blocking call (with an explicit lease), and blocking calls are hairy in this context for the reasons I described elsewhere in the doc.
In particular, sharing memory during a blocking call doesn't permit the normal uring use case of "check the descriptor ring for new events that happened while I was doing other work."
Also, io_uring appears to let you issue operations in addition to detecting completion, which I'm not proposing here -- we'll use standard message sends to trigger events in the network stack.
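To make that contrast concrete, here's a rough sketch of the shape of the interaction I have in mind -- ordinary Rust with made-up names, not the real Hubris IPC API: the client triggers work with a blocking request (its buffer is only lent for the duration of the call) and learns about readiness via a notification bit it configured earlier, rather than by scanning a shared completion ring.

```rust
// Hypothetical model of "message sends trigger events, notification bits
// signal readiness". None of these names are real Hubris APIs; the handler
// is an in-process stand-in for the network stack task.

/// Operations a client might ask the network stack to perform.
enum NetOp {
    /// Queue a datagram; the payload is only borrowed for the duration of
    /// the (blocking) call, mirroring a lease.
    Send { socket: u32, payload: Vec<u8> },
    /// Configure which notification bit should fire when data arrives.
    Subscribe { socket: u32, notification_bit: u8 },
}

/// Stand-in for the network stack: handles one request, optionally naming
/// the notification bit the client should wait on later.
fn netstack_handle(op: NetOp) -> Option<u8> {
    match op {
        NetOp::Send { socket, payload } => {
            // The stack uses the payload only while it holds the borrow;
            // nothing remains shared once this call returns.
            println!("tx {} bytes on socket {}", payload.len(), socket);
            None
        }
        NetOp::Subscribe { socket, notification_bit } => {
            println!("socket {} will signal bit {}", socket, notification_bit);
            Some(notification_bit)
        }
    }
}

fn main() {
    // Client side: a blocking send triggers work in the stack...
    let _ = netstack_handle(NetOp::Send { socket: 3, payload: b"hello".to_vec() });
    // ...and a separately configured notification bit reports readiness,
    // instead of the client polling a shared completion ring.
    let bit = netstack_handle(NetOp::Subscribe { socket: 3, notification_bit: 1 });
    println!("wait on notification bit {:?}", bit);
}
```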
It's worth noting that the asynchronous send mechanism, which was outlined in the original proposal and mirrors the one in MINIX 3, and which I am currently going out of my way to avoid implementing, is quite similar to what io_uring would be if it let you queue up arbitrary calls instead of just I/O.
We could also implement an actual shared-memory I/O mechanism by explicitly sharing a region of memory, permanently, between tasks. It's an option if we hit a performance bottleneck here, but I have a mild allergy to shared memory -- and so does Rust.
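For what it's worth, that shared-memory option would look roughly like the sketch below: a permanently mapped region with producer/consumer indices that both tasks update. The layout and names are invented for illustration; the point is that the index arithmetic and synchronization have to be reasoned about by hand across a trust boundary, which is the allergy in question.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical layout for a permanently shared I/O region between two
/// tasks. Both sides would map this; indices must be kept consistent with
/// atomics, and every access needs manual reasoning. Names are made up.
#[repr(C)]
struct SharedIoRegion {
    produce: AtomicUsize,   // next slot the client will fill
    consume: AtomicUsize,   // next slot the netstack will drain
    slots: [[u8; 64]; 8],   // fixed-size packet slots
}

fn main() {
    let region = SharedIoRegion {
        produce: AtomicUsize::new(0),
        consume: AtomicUsize::new(0),
        slots: [[0u8; 64]; 8],
    };
    // A producer-side enqueue would look roughly like this; in the real,
    // cross-task case this index arithmetic is exactly where bugs hide.
    let slot = region.produce.fetch_add(1, Ordering::AcqRel) % region.slots.len();
    println!("client would write into slot {}", slot);
    let _ = region.consume.load(Ordering::Acquire);
}
```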
I think I'm going to rewrite this sentence to reference kqueue instead, which is a closer analog for what I'm imagining. I'm concerned that the original sentence may be taken to imply the warts of select/epoll, which was not what I meant.
doc/netstack.mkdn (outdated):
> ### Buffers
>
> So, this means the network stack is managing a pool of buffers. We have several
... coming back to my earlier comment, I think maybe this ends up being a bit more io_uring-like than I initially appreciated.
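For a sense of what "managing a pool of buffers" might amount to, here is a minimal, purely illustrative sketch of a fixed pool with claim/release semantics; the names and sizes are assumptions, and this is not the design the document settles on.

```rust
/// Illustrative fixed buffer pool owned by the network stack.
struct BufferPool {
    storage: Vec<[u8; 1536]>, // one MTU-ish slot per buffer
    free: Vec<usize>,         // indices of currently unowned slots
}

impl BufferPool {
    fn new(count: usize) -> Self {
        BufferPool {
            storage: vec![[0u8; 1536]; count],
            free: (0..count).collect(),
        }
    }

    /// Hand a buffer out (to a client, or to the driver for receive).
    fn claim(&mut self) -> Option<usize> {
        self.free.pop()
    }

    /// Return a buffer once its contents have been consumed.
    fn release(&mut self, index: usize) {
        debug_assert!(index < self.storage.len());
        self.free.push(index);
    }
}

fn main() {
    let mut pool = BufferPool::new(4);
    let rx = pool.claim().expect("pool exhausted");
    println!("filling buffer {} with an incoming frame", rx);
    pool.release(rx);
}
```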
I'm leaving the door open to such possibilities because we've had some disagreements here in the past, but I'm not personally going to try doing anything at L3. My preference would be for UDP over IP, if not something higher level than that, unless someone makes a really good argument for why a custom L3 is better.
A custom L3 is not necessarily better, but if communication between SPs and the control plane is more or less RPC over a point-to-point link (where the requests may be routed to different applications running on the SP), then building a capable IP stack would not be required. That would, however, require us to limit ourselves to living within those constraints. It would alleviate the need for an additional layer of addresses, since we'd only use MAC addresses to switch packets over the wire, and once received, the frame contents would be routed to tasks based on some kind of task identifier (it could be the equivalent of a port number, but it could be something different too, potentially as wild as a URL or whatever). Ethernet is just the vehicle to get buffers of serialized requests/responses between endpoints.

If, however, we want generic networking, then yes: let's not reinvent IP/UDP and instead just implement that. In that case I'll be leaning hard on us implementing IPv6, since autoconfiguration, address resolution, and neighbor discovery are much better defined than in IPv4. In addition, that would make the SP network routable/compatible with the host CPU underlay network, and we can lean on switch features provided by the Monorail (management network) switch ASIC for filtering etc.

Do we want to implement generic networking, or can we get away with a more RPC-style thing between the control plane and SP?
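As a strawman for the RPC-over-Ethernet option: the frame could carry nothing more than MAC addresses, an ethertype, and a task identifier in place of an IP address and port. The layout below is an assumption for illustration only, not a proposed wire format.

```rust
/// Hypothetical frame for "RPC over a point-to-point link": plain Ethernet
/// carries a serialized request, and a task identifier (rather than an IP
/// address + port) selects the destination on the SP. Field names and sizes
/// are illustrative assumptions.
struct RpcFrame<'a> {
    dst_mac: [u8; 6],
    src_mac: [u8; 6],
    ethertype: u16,    // e.g. a local/experimental ethertype
    task_id: u16,      // "port number equivalent": which SP task gets this
    payload: &'a [u8], // serialized request or response
}

fn main() {
    let req = b"get-sensor-readings";
    let frame = RpcFrame {
        dst_mac: [0x02, 0, 0, 0, 0, 0x01],
        src_mac: [0x02, 0, 0, 0, 0, 0x02],
        ethertype: 0x88B5, // IEEE local experimental ethertype
        task_id: 7,
        payload: req,
    };
    println!(
        "frame for task {} carries {} payload bytes",
        frame.task_id,
        frame.payload.len()
    );
    let _ = (frame.dst_mac, frame.src_mac, frame.ethertype);
}
```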
Hi guys! I mentioned the possibility of IP or Ethernet in an attempt to make this document independent of that choice. I appear to have failed; but regardless, this doc is not proposing an L3 protocol, or a lack of an L3 protocol, and is really intended to be about task interaction and memory management in the network stack, independent of the protocols used.
Heh, my bad, that was not clear. This bull only saw the bright red protocol bit. But yes, managing that state somewhere makes sense. Looking forward to seeing what this ends up looking like.
Obviated by later netstack work.