
doc/netstack: initial proposal #98

Closed · wants to merge 2 commits

Conversation

@cbiffle (Collaborator) commented Nov 19, 2020:

No description provided.

@Nieuwejaar commented:

I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.

Each client can configure the precise notification bits they want to associate
with each event.

This is essentially a `select`/`epoll`-first architecture.

A contributor commented:

Have you thought at all about what an "io_uring-first architecture" might look like, in contrast?

@cbiffle (Collaborator Author) replied Nov 20, 2020:

io_uring assumes the existence of a "more powerful" party hosting the I/O system (the kernel) and a "less powerful" party doing I/O (a userland process). The more powerful side has access to the less powerful side's address space, and can mess with it directly. Because our network stack is not going to be in the kernel, this would require sharing the uring area between two tasks. We only share address space sections during a blocking call (with an explicit lease), and blocking calls are hairy in this context for the reasons I described elsewhere in the doc.

In particular, sharing memory during a blocking call doesn't permit the normal uring use case of "check the descriptor ring for new events that happened while I was doing other work."

Also, io_uring appears to let you issue operations in addition to detecting completion, which I'm not proposing here -- we'll use standard message sends to trigger events in the network stack.

It's worth noting that the asynchronous send mechanism, which was outlined in the original proposal and mirrors that in MINIX 3, and which I am currently going out of my way to avoid implementing, is quite similar to io_uring if io_uring let you queue up arbitrary calls instead of just I/O.

We could also implement an actual shared-memory I/O mechanism by explicitly sharing a region of memory, permanently, between tasks. It's an option if we hit a performance bottleneck here, but I have a mild allergy to shared memory -- and so does Rust.
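
For concreteness, here is a minimal sketch of the notification-then-message-send shape described above. The names (`wait_for_notifications`, `net_recv_packet`, `NET_PACKET_READY`) are placeholders invented for this example, not the actual Hubris API:

```rust
// Sketch only: these are stand-in names, not real Hubris primitives. The
// point is the shape of the interaction: a notification bit tells the client
// that something happened, and an ordinary message send then collects it.

const NET_PACKET_READY: u32 = 1 << 0; // notification bit this client registered

fn event_loop() -> ! {
    loop {
        // Block until the kernel posts a notification bit we asked the
        // netstack to use on our behalf (kqueue-style readiness).
        let bits = wait_for_notifications(NET_PACKET_READY);

        if (bits & NET_PACKET_READY) != 0 {
            // Readiness only says *something* happened; we then make a
            // short, bounded IPC call to the netstack task to copy the
            // packet out, lending it a buffer for the duration of the call.
            let mut buf = [0u8; 1024];
            let len = net_recv_packet(&mut buf);
            handle_packet(&buf[..len]);
        }
    }
}

// Stubs so the sketch stands alone; a real task would use kernel IPC here.
fn wait_for_notifications(mask: u32) -> u32 { mask }
fn net_recv_packet(_buf: &mut [u8]) -> usize { 0 }
fn handle_packet(_pkt: &[u8]) {}
```

The key property is that detecting an event and transferring data are separate steps, with memory lent only for the duration of the explicit call rather than shared in a long-lived ring.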

@cbiffle (Collaborator Author) added:

I think I'm going to rewrite this sentence to reference kqueue instead, which is a closer analog for what I'm imagining. I'm concerned that the original sentence may be taken to imply the warts of select/epoll, which was not what I meant.


### Buffers

So, this means the network stack is managing a pool of buffers. We have several

The contributor commented:

... coming back to my earlier comment, I think maybe this ends up being a bit more io_uring-like than I initially appreciated.
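
For illustration, here is a rough sketch of the kind of fixed-capacity pool the netstack task could own. The sizes, the bitmask free list, and the idea of handing out indices as handles are assumptions made for this example, not details from the proposal:

```rust
/// Sketch of a buffer pool owned entirely by the netstack task.
/// All numbers here are illustrative.
const BUF_SIZE: usize = 1536; // room for a full Ethernet frame
const BUF_COUNT: usize = 8;

struct BufferPool {
    storage: [[u8; BUF_SIZE]; BUF_COUNT],
    /// Bitmask of free buffers; bit i set means storage[i] is available.
    free: u32,
}

impl BufferPool {
    const fn new() -> Self {
        BufferPool {
            storage: [[0; BUF_SIZE]; BUF_COUNT],
            free: (1 << BUF_COUNT) - 1,
        }
    }

    /// Claim a free buffer, returning its index. Clients would be handed
    /// this small index across the IPC boundary, never a pointer.
    fn alloc(&mut self) -> Option<usize> {
        let idx = self.free.trailing_zeros() as usize;
        if idx >= BUF_COUNT {
            return None; // pool exhausted
        }
        self.free &= !(1 << idx);
        Some(idx)
    }

    /// Return a buffer to the pool once its contents have been consumed.
    fn release(&mut self, idx: usize) {
        assert!(idx < BUF_COUNT);
        self.free |= 1 << idx;
    }
}
```

Handing out indices rather than pointers keeps the pool's memory owned by a single task, which would fit the no-long-lived-shared-memory stance taken earlier in the thread.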

@cbiffle (Collaborator Author) commented Nov 20, 2020:

> I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.

I'm leaving the door open to such possibilities because we've had some disagreements here in the past, but I'm not personally going to try doing anything L3. My preference would be for UDP over IP if not something higher level than that, unless someone makes a really good argument why a custom L3 is better.

@arjenroodselaar (Contributor) commented:

> > I was making a first pass through the doc. You mention that some tasks might use raw IP or ethernet rather than UDP. If you're planning to roll your own layer 3 protocol and have it traverse the switch, we'll need to coordinate. My guess is that we're going to want to be pretty aggressive about dropping any non-standard packets.
>
> I'm leaving the door open to such possibilities because we've had some disagreements here in the past, but I'm not personally going to try doing anything L3. My preference would be for UDP over IP if not something higher level than that, unless someone makes a really good argument why a custom L3 is better.

A custom L3 is not necessarily better, but if communication between SPs and the control plane is more or less RPC over a point-to-point link (where requests may be routed to different applications running on the SP), then building a capable IP stack would not be required. That would, however, require us to limit ourselves to living within those constraints. It would alleviate the need for an additional layer of addresses, since we'd only use MAC addresses to switch packets over the wire; once a frame is received, its contents would be routed to tasks based on some kind of task identifier (this could be the equivalent of a port number, but it could be something different too, potentially as wild as a URL or whatever). Ethernet is just the vehicle to get buffers of serialized requests/responses between endpoints.

If, however, we want generic networking, then yes: let's not reinvent IP/UDP and instead just implement that. In this case I'll be leaning hard on us implementing IPv6, since autoconfiguration, address resolution, and neighbor discovery are much better defined than in IPv4. In addition, that would make the SP network routable/compatible with the host CPU underlay network, and we could lean on switch features provided by the Monorail (management network) switch ASIC for filtering, etc.

Do we want to implement generic networking, or can we get away with a more RPC-style thing between the control plane and the SP?
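
To make the "task identifier instead of an IP layer" option concrete, here is a hypothetical framing sketch for RPC over raw Ethernet. Every field name, size, and the EtherType value are invented for illustration and are not being proposed here:

```rust
/// Hypothetical: a small header carried directly in an Ethernet payload,
/// where MAC addresses do the switching and this header routes the
/// serialized request/response to a task on the SP.

/// Assumed for illustration: a value from the IEEE "local experimental"
/// EtherType range, distinguishing these frames from ordinary IP traffic.
const ETHERTYPE_SP_RPC: u16 = 0x88B5;

#[derive(Debug, Clone, Copy)]
struct SpRpcHeader {
    /// Routes the payload to a task, much like a UDP port would.
    task_id: u16,
    /// Lets the sender match responses back to requests.
    request_id: u32,
    /// Length of the serialized payload that follows this header.
    payload_len: u16,
}

impl SpRpcHeader {
    fn write(&self, out: &mut [u8; 8]) {
        out[0..2].copy_from_slice(&self.task_id.to_be_bytes());
        out[2..6].copy_from_slice(&self.request_id.to_be_bytes());
        out[6..8].copy_from_slice(&self.payload_len.to_be_bytes());
    }

    fn parse(buf: &[u8; 8]) -> Self {
        SpRpcHeader {
            task_id: u16::from_be_bytes([buf[0], buf[1]]),
            request_id: u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]),
            payload_len: u16::from_be_bytes([buf[6], buf[7]]),
        }
    }
}
```

Whether a header like this buys anything over just using UDP ports on top of IPv6 is exactly the question posed above.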

@cbiffle (Collaborator Author) commented Nov 21, 2020:

> A custom L3 is not necessarily better

Hi guys!

I mentioned the possibility of IP or Ethernet in an attempt to make this document independent of that choice. I appear to have failed; but regardless, this doc is not proposing an L3 protocol, or a lack of one, and is really intended to be about task interaction and memory management in the network stack, independent of the protocols used.

@arjenroodselaar (Contributor) replied:

> > A custom L3 is not necessarily better
>
> Hi guys!
>
> I mentioned the possibility of IP or Ethernet in an attempt to make this document independent of that choice. I appear to have failed; but regardless, this doc is not proposing an L3 protocol, or a lack of one, and is really intended to be about task interaction and memory management in the network stack, independent of the protocols used.

Heh, my bad, that was not clear. This bull only saw the bright red protocol bit. But yes, managing that state somewhere makes sense. Looking forward to seeing what this ends up looking like.

@cbiffle (Collaborator Author) commented Jan 21, 2022:

Obviated by later netstack work.

@cbiffle closed this on Jan 21, 2022.