Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of UDPMux: Add BatchIO and multiple ports options #608

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

cnderrauber
Copy link
Member

Performance improvements:
Add BatchIO option to NewMultiUDPMuxPort(s), so the UDPMux can use batch write to improve write throughput.
Add multiple ports support to MultiUDPMuxDefault, this option will listen to multiple ports on a single interface for traffic load balance.

go.mod Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage is 89.00% of modified lines.

Files Changed Coverage
udp_muxed_conn.go 70.00%
udp_mux_multi.go 90.69%
udp_mux.go 100.00%

📢 Thoughts on this report? Let us know!.

@cnderrauber cnderrauber force-pushed the batch_ports branch 4 times, most recently from 3338e2a to 9330afe Compare August 29, 2023 10:19
Copy link
Contributor

@boks1971 boks1971 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work @cnderrauber !!!

Copy link
Member

@davidzhao davidzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool to see this all come together. Nicely done!

just a few suggestions on comments

udp_mux_multi.go Outdated Show resolved Hide resolved
udp_mux_multi.go Outdated Show resolved Hide resolved
udp_mux_multi.go Outdated Show resolved Hide resolved
udp_mux_multi.go Outdated Show resolved Hide resolved
Copy link
Member

@stv0g stv0g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cnderrauber,

great to see progress in this direction :)

A few points:

  1. This PR breaks the API of UDPMux. So we need to bump the major version of the module. Thats something I am open to, but I think we should batch this with other API-breaking changes (like Fix type of CandidatePairState enum #571).
  2. Cant you simply use AllConnsGetter to get the count of connections?
  3. I do not the the batched connection interface being used anywhere yet. Do you have or are planning to open a dedicated PR for this?
  4. It would be good if we can also add those new features to the standard UDPMux. Or is there a reasoning behind adding this only to the MultiUDPMux?

Best regards,
Steffen

udp_mux.go Show resolved Hide resolved
@cnderrauber
Copy link
Member Author

  1. Cant you simply use AllConnsGetter to get the count of connections?

I think it can't be used. The AllConnsGetter is implented by tcpmux and it also doesn't have a method to get the count of connections.

  1. I do not the the batched connection interface being used anywhere yet. Do you have or are planning to open a dedicated PR for this?

I can't get the point of this question, the batched connection interface is in the pion/transport repo, what do you mean a dedicated PR?

  1. It would be good if we can also add those new features to the standard UDPMux. Or is there a reasoning behind adding this only to the MultiUDPMux?

The UDPMux can have the batch feature simply by pass a PacketConn to its constructor NewUDPMuxDefault. And it should only listen on a single port so no need the ports balance option.

@stv0g
Copy link
Member

stv0g commented Aug 31, 2023

I can't get the point of this question, the batched connection interface is in the pion/transport repo, what do you mean a dedicated PR?

Sorry. I dont see where the batched connection interface is used in pion/ice. I assume we need another PR to actually use it? Or did I missed something?

The UDPMux can have the batch feature simply by pass a PacketConn to its constructor NewUDPMuxDefault. And it should only listen on a single port so no need the ports balance option.

Alright. Thanks for the explanation.

Improve performance of UDPMux by BatchIO and load
balance on ports
Merge master branch
Fix panic lint
}()

// Skip IPv6 test on i386
const ptrSize = 32 << (^uintptr(0) >> 63)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check against runtime.GOARCH here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are identical, and if one day there is a new GOARCH, the test will be failed.

udp_muxed_conn.go Outdated Show resolved Hide resolved
udp_mux.go Outdated Show resolved Hide resolved
udp_mux_multi.go Outdated Show resolved Hide resolved
udp_mux_multi.go Outdated Show resolved Hide resolved
udp_mux_multi.go Show resolved Hide resolved
cnderrauber and others added 3 commits September 1, 2023 14:27
Co-authored-by: Steffen Vogel <post@steffenvogel.de>
Co-authored-by: Steffen Vogel <post@steffenvogel.de>
Co-authored-by: Steffen Vogel <post@steffenvogel.de>
Co-authored-by: Steffen Vogel <post@steffenvogel.de>
@stv0g
Copy link
Member

stv0g commented Sep 1, 2023

Hi @cnderrauber,

(This is a reply to #608 (comment), but I think its a more general concern which deserves its own comment.)

I am sorry to raise objections about this PR. But I am afraid that this PR introduces even more complexity into the UDPMux logic. And given the current state, we already have three different types for UDP muxing and we have not yet achieved some of the initial goals to allow single-port muxing for relayed candidates.

I was under the impression that the PR attempt to introduce a batched connection interface to ice.Conn and allow passing batches of packets all the way to the mux.

I am not really that happy with hiding the packet batching/queuing inside the mux.
This mixing concerns here. The mux should mux packets over a single port.
It should not doing port balancing and queuing in my opinion.

I cleaner implementation would expose a batched connection interface to the user of pion/ice (in the form of ice.Conn) so that you can implement the queuing in your application logic.

Because of the implementation inside the UDP muxing logic, the port balancing and queuing is also limited to configurations which use the mux. How about configuration without a UDPMux? Or relayed candidates?

@cnderrauber
Copy link
Member Author

cnderrauber commented Sep 1, 2023

And given the current state, we already have three different types for UDP muxing and we have not yet achieved some of the initial goals to allow single-port muxing for relayed candidates.

That's true we already have three different types of UDPMux but actually they are all based on the UDPMuxDefault so they will all get benefits from the change. And I can't see why this will block other goals like relayed candidates.

I am not really that happy with hiding the packet batching/queuing inside the mux. This mixing concerns here. The mux should mux packets over a single port. It should not doing port balancing and queuing in my opinion.
I cleaner implementation would expose a batched connection interface to the user of pion/ice (in the form of ice.Conn) so that you can implement the queuing in your application logic.

Port balancing and Batch are two standalone options to improve the mux throughput that user can choose to enable anyone of them or both. A Mux only listens on a single port without batch will have a poor performance that can't be used in a large-scale production environment.
I don't think ice.Conn need the detail of batch as the batch only makes sense to the udp mux. Because in normal non-mux sessions, every single port only has limited traffic and packet rates that can't queue enough packet to write them in batch, otherwise, remote peer will get a very high jitter by the queueing.

Because of the implementation inside the UDP muxing logic, the port balancing and queuing is also limited to configurations which use the mux. How about configuration without a UDPMux? Or relayed candidates?

So port balance and batch should limit to the mux.

@davidzhao
Copy link
Member

davidzhao commented Sep 1, 2023

I am sorry to raise objections about this PR. But I am afraid that this PR introduces even more complexity into the UDPMux logic. And given the current state, we already have three different types for UDP muxing and we have not yet achieved some of the initial goals to allow single-port muxing for relayed candidates.

I was under the impression that the PR attempt to introduce a batched connection interface to ice.Conn and allow passing batches of packets all the way to the mux.

Don't be sorry! Appreciate you bringing up these points. It's worth discussing the tradeoffs that are being made. Let me share a bit more color here on the intent.

The goal of the PR is to improve performance of Pion when writing out to clients. In benchmarking, we discovered the bottleneck to be syscalls made during the writeTo path. In a typical case, the Pion-based server is sending packets simultaneously to multiple clients, so @cnderrauber came up with the proposal of batching packets intended for different clients together in the same syscall.

Benchmarks have shown that it's improved write performance by over 100%, compared to using standard pion/ice connections without batching. Also, by batching packets to multiple clients together, it helps to keep jitter under control, versus queuing sufficient packets before calling WriteBatch explicitly to each destination.

On the point of why it would use multiple ports. We've discovered additional bottlenecks writing out using a single UDP port. Locks within the write path would limit the number of cores we can effectively utilize. By allowing multiple ports to be used, we were able to get around some of these bottlenecks. Now, this is an option that isn't enabled by default. And for those that don't care about squeezing these bits of performance out of the system, the standard UDPMux is still available.

I hear you on the complexity it introduces in the code, though I hope what's in this PR is a reasonable set of tradeoffs that doesn't make the interface anymore difficult to use.

We are opening the PR here because we thought the work could benefit everyone using Pion. But we'd also respect your feedback if you feel that these changes should be kept in our own repo instead.

@stv0g
Copy link
Member

stv0g commented Sep 1, 2023

Hi @davidzhao,

Thanks for the detailed description of the motivation behind this PR. I fully agree with you on these points. Batching can really help here to increase the throughput. I would also love to profit from this feature in my own app.

There is also a very good article by Tailscale where they describe how they have been able to accelerate the WireGuard Go implementation to achieve over 10Gbps throughput in Userspace by using batching: https://tailscale.com/blog/more-throughput/

I am only arguing about the API which we use to realize it. I would love to have a batching API also for ice.Conn. But it seems like you mainly profit by the fact the you accumulate all your connections behind the mux. The just passing through batches from ice.Conn to the batched mux connection would not really help?

Have you thought about abandoning the use of muxes? Using dedicated connections per ICE agent could help you with the load balancing as you are dealing with a separate per connection?

Regarding the port balancing: Have you considered using SO_REUSE_PORT?
See:

@streamer45
Copy link

Apologies for the intrusion but port balancing on the same address/port through unix.SO_REUSEADDR is pretty much how we improved write throughput on our side as well. We implemented a multiConn wrapper implementing net.PacketConn that does some round robin on all available connections (e.g. one per CPU core). Batching sounds like a great option to add on top of that, looking forward to it.

@kcaffrey
Copy link
Contributor

kcaffrey commented Nov 2, 2023

We are also load balancing on a single port using SO_REUSEADDR, probably in a very similar way that @streamer45 is. The wrapper still implements net.PacketConn, so we can plug it into the existing UDPMux without modifying UDPMux itself.

Having batching as well would be nice, but I echo the earlier comments regarding the API. It would be nice if the batching wasn't coupled so tightly to the "multi udp mux" variant, but rather something that played nicely with the default UDP mux too.

@davidzhao
Copy link
Member

@kcaffrey @streamer45 any interest in adding port balancing into the standard UDPMux? It should be straight-forward to integrate with batching at that point.

It seems that the primary objection to the PR is the implication that batching will only be available to UDPMuxMulti and nothing else. That's not the case here. Most of the work had been put into transport/udp and can be leveraged by any of the muxes. To me it seems completely reasonable to have UDPMuxMulti to be the first consumer of this capability.

Am I misunderstanding the concerns?

@AshishKumar4
Copy link

AshishKumar4 commented Nov 28, 2023

Hello! Just checking on the progress of this PR. We our building a stack (consisting of an SFU) based on Pion, and have encountered similar performance bottlenecks on the writeTo pathway and were actually considering forking and implementing batching ourselves before stumbling on this PR. We would really appreciate to know about the progress here as indeed we expect huge performance gains from this. Meanwhile we will port this code and use it as the udp muxer.And thanks for implementing this awesome PR

@davidzhao
Copy link
Member

@AshishKumar4 the PR was ready to merge months ago, but several folks took issues with certain design decisions. We could merge as is, but in trying to be good citizens, we are holding off until @stv0g or others give the 👍. Until then, I'm afraid this PR is stuck here.

@streamer45
Copy link

@kcaffrey @streamer45 any interest in adding port balancing into the standard UDPMux? It should be straight-forward to integrate with batching at that point.

@davidzhao Apologies for the late reply. Yeah that should be doable and likely to be more performant than having to go through the net.PacketConn interface from the application side. We can definitely plan a contribution if there's no immediate urgency.

It seems that the primary objection to the PR is the implication that batching will only be available to UDPMuxMulti and nothing else. That's not the case here. Most of the work had been put into transport/udp and can be leveraged by any of the muxes. To me it seems completely reasonable to have UDPMuxMulti to be the first consumer of this capability.

Am I misunderstanding the concerns?

For what it's worth, as I am a mere bystander here, I am supportive of these changes. I was purely adding a data point above.

Overall I think it's inherently challenging to achieve high performance without introducing some complexity somewhere and leaving it all on the higher layers to deal with (e.g. app side) is not always feasible. So I find giving the option to enable/disable these sort of enhancements at this level to be a very reasonable compromise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants