Major performance optimization when using multiple channels #411
Conversation
…the cycle over every channel.
I see the CI build is failing due to the supported C# version (max 4.0). If necessary, I can make a commit ensuring that this pull request will not require a higher C# version. Regards!
The CI build is successful now!
Looks like some good stuff; hopefully it gets merged.
In most cases there is no sense in having that many channels, so this limit isn't a random pick. Also, increasing this limit will increase all packet sizes.
Understandable about increasing the channel limit, yet I think queuing channels seems like a nice addition.
Lol, now I'm actually gonna add something similar for my message queuing on my networking solution, only queue clients that need to be sent messages
@Moe-Baker you can check my fork here https://github.com/AtomicTorchStudio/LiteNetLib/tree/Renkei

I agree with @RevenantX's opinion that an increased number of channels is not necessary and technically not viable to bring to the master branch. There are obvious tradeoffs (an increased packet header size), 64 channels per delivery type is already a pretty significant number, and an increased number of channels is definitely not required for most games and applications.

(In my case, the increased number of channels is necessary due to the nature of the game: it's a massive open-world game with a persistent world, and each client is connected for many hours. In the worst case, on a single screen (the server view scope for each player) there could be dozens and dozens of characters and structures with network-replicated properties, e.g. health points, which are best pushed in a separate channel. For example, see this video, though it doesn't display UI (such as character healthbars) as we recorded it in spectator mode. I've researched other approaches, such as snapshots, that may work better and don't require an increased number of channels (utilizing more bandwidth instead when a packet is lost), but the current approach works just great. The server temporarily allocates a channel only when an entity's state is modified and the change needs to be pushed over the network; this way I'm reusing the channels, but sometimes the number goes above 64 for a certain delivery method, as in the video above when many weapon shots and explosions damage a lot of structures and characters.)

I'm certain that my pull request, introducing this optimization into the vanilla library, will improve performance for almost all use cases and will not hurt it significantly in the worst case. The overhead of pushing and pulling elements into the queue (even when an extra lock is required) is imperceptibly low, and there are also no memory allocations.
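The temporary channel allocation described above (borrow a channel ID only while an entity has pending changes, return it once drained) can be sketched roughly as follows. This is a minimal illustration of the idea only: LiteNetLib itself is C#, and every name in this Java sketch is hypothetical, not the fork's actual code.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of per-delivery-method channel reuse:
// a channel ID is borrowed only while an entity has pending changes,
// and is returned to the pool once its outgoing queue drains.
class ChannelPool {
    private final ArrayDeque<Integer> freeIds = new ArrayDeque<>();
    private int nextId = 0;

    int acquire() {
        // Reuse a released ID when possible; otherwise grow.
        return freeIds.isEmpty() ? nextId++ : freeIds.pop();
    }

    void release(int id) {
        freeIds.push(id);
    }

    int allocatedCount() {
        return nextId; // peak number of concurrently-used channels
    }
}

public class ChannelReuseDemo {
    public static void main(String[] args) {
        ChannelPool pool = new ChannelPool();
        int a = pool.acquire(); // entity A's HP changed -> channel 0
        int b = pool.acquire(); // entity B's HP changed -> channel 1
        pool.release(a);        // entity A's updates drained
        int c = pool.acquire(); // entity C reuses channel 0
        System.out.println(c == a);                // true: ID reused
        System.out.println(pool.allocatedCount()); // 2: bounded by peak concurrency
    }
}
```

The point of the sketch is that the total channel count tracks peak concurrent activity, not the total number of entities, which is why it only occasionally exceeds 64 per delivery method.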
My conclusion is that without this optimization, with a 1 ms tickrate the library will seriously struggle to process even 32 concurrent connections, each with 32 created channels (32x32 = 1024 channels). BTW, we're still running two test game servers: there have been zero server reboots since the date of the pull request. Over 500 unique players have joined, with thousands of hours played. No issues detected; the performance is stable, the memory usage is stable. The update will soon launch to an even larger audience (tens of thousands of unique players in a month), and I'm fairly certain that with this optimization we will see the server handling 200+ concurrent connections without breaking a sweat! Regards!
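As a rough back-of-envelope check of the numbers above (32 connections, 32 channels each, polled every 1 ms), the naive loop performs over a million channel polls per second, while a queued approach only touches channels with pending work. The "2 active channels per peer" figure below is an illustrative assumption, not a measurement:

```java
public class IterationCostDemo {
    public static void main(String[] args) {
        int connections = 32;
        int channelsPerConnection = 32;
        int ticksPerSecond = 1000; // 1 ms update rate

        // Naive approach: every tick touches every channel of every peer.
        long naiveCallsPerSecond =
            (long) connections * channelsPerConnection * ticksPerSecond;

        // Queued approach: only channels with pending packets are touched.
        int activeChannelsPerPeer = 2; // illustrative assumption
        long queuedCallsPerSecond =
            (long) connections * activeChannelsPerPeer * ticksPerSecond;

        System.out.println(naiveCallsPerSecond);  // 1024000
        System.out.println(queuedCallsPerSecond); // 64000
    }
}
```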
Hello!
I'm the developer of CryoFall, a multiplayer action survival game focused on persistent servers, each capable of hosting up to 300 players online on average VPS hardware (2 vCores at 3.0+ GHz, 3 GB RAM, targeting a fixed 40 updates per second). The game engine (called Renkei) is built from scratch, using only MonoGame as a client framework.
The engine was initially (since 2015) built on the Lidgren Network RUDP library (the default choice for .NET back then), but as I found and reported more and more issues with it, and considering that the project is no longer actively maintained, I'd been preparing to switch to LiteNetLib. Finally, last month, I switched to LiteNetLib, and it's been a great success, aside from one issue that this pull request is going to fix!
I'm the developer who opened the tickets for LiteNetLib 3.5 years ago requesting multiple channels per delivery method and a reliable sequenced delivery method, as these two were major missing features required for our game to function properly, given its large number of entities and the fine-tuned network replication of their properties and RPCs. I'm glad to see that my feature requests were eventually implemented and that LiteNetLib has matured dramatically over the past years, making it perhaps a leading choice among Unity users as well as developers writing their own engines!
Nothing can prove the maturity of LiteNetLib better than the time it took to switch from Lidgren Network to LiteNetLib in my case (the project is extremely large and used various Lidgren Network features extensively). The in-place library replacement was done within just a day, with one more day for extensive testing and polishing everything! And that includes the time I spent making a fork to support more channels. The default maximum number of channels (64 per delivery method) is not always enough for us, and moving this logic to the application level is not always viable or reasonable, especially as LiteNetLib already has well-written code for it. By no means are we using that many channels all the time, but occasionally (with a large number of entities within the player's scope, when each has, for example, an independent reliable sequenced channel for its HP property and/or a few other frequently modified properties) we need a little more than 64 channels for some delivery methods, and my fork handles this fine at the cost of slightly increased traffic overhead (one extra byte for the channel ID in the header). (Please note: the current pull request doesn't introduce such a change; I'm focusing strictly on the performance optimization for when NetPeer has a significant number of channels. Of course, the performance patch is even more profitable with an increased channel count.)
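For scale, the one-extra-byte header cost mentioned above can be estimated. The packet rate below is an illustrative assumption (roughly one packet per player per 40 Hz update at the game's target population), not a measured figure:

```java
public class HeaderOverheadEstimate {
    public static void main(String[] args) {
        int players = 300;                  // target server population
        int packetsPerSecondPerPlayer = 40; // assume ~one packet per 40 Hz update
        int extraBytesPerPacket = 1;        // widened channel ID field

        long extraBytesPerSecond =
            (long) players * packetsPerSecondPerPlayer * extraBytesPerPacket;

        System.out.println(extraBytesPerSecond); // 12000 bytes/s, ~12 KB/s server-wide
    }
}
```

Even at full population, the extra traffic is on the order of kilobytes per second for the whole server, which is why the fork accepts the tradeoff.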
Even when not pushing the number of channels to its limit of 64 per delivery method, I've noticed heavily unoptimized code related to the usage of multiple channels. The problem is that NetPeer keeps a linked list of all the channels, and every Update it iterates over all of them, calling SendNextPackets for each. This wastes performance, and it gets out of hand quickly as the number of channels rises (O(N*M), where N = active connections and M = total created channels per connection). Unfortunately, there are no performance tests included to measure this, but in production (we're running public experimental testing with two different machines hosting the game servers), I quickly noticed that with LiteNetLib the test server performance degraded dramatically, with most of the load focused on the network thread, and that's with just around 20 online players! See the picture: htop CPU load before my patch (it was happening on both machines we're running for public experimental testing).

The good thing is that I had a similar issue with the Lidgren library before (it also loops over all the channels of each peer in its heartbeat/update logic), and I implemented a patch for it 2.5 years ago in my fork, which has proven reliable in production all that time. So I needed to implement the same solution for LiteNetLib based on my past experience; hence the pull request. It proved a major success, with the CPU load of the network thread dropping more than 30x while handling the same number of active connections, ensuring that we can handle hundreds of active connections with dozens of channels each (the same as we did with the Lidgren library before, or maybe even better). See the pictures: htop CPU load after the patch and, most revealing, the server CPU load statistics over the week. As you can see, the CPU load barely increases when the number of players grows to around 15-20 (the spikes are now mostly related to increased gameplay logic load; e.g. the boss battles are calculation-expensive). And that's with a 1 ms logic update rate for NetManager. After the game update launch in January 2021, I will be able to produce performance stats from a server with hundreds of online players.

So, what's the solution to this performance issue? Not iterating over all the channels of the NetPeer, but only over those that have any packets to send (or re-send). It's a waste of performance to iterate over all of them, and there is simply no viable scenario in any game or application where such a large number of channels would be active all the time. In my optimization, I'm introducing a queue of channels that are scheduled for calling their SendNextPackets method. A channel is enqueued (if it's not in the queue already) when it gets any packets to send, and it's completely removed from the queue as soon as it has no packets left to send (including ACK packets and possible re-sends). I've integrated this logic by keeping the changes as small as possible, making several clean cuts; the code was pretty clear. Though in one case I stumbled over a mutable struct; a follow-up commit fixed it, and I've added a comment to ensure that nobody else makes the same mistake.

Performance-wise, for the queue I'm using a regular .NET Queue<T>, which is internally implemented as a circular buffer, and I'm allocating it with a large enough initial capacity, so both Dequeue and Enqueue are O(1). There is very little overhead from enqueuing due to the lock on the queue: these two operations happen rarely, as there is a check for whether the channel is already queued, and in most cases this check doesn't require an extra lock since it usually reuses the same lock (OutgoingQueue). Overall, the overhead should be negligible even in the worst-case scenario (when there are only a few channels and they're used all the time).

I hope this pull request will be useful!
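The enqueue/dequeue discipline described above can be sketched as follows. This is an illustrative Java sketch of the pattern only: LiteNetLib itself is C# (with Queue<T> and lock), and every name here is hypothetical, not the actual pull-request code.

```java
import java.util.ArrayDeque;

// Sketch of the "only poll active channels" pattern: a channel is queued
// when it first gets outgoing data and dropped from the queue once drained.
class Channel {
    final String name;
    int pendingPackets = 0;
    boolean isQueued = false; // guarded by the owning peer's queue lock
    int timesPolled = 0;

    Channel(String name) { this.name = name; }
}

class Peer {
    private final ArrayDeque<Channel> outgoingQueue = new ArrayDeque<>(64);

    void send(Channel channel, int packets) {
        synchronized (outgoingQueue) {
            channel.pendingPackets += packets;
            if (!channel.isQueued) {   // avoid duplicate queue entries
                channel.isQueued = true;
                outgoingQueue.add(channel);
            }
        }
    }

    // Called every update tick: touches only channels with pending work,
    // instead of iterating over every channel the peer ever created.
    void update() {
        int toProcess;
        synchronized (outgoingQueue) { toProcess = outgoingQueue.size(); }
        for (int i = 0; i < toProcess; i++) {
            Channel channel;
            synchronized (outgoingQueue) { channel = outgoingQueue.poll(); }
            if (channel == null) break;
            channel.timesPolled++;
            channel.pendingPackets = 0; // stand-in for SendNextPackets succeeding
            synchronized (outgoingQueue) {
                if (channel.pendingPackets > 0) {
                    outgoingQueue.add(channel); // still has work: keep it queued
                } else {
                    channel.isQueued = false;   // drained: drop until next send
                }
            }
        }
    }
}

public class ActiveChannelQueueDemo {
    public static void main(String[] args) {
        Peer peer = new Peer();
        Channel busy = new Channel("busy");
        Channel idle = new Channel("idle"); // created but never used
        peer.send(busy, 3);
        for (int tick = 0; tick < 100; tick++) peer.update(); // 100 update ticks
        System.out.println(busy.timesPolled); // 1: polled once, then dequeued
        System.out.println(idle.timesPolled); // 0: never touched
    }
}
```

The key property is that an idle channel costs nothing per tick; only channels with pending packets are ever visited, which is what collapses the O(N*M) per-update cost down to the number of actually active channels.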
Regards!
Vladimir Kozlov aka ai_enabled
AtomicTorch Studio