-
-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Net] Enet optimizations and fixes for sending unreliable enet traffic larger than enets 1400 byte MTU #51466
Conversation
typedef struct { | ||
#ifdef WINDOWS_ENABLED | ||
size_t dataLength; | ||
void *data; | ||
#else | ||
void *data; | ||
size_t dataLength; | ||
#endif | ||
} NetSocketBuffer; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of note, on windows, the properties in these structs are inverted. (WSABUF https://docs.microsoft.com/en-us/windows/win32/api/ws2def/ns-ws2def-wsabuf)
I don't think we can't change this in
Do you have any data on how much this saves in terms of cpu usage? Adding a whole new function and struct to only use in the ENet base socket should be motivated by noticeable performance improvements.
This is interesting, I'll have a look, but this change seems fine. |
@@ -217,21 +217,22 @@ void NetworkedMultiplayerENet::poll() { | |||
|
|||
_pop_current_packet(); | |||
|
|||
if (!host || !active) // Might be disconnected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
s should always include curly brackets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
/* Keep servicing until there are no available events left in the queue. */ | ||
do { | ||
if (!host || !active) // Check again after every event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
s should always include curly brackets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I would argue against this, as it's currently broken for everyone in 3.x. Godot exposes three transfer modes It shouldn't be possible to encounter application-side buffer bloat on an unreliable transport because if network congestion is encountered eventually somewhere along the way packets will simply just get dropped. However, unlimited outbound buffer bloat is exactly what we encountered as shown in the above output I posted. The fact that godot is currently converting unreliable packets over 1400 bytes to reliable, unknowingly to the developer, is not good. For any 3.x developer that is unfamiliar with enet internals this will be very difficult to identify and debug. If developers are using
For us, we're trying to make godot performant, as ultimately we want to support as many simultaneous client connections as possible. So 'noticeable performance improvements' are going to be very use-case specific. Our gameservers are hosted on the cloud, were we pay for cpu time, so any performance improvements mean less costs which is a win for us. Likewise we want to minimize as much cpu overhead because currently we are cpu bound and that determines how many concurrent client connections we can support in our battle-royale games. When you have hundreds of packets being sent to hundreds or potentially a thousand connected clients, superfluous memcpy's for every outgoing packet start to add up. The data to send is already available in buffers, minimizing memcpy's and memory operation overhead in netcode should be a goal to strive for regardless. Part of the motivation behind the additional support for |
…er than Enets 1400 byte MTU
d1b0f45
to
e336b2a
Compare
It just worries me that you mention it being untested on windows, I don't want to merge something that theoretically improve ENet and in practise could break it on windows. So, it will have to undergo proper testing. So, I would like to split this PR, have the |
We should probably have a networking tests project (similar to the physics tests demo project) that tests this kind of situation, such as sending packets very often, sending large packets, etc. |
Ya I mean, I'm hoping someone, (I guess I could try), can test windows before this is merged. |
I'll try to test this on Windows soon-ish. |
In any case this should be merged to |
Thanks for this and your work!
|
The main reason behind this ENet default is the fact that with every additional fragment the drop probability will increase exponentially. In general, you want to avoid unreliable packets above the MTU exactly for this reason, so a warning in debug as suggested could be really useful.
Yeah, I think that's a good idea, and the 3.x default should be compatible with what we had before IMHO, but at least users can change it. We could use the other default in |
Yeah, if we make it configurable and add a warning IMO the current default is fine for Godot 4.0 too. |
We don't want to really adjust the ENet MTU. It should be configured to be something smaller than 1500 bytes which is a common MTU on a bunch of different link level protocols. You really want to avoid fragmentation at the link and lower layers, which is why ENET settled on a 1400 MTU for it's library (which is a good choice, and allows some additional header room).
Well this is a weird thought, because ya, it's going to be 'exponentially' more drops because currently there are ZERO drops ( it's currently switched to reliable ). If And if you are currently experiencing anything greater than 5% packet loss, you are going to have a lot more issues other than what's happening here... And a 5% packet loss or greater for 'unreliable' egress, is generally caused by the sender sending more data than what the connection can handle. Loss here, is a good thing. In the end, you are trading off one problem for another here. If you try to push more unreliable data than a connection can handle, you just end up with unlimited outbound buffer bloat (current status quo in godot implementation). With an unreliable transport, you shouldn't have to worry about outbound buffer bloat, it's kinda one of the perks. If someone is trying to send a 1MB packet unreliably, then ya the drop probability will increase exponentially, but at this point it's really user-error. And it becomes quite obvious during development if you really need this 1MB packet, you should just send it reliably. Out of all the various network properties, real-time game networking is the most sensitive to latencies and jitter. Having unreliable packets, potentially queued up indefinitely at the application layer is probably the worst thing that can happen, and in fact much worse than dropping packets. Because ANY queuing/timers/re-transmissions of these packets is going to wildly change the jitter and latency parameters on the receiving end. I can't say this more, but packets dropping can and should be expected when you explicitly ask to send something unreliably. In fact it's a good thing, because if you have network congestion, simply dropping packets eliminates additional latencies introduced by buffering. I have heard other devs encounter strange behaviour and issues with godot enet connections, and I wouldn't be surprised if some developers are currently getting caught by this. It's incredibly difficult to diagnose and debug, as it just manifests like network latency hiccups (but they are actually application introduced latencies), which unless you are a network engineer will be tough to diagnose and fix. And the runaway outbound buffer can potentially bite developers big time, because again it will be hard for general developers to diagnose and understand what's currently going on. I'm not sure what the solution is for godot devs in 3.X is, but my guess is we aren't the only studio to encounter this problem. Sending game state updates over 1400 bytes isn't uncommon at all... At a 30hz broadcast rate, 1400 bytes a state update is only ~ 40kB/s. I mean, developers COULD break up their state data into packets of 1400 bytes or less, but honestly whats the point? The whole game state still needs to be re-assembled by the client to be valid, in which case you've just shifted the point of re-assembly from the transport/enet layer into the application layer. It's totally pointless to work around this limitation at the application level. My only concern with maintaining the status quo ( not making |
@jordo Thanks for taking the time to discuss this in depth and for the writeup. I think you convinced me, I can follow and see your points. Need to mull this over a bit, but you're probably right. I think making MTU configurable still doesn't hurt (so devs can lower it if needed), and I still think a warning would be nice (because it is hard to gauge/debug), but even if we make the behaviour if it is exceeded configurable, |
Well, according to maths, that's still exponential, that's just the second power, which is exponential. |
The MTU of the link level is usually 1500. Enet has specified 1400 as it's MTU. I think that enet's MTU is optimal. Setting the MTU higher than 1500 is going to cause drops and fragments on the link level, and setting it less than 1500 is going to cause less information density to travel over the network. Enet has specified 1400 using a 100 byte buffer under 1500. I think that's a good, battle-tested decision by enet. TL;DR Leave enet's MTU at 1400 |
Well, the |
This PR includes some of the enet optimizations and fixes we have been using on our private godot fork.
This PR address the following:
ENET_PACKET_FLAG_UNRELIABLE_FRAGMENT
when sending data on the unreliable enet transport. Enet's internal MTU is configured to be 1400 bytes. If you do not set this flag, Enet will deliver messages that are larger than 1400 bytes RELIABLY. The following is output from a gameserver withENET_DEBUG
defined in enet'sprotocol.c
source. This was really surprising to us to see enet queue up outgoing packets even though our godot application was sending everything unreliably. Unless theENET_PACKET_FLAG_UNRELIABLE_FRAGMENT
flag is set (https://github.com/lsalzman/enet/blob/74cea7abf52ddd355146aeb0a4077d2b95368122/include/enet/enet.h#L115), Enet will deliver unreliable messages over 1400 bytes reliably as enet by default will attempt to re-assemble fragments.The current implementation of
enet_socket_send
in godot (godot.cpp
), makes a surperflous copy of outbound data before sending it out on the socket. https://github.com/godotengine/godot/pull/7985/files#diff-4b4db5dcc81b5bff9d43a2bf9b6193316d1b822fe8a310f34b324f0a9e451d3dR143-R148 This is due mainly to the fact that enet wants to to send data viasendmsg
on posix platforms, andWSASendTo
on windows (inwin23.c
andunix.c
within the enet library). Both these platform functions allow an application to send a list of packets ( A packet being a data pointer and length).Netsocket
does not expose a similar interface, so a copy has to be made from each of outgoing packets withinenet_socket_send
. This PR addssendmsg
type support toNetsocket
which allows godot to send a list of packets implemented withsendmsg
andWSASendTo
on supported godot platforms. Supporting this api inNetsocket
allows this PR to remove the superfluous memcopy inenet_socket_send
ingodot.cpp
.Lastly, an improvement is made to how enet is serviced in
NetworkedMultiplayerENet::poll
. The current implementation is to loop onenet_host_service()
. This method of servicing enet has a potential starvation issue that we encountered, which can lead to unexpected durations of how long the godot main thread is spending inNetworkedMultiplayerENet::poll
. This is especially not wanted, if you are polling enet on the main thread of a game engine (there is no guarantee the loop will ever stop). If you have clients that send data fast enough, enet will continue to service these messages, and it can be tough to break out of the loop and lead to starvation issues. A solution by the enet library developer was to addenet_host_check_events()
. This can be used to process all the events queued up for processing by a single call toenet_host_service()
. This is somewhat described in this response here: Q: Right usage of host service and check events lsalzman/enet#126 (comment) A single call toenet_host_service()
will hit the socket, and queue all incoming messages for processing in enet. Repeatedly callingenet_host_check_events()
doesn't touch the socket at all, and will process all queued messages until there are none left. This technique is described the enet library's developer here: https://www.mail-archive.com/enet-discuss@cubik.org/msg00947.html and http://lists.cubik.org/pipermail/enet-discuss/2013-September/002238.html.Of note - I have NOT been able to test
NetSocketPosix::sendmsg
on windows.NetSocketPosix::sendmsg
does have an#ifdef
for the windows platform here but I believe it's a good tradeoff here to support this api, as it allows a lot of efficiencies especially when specifically working with the enet api. The spec forWSASendTo
seems reasonably straight forward (https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsasendto) however I currently don't have a machine to test this path.This change can be cherry picked to 4.0 master as well.