server: Support concurrent getdata requests. #3203

Merged
merged 1 commit into decred:master from server_support_concurrent_getdata on Oct 30, 2023

Conversation

@davecgh davecgh (Member) commented Oct 28, 2023

This requires #3201 and is rebased on #3202.

Currently, each peer only supports one getdata request at a time, and the inbound handler is blocked until all of the requested data is served. This means messages that would otherwise be fast to process, such as pings, can potentially be delayed for long periods of time, which is clearly not ideal.

Moreover, the existing behavior means it is theoretically possible for a pair of peers to reach a state where neither one can make progress because each is waiting on the other to serve data, which would ultimately result in hitting idle timeouts and disconnecting.

In practice, the aforementioned livelock essentially never happens currently due to a variety of other factors, such as duplicate request filtering and not notifying peers about inventory they are known to have. However, the combination of factors that prevent the situation will not necessarily apply to newer data types, such as those for mesh-based mixing.

The aforementioned blocking behavior is a holdover from very early network code that relied on it, but it is no longer relevant.

Thus, to improve overall throughput and address the root of the aforementioned concerns, this changes the semantics to serve getdata requests asynchronously so other inbound messages can be processed concurrently. This means the remote peer is then free to send other messages while waiting for the requested data to be served.

Next, limits on the number of concurrent getdata requests and on the total number of pending individual data item requests are now imposed in order to prevent peers from being able to consume copious amounts of memory and from engaging in other malicious behavior this change would otherwise allow.

The limits are enforced such that peers may now mix and match simultaneous getdata requests for varying numbers of data items, so long as they do not exceed either the maximum number of allowed simultaneous pending getdata messages or the maximum total number of pending individual data item requests.

This approach of applying the limits on both dimensions offers more flexibility and the potential for future efficiency gains while still keeping memory usage bounded to reasonable limits and protecting against other malicious behavior.
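
A minimal sketch of how the queue and the two limits could fit together follows. Only getDataQueue and maxConcurrentGetDataReqs appear in the actual diff (see the snippet quoted later in the conversation); every other identifier, the concrete limit values, and the wiring are illustrative assumptions rather than the dcrd implementation.

package server

import "sync/atomic"

// Hypothetical limit values for illustration only.
const (
	maxConcurrentGetDataReqs  = 1000  // max simultaneous pending getdata messages
	maxPendingGetDataItemReqs = 50000 // max total pending individual data items
)

// getDataReq is an illustrative stand-in for a single getdata message.
type getDataReq struct {
	items []string // stand-in for the requested inventory vectors
}

type serverPeer struct {
	getDataQueue       chan getDataReq // buffered; doubles as the request queue
	numPendingItemReqs atomic.Int64    // data items across all queued requests
}

// queueGetData runs on the peer's inHandler goroutine.  It enforces both
// limits and queues the request so the handler never blocks serving data.
func (sp *serverPeer) queueGetData(req getDataReq) bool {
	if len(sp.getDataQueue)+1 > maxConcurrentGetDataReqs {
		return false // too many simultaneous getdata messages
	}
	n := int64(len(req.items))
	if sp.numPendingItemReqs.Load()+n > maxPendingGetDataItemReqs {
		return false // too many total pending data items
	}
	sp.numPendingItemReqs.Add(n)
	sp.getDataQueue <- req
	return true
}

// serveGetData runs on its own goroutine and drains the queue so that other
// inbound messages, such as pings, are processed concurrently.
func (sp *serverPeer) serveGetData(serve func(item string)) {
	for req := range sp.getDataQueue {
		for _, item := range req.items {
			serve(item)
		}
		sp.numPendingItemReqs.Add(-int64(len(req.items)))
	}
}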

@davecgh davecgh added the wire protocol change label (discussion and pull requests regarding items that require changes to the wire protocol) on Oct 28, 2023
@davecgh davecgh added this to the 1.9.0 milestone Oct 28, 2023
@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from a501d6a to 40c8ddc on October 28, 2023 at 19:39
Comment on lines +1374 to +1375
numPendingGetDataReqs := len(sp.getDataQueue)
if numPendingGetDataReqs+1 > maxConcurrentGetDataReqs {
Member commented:

My only note is that, strictly speaking, this is racy because it could be the case that the check fails just as an item is popped off getDataQueue, but the affordance on the buffer sizes is large enough that this shouldn't be an issue.

Also, this function (OnGetData) still runs on the same goroutine as the peer's inHandler, so the len check and the channel send below, while not strictly atomic, are "directionally" safe (which wouldn't necessarily be true otherwise, which is why I'm mentioning it for the record).

The final code for handling an individual getData request seems simpler now as well (as opposed to those pushXX funcs).

@davecgh davecgh (Member, Author) commented Oct 30, 2023:

Correct.

I was originally using a separate queue with a mutex and a sync.Cond to signal updates, where the mutex protected the length checks and the addition of items to the queue itself, but I decided to go with this channel-based approach instead because the code is much easier to read and, as you mentioned, it won't be an issue because I intentionally set the buffer sizes large enough. They far exceed anything a peer that isn't trying to do malicious things will use in practice.

Moreover, in practice, the only way the limits would be hit without the peer being banned by the prior logic is if the peer is intentionally misbehaving, in which case it should be disconnected anyway.
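
For contrast, the rejected alternative would have looked roughly like the following, reusing the illustrative types and limits from the sketch above (this is not code from the PR):

import "sync"

// condQueue sketches the mutex plus sync.Cond alternative: the mutex makes
// the length checks and the enqueue atomic, at the cost of noticeably more
// machinery than the channel-based approach that was ultimately chosen.
type condQueue struct {
	mtx   sync.Mutex
	cond  *sync.Cond
	reqs  []getDataReq
	items int // total pending data items
}

func newCondQueue() *condQueue {
	q := &condQueue{}
	q.cond = sync.NewCond(&q.mtx)
	return q
}

// push enforces both limits while holding the mutex, so there is no window
// between the checks and the addition of the request to the queue.
func (q *condQueue) push(req getDataReq) bool {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	if len(q.reqs)+1 > maxConcurrentGetDataReqs ||
		q.items+len(req.items) > maxPendingGetDataItemReqs {
		return false
	}
	q.reqs = append(q.reqs, req)
	q.items += len(req.items)
	q.cond.Signal() // wake a serving goroutine blocked in cond.Wait
	return true
}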

Member replied:

The race could be avoided with a non-blocking write to the channel. If the write falls into the default case, then the queue was full.
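
Concretely, the suggestion amounts to something like this sketch, again reusing the illustrative types from above rather than the actual diff:

// tryQueueGetData replaces the separate length check with a non-blocking
// send: when the buffered channel is already full, the default case fires
// immediately, so there is no window between a check and the send.  Note
// the limit on total pending data items would still need its own check.
func (sp *serverPeer) tryQueueGetData(req getDataReq) bool {
	select {
	case sp.getDataQueue <- req:
		return true // queued
	default:
		return false // queue full: the concurrent request limit was hit
	}
}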

@davecgh davecgh (Member, Author) replied:

Fair point, since it's buffered. I was originally worried about it blocking while it was busy waiting on the semaphore to send, but, due to the buffer, that would only happen when the queue is full.

That said, the same situation would still apply to the overall number of items, so I'm not sure it's worth changing given that I've already heavily tested the current logic.

@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from 40c8ddc to c975a84 on October 30, 2023 at 15:09
@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from c975a84 to 3aaaf3f on October 30, 2023 at 16:16
@davecgh davecgh merged commit 3aaaf3f into decred:master Oct 30, 2023
2 checks passed
@davecgh davecgh deleted the server_support_concurrent_getdata branch October 30, 2023 20:07