server: Support concurrent getdata requests. #3203

Merged
merged 1 commit into decred:master from server_support_concurrent_getdata on Oct 30, 2023

Conversation

@davecgh davecgh (Member) commented Oct 28, 2023

This requires #3201 and is rebased on #3202.

Currently, each peer only supports one getdata request at a time, and the inbound handler is blocked until all of the requested data is served. This means messages that would otherwise be fast to process, such as pings, can potentially be delayed for long periods of time, which is clearly not ideal.

Moreover, the existing behavior means it is theoretically possible for a pair of peers to reach a state where neither one can make progress because each is waiting on the other to serve data, which would ultimately result in hitting idle timeouts and disconnecting.

In practice, the aforementioned livelock essentially never happens currently due to a variety of other factors, such as duplicate request filtering and not notifying peers about inventory they are known to have. However, the combination of factors that prevent the situation will not necessarily apply to newer data types, such as those for mesh-based mixing.

The aforementioned blocking behavior is a holdover from very early network code that relied on it, but it is no longer relevant.

Thus, to improve overall throughput and address the root of the aforementioned concerns, this changes the semantics to serve getdata requests asynchronously so other inbound messages can be processed concurrently. This means the remote peer is then free to send other messages while waiting for the requested data to be served.

Next, limits on the number of concurrent getdata requests and on the total number of pending individual data item requests are now imposed in order to prevent peers from being able to consume copious amounts of memory and from engaging in other malicious behavior this change would otherwise allow.

The limits are enforced such that peers may now mix and match simultaneous getdata requests for varying numbers of data items, so long as they do not exceed either the maximum number of allowed simultaneous pending getdata messages or the maximum total number of pending individual data item requests.

This approach of applying the limits on both dimensions offers more flexibility and the potential for future efficiency gains while still keeping memory usage bounded to reasonable limits and protecting against other malicious behavior.
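
A minimal sketch of how the queue and the two limits could fit together follows. Only getDataQueue and maxConcurrentGetDataReqs appear in the actual diff (see the snippet quoted later in the conversation); every other identifier, the concrete limit values, and the wiring are illustrative assumptions rather than the dcrd implementation.

package server

import "sync/atomic"

// Hypothetical limit values for illustration only.
const (
	maxConcurrentGetDataReqs  = 1000  // max simultaneous pending getdata messages
	maxPendingGetDataItemReqs = 50000 // max total pending individual data items
)

// getDataReq is an illustrative stand-in for a single getdata message.
type getDataReq struct {
	items []string // stand-in for the requested inventory vectors
}

type serverPeer struct {
	getDataQueue       chan getDataReq // buffered; doubles as the request queue
	numPendingItemReqs atomic.Int64    // data items across all queued requests
}

// queueGetData runs on the peer's inHandler goroutine.  It enforces both
// limits and queues the request so the handler never blocks serving data.
func (sp *serverPeer) queueGetData(req getDataReq) bool {
	if len(sp.getDataQueue)+1 > maxConcurrentGetDataReqs {
		return false // too many simultaneous getdata messages
	}
	n := int64(len(req.items))
	if sp.numPendingItemReqs.Load()+n > maxPendingGetDataItemReqs {
		return false // too many total pending data items
	}
	sp.numPendingItemReqs.Add(n)
	sp.getDataQueue <- req
	return true
}

// serveGetData runs on its own goroutine and drains the queue so that other
// inbound messages, such as pings, are processed concurrently.
func (sp *serverPeer) serveGetData(serve func(item string)) {
	for req := range sp.getDataQueue {
		for _, item := range req.items {
			serve(item)
		}
		sp.numPendingItemReqs.Add(-int64(len(req.items)))
	}
}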

@davecgh davecgh added the wire protocol change label (discussion and pull requests regarding items that require changes to the wire protocol) on Oct 28, 2023
@davecgh davecgh added this to the 1.9.0 milestone Oct 28, 2023
@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from a501d6a to 40c8ddc on October 28, 2023 at 19:39
Comment on lines +1374 to +1375
numPendingGetDataReqs := len(sp.getDataQueue)
if numPendingGetDataReqs+1 > maxConcurrentGetDataReqs {
Member commented:

My only note is that, strictly speaking, this is racy because it could be the case that the check fails just as an item is popped off getDataQueue, but the affordance on the buffer sizes is large enough that this shouldn't be an issue.

Also, this function (OnGetData) still runs on the same goroutine as the peer's inHandler, so the len check and the channel send below, while not strictly atomic, are "directionally" safe (which wouldn't necessarily be true otherwise, which is why I'm mentioning it for the record).

The final code for handling an individual getData request seems simpler now as well (as opposed to those pushXX funcs).

@davecgh davecgh (Member, Author) commented Oct 30, 2023:

Correct.

I was originally using a separate queue with a mutex and a sync.Cond to signal updates, where the mutex protected the length checks and the addition of items to the queue itself, but I decided to go with this channel-based approach instead because the code is much easier to read and, as you mentioned, it won't be an issue because I intentionally set the buffer sizes large enough. They far exceed anything a peer that isn't trying to do malicious things will use in practice.

Moreover, in practice, the only way the limits would be hit without the peer being banned by the prior logic is if the peer is intentionally misbehaving, in which case it should be disconnected anyway.
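
For contrast, the rejected alternative would have looked roughly like the following, reusing the illustrative types and limits from the sketch above (this is not code from the PR):

import "sync"

// condQueue sketches the mutex plus sync.Cond alternative: the mutex makes
// the length checks and the enqueue atomic, at the cost of noticeably more
// machinery than the channel-based approach that was ultimately chosen.
type condQueue struct {
	mtx   sync.Mutex
	cond  *sync.Cond
	reqs  []getDataReq
	items int // total pending data items
}

func newCondQueue() *condQueue {
	q := &condQueue{}
	q.cond = sync.NewCond(&q.mtx)
	return q
}

// push enforces both limits while holding the mutex, so there is no window
// between the checks and the addition of the request to the queue.
func (q *condQueue) push(req getDataReq) bool {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	if len(q.reqs)+1 > maxConcurrentGetDataReqs ||
		q.items+len(req.items) > maxPendingGetDataItemReqs {
		return false
	}
	q.reqs = append(q.reqs, req)
	q.items += len(req.items)
	q.cond.Signal() // wake a serving goroutine blocked in cond.Wait
	return true
}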

Member replied:

The race could be avoided with a non-blocking write to the channel. If the write falls into the default case, then the queue was full.
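
Concretely, the suggestion amounts to something like this sketch, again reusing the illustrative types from above rather than the actual diff:

// tryQueueGetData replaces the separate length check with a non-blocking
// send: when the buffered channel is already full, the default case fires
// immediately, so there is no window between a check and the send.  Note
// the limit on total pending data items would still need its own check.
func (sp *serverPeer) tryQueueGetData(req getDataReq) bool {
	select {
	case sp.getDataQueue <- req:
		return true // queued
	default:
		return false // queue full: the concurrent request limit was hit
	}
}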

@davecgh davecgh (Member, Author) replied:

Fair point, since it's buffered. I was originally worried about it blocking while it was busy waiting on the semaphore to send, but, due to the buffer, that would only happen when the queue is full.

That said, the same situation would still apply to the overall number of items, so I'm not sure it's worth changing given that I've already heavily tested the current logic.

@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from 40c8ddc to c975a84 on October 30, 2023 at 15:09
@davecgh davecgh force-pushed the server_support_concurrent_getdata branch from c975a84 to 3aaaf3f on October 30, 2023 at 16:16
@davecgh davecgh merged commit 3aaaf3f into decred:master Oct 30, 2023
2 checks passed
@davecgh davecgh deleted the server_support_concurrent_getdata branch October 30, 2023 20:07