Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: recon protocol hang with large diffs #356

Merged
merged 7 commits into from
May 20, 2024

Conversation

nathanielc
Copy link
Collaborator

If two nodes each had 2K+ events that the other node did not have it was possible for the protocol to deadlock as both nodes tried to write those events without read the values from the other node.

The fix was accomplished by splitting the protocol into two loops, a read loop and a write loop. They communicate with each other using message passing. The code now has both a clean separation of concerns as well as the behavior that we concurrently read and write from the network so we do not enter a state where we are exclusively writing to the network.

If two nodes each had 2K+ events that the other node did not have it was
possible for the protocol to deadlock as both nodes tried to write those
events without read the values from the other node.

The fix was accomplished by splitting the protocol into two loops, a
read loop and a write loop. They communicate with each other using
message passing. The code now has both a clean separation of concerns as
well as the behavior that we concurrently read and write from the
network so we do not enter a state where we are exclusively writing to
the network.
) -> Result<()> {
// TODO: This logic has two potential failure modes we need to test them
// 1. We allocate memory of all keys in the range, this can be very large.
// 2. We spend a lot of time writing out to the stream but not reading from the stream.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these points are addressed with this change.

Copy link
Contributor

@dav1do dav1do left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments / questions but I think this looks great!

recon/src/protocol.rs Outdated Show resolved Hide resolved
recon/src/protocol.rs Show resolved Hide resolved
recon/src/protocol.rs Show resolved Hide resolved
.await?;
for key in keys {
if let Some(value) = recon.value_for_key(key.clone()).await? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't a value should we be concerned at all now that we should discover them together or am I forgetting about something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, going to convert this into an error.

recon/src/protocol.rs Show resolved Hide resolved
@nathanielc nathanielc enabled auto-merge May 20, 2024 16:00
@nathanielc nathanielc added this pull request to the merge queue May 20, 2024
Merged via the queue into main with commit 3230a88 May 20, 2024
5 checks passed
@nathanielc nathanielc deleted the fix/recon-read-write-split branch May 20, 2024 16:25
@smrz2001 smrz2001 mentioned this pull request May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants