fix: recon protocol hang with large diffs #356
Conversation
If two nodes each had 2K+ events that the other node did not have, it was possible for the protocol to deadlock as both nodes tried to write those events without reading the values from the other node. The fix splits the protocol into two loops, a read loop and a write loop, which communicate with each other using message passing. The code now has a clean separation of concerns, and because we concurrently read from and write to the network we never enter a state where we are exclusively writing to the network.
) -> Result<()> {
    // TODO: This logic has two potential failure modes we need to test them
    // 1. We allocate memory of all keys in the range, this can be very large.
    // 2. We spend a lot of time writing out to the stream but not reading from the stream.
Both of these points are addressed with this change.
A few comments / questions but I think this looks great!
recon/src/protocol.rs
Outdated
.await?;
for key in keys {
    if let Some(value) = recon.value_for_key(key.clone()).await? {
If there isn't a value, should we be concerned at all now that we should discover them together, or am I forgetting about something?
Good call, going to convert this into an error.