Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Reduce the Load Induced by the Polling Service #211

Closed
wants to merge 7 commits into from

Conversation

nepet
Copy link
Contributor

@nepet nepet commented Jul 20, 2023

We had some trouble on large nodes sending out all poll messages (see #185), maybe because we tried to send them out concurrently. This PR updates the polling service to use a sequential, time-ordered model to send out the poll messages.
Therefore we add a queue which elements are time ordered and by time we mean the timestamp at which we should send the next message to the peer.

Our main loop now is tick based and checks the following every other second:

  • Does the first peer in queue need a new poll msg sent out (is the timestamp past now)
  • If so -> send msg and re-queue with new timestamp
  • If not -> wait for next tick and check again

This way we reduce the stress that we put on the node and the network connections.
If this was the cause for the disappearing peers, this resolves #185

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
We use the queue to determine which peer is the next to be sent the
poll message.

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
tickDuration is now used to cycle through the poll queue.

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
We remove the potentially heavy-load functions that ask to send
out messages in parallel.

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
We listen for peers to connect or disconnect to the node running
peerswap and Add or Remove them from the poll service accordingly

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
This is mainly to make the tests more stable as we don't have
blocktime lag in regtest scenarios. But it does not hurt to have
the extra check in real life also.

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
@nepet nepet requested review from wtogami and grubles July 20, 2023 15:41
We only want to add the peer to the queue if we do not know about
it already.

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
@wtogami
Copy link
Contributor

wtogami commented Jul 20, 2023

#211 is running on Blockstream Store now. Please report if you continue to see it disappear from peerswap-listpeers. @grubles

For now don't run this yourself yet. Perhaps test this after 12 hours of success to see if it works both ways.

@wtogami
Copy link
Contributor

wtogami commented Jul 20, 2023

Testing: I want to see what happens to the timing of polls if networking is delayed by 10 seconds. In that case does 10 polls happen simultaneously or does it happen one at a time separated by at least 1 second between each poll?

@grubles
Copy link
Collaborator

grubles commented Jul 21, 2023

It finally disappeared about 5 minutes ago for me running 725ca2c. The patches at least seem to have extended the amount of time it takes for the node to disappear in some cases? In the past it would only take 3-4 hours to drop from peerswap-listpeers.

@wtogami
Copy link
Contributor

wtogami commented Jul 23, 2023

Summary of discussion

  • This reduces the chance of polls conflicting so it happens less often, but it still can disappear.
  • The issue is custommessage is fire-and-forget with no delivery guarantees. Random network delays (like Tor) can mean multiple custommessage happen at the same time and somehow none are delivered due to a conflict.
  • Warren thinks we didn't have the disappearing peer problem prior to Silence noisy poll log #189 because polling happened a lot more often. In that case it mattered less that some of them wouldn't be delivered.
  • Short-term Fix: Poll more often (with redundant logging still silenced). Keep the WIP - Reduce the Load Induced by the Polling Service #211 approach but perhaps use a longer delay like 3 seconds?

@wtogami
Copy link
Contributor

wtogami commented Jul 30, 2023

@grubles reported when he runs #211 on his node, Blockstream Store is not visible in peerswap-listpeers. It sounds like #211 as of d9d7d39 doesn't fix the original problem and somehow makes it worse when both sides are running it.

@wtogami
Copy link
Contributor

wtogami commented Jul 30, 2023

Blockstream Store experienced another curious problem where an incoming swap-in attempt got stuck in state State_SwapInReceiver_AwaitTxConfirmation. The requesting node successfully ended in State_ClaimedCsv so no funds were lost. Unclear if this weird state is caused by this issue but we haven't seen this problem prior to now.

For the moment I recommend not testing #211. It needs work.

@wtogami wtogami marked this pull request as draft July 30, 2023 13:25
@wtogami wtogami changed the title Reduce the Load Induced by the Polling Service WIP - Reduce the Load Induced by the Polling Service Jul 30, 2023
@wtogami
Copy link
Contributor

wtogami commented Aug 4, 2023

@wtogami wtogami closed this Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Peer disappears from peerswap-listpeers randomly
3 participants