fixes #598
This PR adds some additional robustness to websockets:

- `proposer`, `challenger` and `liquidity provider` all ping the websocket server (`optimist`) to help prevent the websocket closing due to idleness. They log any timeouts.
- `optimist` pings all connected websocket clients. If it doesn't receive a pong in time, it will terminate the websocket...
- `optimist` will stop sending blocks for signature (it will go into a wait loop) until a new connection is established.

The web3js websocket cannot be handled in exactly this way because it's used indirectly via web3js, but a ping is made every 15 seconds by querying the latest block number, and the connection is already tested every 2s, with a reconnect attempt made if it is closed.
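For readers unfamiliar with the pattern, the server-side ping/pong check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `markAlive`, `sweepClients` and `makeFakeClient` are hypothetical names, and the in-memory fake client stands in for a real websocket so the example runs on its own.

```javascript
// Hypothetical sketch: none of these names come from the PR itself.
// A "client" is any object with ping() and terminate() methods, plus an
// isAlive flag that the pong handler sets back to true.

function markAlive(client) {
  client.isAlive = true;
}

// One heartbeat sweep: terminate clients that never answered the previous
// ping, then ping the remaining ones. Returns the clients still connected.
function sweepClients(clients) {
  const survivors = [];
  for (const client of clients) {
    if (client.isAlive === false) {
      client.terminate();
      continue;
    }
    client.isAlive = false; // cleared until the next pong arrives
    client.ping();
    survivors.push(client);
  }
  return survivors;
}

// Minimal in-memory stand-in for a websocket, used only for illustration.
function makeFakeClient(name, respondsToPing) {
  const client = {
    name,
    isAlive: true,
    terminated: false,
    ping() {
      if (respondsToPing) markAlive(client); // simulate an immediate pong
    },
    terminate() {
      client.terminated = true;
    },
  };
  return client;
}

const responsive = makeFakeClient('proposer', true);
const silent = makeFakeClient('challenger', false);
let clients = [responsive, silent];
clients = sweepClients(clients); // first sweep: both clients are pinged
clients = sweepClients(clients); // second sweep: the silent client is dropped
console.log(clients.map((c) => c.name));
```

In a real server the sweep would run on a timer, and the timer interval determines how long a dead connection can go unnoticed.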
Testing can be done by running the ping-pong test (no relation). During the test use `docker-compose pause proposer` to stop the proposer. After about 30s `optimist` will notice that the websocket is closed and go into a wait loop (indicated in the logs) waiting for `proposer` to reconnect. Unpausing the container (`docker compose unpause proposer`) will allow proposer to realise the connection has dropped and it will re-establish the websocket, at which point `optimist` will resume normal operation.

Do not, however, expect the test to continue with no errors. It takes time for an application to realise the websocket has dropped, and during that time the state changes are not properly communicated to the other containers. That's ok - this mechanism is really meant to work when a connection is idle, not when it's running a test. If a websocket drops during a period of activity, it would normally mean there is a problem with a container. That's not in the scope of this PR.
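The wait loop that `optimist` enters while no client is connected could look something like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: `sleep`, `waitForConnection`, `sendBlockWhenConnected`, the `connection.open` flag and the polling interval are all hypothetical.

```javascript
// Hypothetical sketch; helper names and the polling interval are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until isConnected() returns true, logging on every unsuccessful poll.
// Resolves with the number of polls spent waiting.
async function waitForConnection(isConnected, pollMs = 1000, log = console.log) {
  let polls = 0;
  while (!isConnected()) {
    log('websocket closed, waiting for a client to reconnect');
    polls += 1;
    await sleep(pollMs);
  }
  return polls;
}

// Hold the block back until a connection exists, then send it for signature.
async function sendBlockWhenConnected(block, connection, pollMs = 1000) {
  await waitForConnection(() => connection.open, pollMs);
  connection.send(JSON.stringify(block));
}
```

Polling keeps the sketch short; an event-driven version that resolves when a new connection is registered would avoid the delay of up to one polling interval.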
Note: this PR also fixes a small bug that can cause block `leafCounts` to be computed incorrectly if several blocks are made quickly (local `leafCounts` in the `Block` class did not increment correctly).