Westlad/proposer robustify #600

Merged
druiz0992 merged 8 commits into master from westlad/proposer-robustify
Apr 21, 2022

Westlad
Contributor

@Westlad Westlad commented Mar 30, 2022

fixes #598

This PR adds some additional robustness to websockets:

  1. proposer, challenger and liquidity provider all ping the websocket server (optimist) to help prevent the websocket from closing due to idleness. They log any timeouts.
  2. optimist pings all connected websocket clients. If it doesn't receive a pong in time, it terminates the websocket.
  3. Termination causes the client to go through a reconnect cycle to establish a new connection.
  4. optimist stops sending blocks for signature (it goes into a wait loop) until a new connection is established.
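
The server-side part of this scheme (points 2 and 4) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `PingableSocket`, `HeartbeatRegistry`, `sweep` and `hasClient` are hypothetical names, and the socket is reduced to the two operations the heartbeat needs. With the Node `ws` package, for example, these would correspond to `ping()`, `terminate()` and the `'pong'` event, but that mapping is an assumption about the setup.

```typescript
// Minimal, transport-agnostic view of a websocket connection (hypothetical).
interface PingableSocket {
  ping(): void;
  terminate(): void;
}

interface TrackedClient {
  socket: PingableSocket;
  alive: boolean; // true if a pong has arrived since the last ping
}

// Hypothetical helper: tracks which clients answered the most recent ping.
class HeartbeatRegistry {
  private clients: TrackedClient[] = [];

  // Register a newly connected client; it starts out marked as alive.
  add(socket: PingableSocket): void {
    this.clients.push({ socket, alive: true });
  }

  // Call this from the socket's pong handler.
  markPong(socket: PingableSocket): void {
    for (const client of this.clients) {
      if (client.socket === socket) client.alive = true;
    }
  }

  // Run once per ping interval: terminate clients that never answered the
  // previous ping, then ping the remaining ones and await their pongs.
  sweep(): PingableSocket[] {
    const dead = this.clients.filter((c) => !c.alive);
    this.clients = this.clients.filter((c) => c.alive);
    for (const client of dead) client.socket.terminate();
    for (const client of this.clients) {
      client.alive = false;
      client.socket.ping();
    }
    return dead.map((c) => c.socket);
  }

  // A block-signing loop could poll this and wait while it returns false.
  hasClient(): boolean {
    return this.clients.length > 0;
  }
}
```

In a sketch like this, `sweep()` would be driven by a timer, and a client needs up to two intervals of silence before it is terminated, which fits with a dropped connection only being noticed after some delay.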

The web3js websocket cannot be handled in exactly this way because the applications only use it indirectly, through web3js. Instead, a ping is made every 15 seconds by querying the latest block number. The connection is already tested every 2s, and a reconnect is attempted if it is closed.
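
A keep-alive of this kind might look like the sketch below. It is not the PR's implementation: `pingOnce`, `startKeepAlive`, the `probe` callback and the timeout value are all hypothetical. The idea is only that any cheap request, such as fetching the latest block number, doubles as a ping, and that a missing or failed answer is reported.

```typescript
// Resolves to true if the probe answers within timeoutMs, false if it
// times out or rejects. `probe` is any cheap request over the connection.
function pingOnce(
  probe: () => Promise<unknown>,
  timeoutMs: number,
): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    const finish = (ok: boolean): void => {
      clearTimeout(timer);
      resolve(ok); // a second resolve after a timeout is a harmless no-op
    };
    // Promise.resolve().then(probe) also catches a probe that throws synchronously.
    Promise.resolve()
      .then(probe)
      .then(() => finish(true), () => finish(false));
  });
}

// Probes every intervalMs and calls onFailure for each missed ping.
// Returns a function that stops the keep-alive.
function startKeepAlive(
  probe: () => Promise<unknown>,
  intervalMs: number,
  timeoutMs: number,
  onFailure: () => void,
): () => void {
  const handle = setInterval(async () => {
    if (!(await pingOnce(probe, timeoutMs))) onFailure();
  }, intervalMs);
  return () => clearInterval(handle);
}
```

With web3js the probe could be `() => web3.eth.getBlockNumber()` and the interval `15000`, for example `startKeepAlive(() => web3.eth.getBlockNumber(), 15000, 5000, () => console.warn('web3 ping timed out'))`. The 5 second timeout and the log message are made up for this example.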

Testing can be done by running the ping-pong test (no relation). During the test use docker-compose pause proposer to stop the proposer. After about 30s optimist will notice that the websocket is closed and go into a wait loop (indicated in the logs) waiting for proposer to reconnect. Unpausing the container (docker compose unpause proposer) will allow proposer to realise the connection has dropped and it will re-establish the websocket, at which point optimist will resume normal operation.

Do not, however, expect the test to continue with no errors. It takes time for an application to realise the websocket has dropped, and during that time the state changes are not properly communicated to the other containers. That's ok - this mechanism is really meant to work when a connection is idle, not when it's running a test. If a websocket drops during a period of activity, it would normally mean there is a problem with a container. That's not in the scope of this PR.

Note: this PR also fixes a small bug that can cause block leafCounts to be computed incorrectly if several blocks are made quickly (local leafCounts in the Block class did not increment correctly).
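
To illustrate the kind of bookkeeping involved, here is a minimal sketch of a locally tracked leaf count. It is not the project's `Block` class: `SketchBlock` and `BlockBuilder` are hypothetical, and the sketch assumes that a block's `leafCount` records how many leaves were already in the tree before the block's own commitments were added.

```typescript
// Hypothetical stand-in for a block; not the project's actual Block class.
interface SketchBlock {
  leafCount: number; // leaves in the tree before this block (assumption)
  commitments: string[];
}

class BlockBuilder {
  // Local running total, so that blocks built in quick succession do not
  // rely on external state that may not have caught up yet (assumption).
  private localLeafCount: number;

  constructor(initialLeafCount = 0) {
    this.localLeafCount = initialLeafCount;
  }

  build(commitments: string[]): SketchBlock {
    const block: SketchBlock = { leafCount: this.localLeafCount, commitments };
    // Advance by the number of leaves this block adds, before the next
    // block is built; skipping this step gives consecutive blocks the
    // same leafCount.
    this.localLeafCount += commitments.length;
    return block;
  }
}
```

For example, building blocks with two, one and zero commitments from an empty tree would give `leafCount` values of 0, 2 and 3 under these assumptions.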

@Westlad Westlad self-assigned this Mar 30, 2022
@Westlad Westlad temporarily deployed to AWS March 30, 2022 22:18 Inactive
@Westlad Westlad temporarily deployed to AWS March 31, 2022 12:59 Inactive
@Westlad Westlad temporarily deployed to AWS March 31, 2022 15:02 Inactive
@Westlad Westlad temporarily deployed to AWS April 1, 2022 13:58 Inactive
@Westlad Westlad added the One more approval needed One reviewer has approved this PR but another is needed label Apr 4, 2022
@Westlad Westlad force-pushed the westlad/proposer-robustify branch from f41b4f0 to a92ecf3 Compare April 6, 2022 14:42
@Westlad Westlad force-pushed the westlad/proposer-robustify branch from a92ecf3 to dd348cd Compare April 8, 2022 12:43
@druiz0992 druiz0992 merged commit 93e07d2 into master Apr 21, 2022
@druiz0992 druiz0992 deleted the westlad/proposer-robustify branch April 21, 2022 13:21
Development

Successfully merging this pull request may close these issues.

Robustify proposer application
3 participants