Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process PeerViewChange messages with priority #577

Closed
Tracked by #26
sandreim opened this issue Aug 17, 2023 · 0 comments · Fixed by #4755
Closed
Tracked by #26

Process PeerViewChange messages with priority #577

sandreim opened this issue Aug 17, 2023 · 0 comments · Fixed by #4755
Assignees
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@sandreim
Copy link
Contributor

This will need some flavor of paritytech/orchestra#8 implemented first.

Prioritising the processing of PeerViewChange in networking subsystems will help reducing wasteful/useless work and faster propagation of gossip messages - approvals/assignments, bitfields and statements.

We know that under high load, these messages usually have a very high ToF (up to 5s) and get processed very late by the approval distribution subsystem. With this change the peer view update is processed ahead of all other messages except signals.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T8-parachains_engineering and removed I10-optimisation labels Aug 25, 2023
@AndreiEres AndreiEres self-assigned this May 29, 2024
@AndreiEres AndreiEres moved this from Backlog to In Progress in parachains team board May 29, 2024
@AndreiEres AndreiEres moved this from In Progress to Review in progress in parachains team board Jun 18, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 16, 2024
Closes #577

### Changed
- `orchestra` updated to 0.4.0
- `PeerViewChange` sent with high priority and should be processed first
in a queue.
- To count them in tests added tracker to TestSender and TestOverseer.
It acts more like a smoke test though.

### Testing on Versi

The changes were tested on Versi with two objectives:
1. Make sure the node functionality does not change.
2. See how the changes affect performance.

Test setup:
- 2.5 hours for each case
- 100 validators
- 50 parachains
- validatorsPerCore = 2
- neededApprovals = 100
- nDelayTranches = 89
- relayVrfModuloSamples = 50

During the test period, all nodes ran without any crashes, which
satisfies the first objective.

To estimate the change in performance we used ToF charts. The graphs
show that there are no spikes in the top as before. This proves that our
hypothesis is correct.

### Normalized charts with ToF

![image](https://github.com/user-attachments/assets/0d49d0db-8302-4a8c-a557-501856805ff5)
[Before](https://grafana.teleport.parity.io/goto/ZoR53ClSg?orgId=1)


![image](https://github.com/user-attachments/assets/9cc73784-7e45-49d9-8212-152373c05880)
[After](https://grafana.teleport.parity.io/goto/6ux5qC_IR?orgId=1)

### Conclusion

The prioritization of subsystem messages reduces the ToF of the
networking subsystem, which helps faster propagation of gossip messages.
@github-project-automation github-project-automation bot moved this from Review in progress to Completed in parachains team board Jul 16, 2024
jpserrat pushed a commit to jpserrat/polkadot-sdk that referenced this issue Jul 18, 2024
Closes paritytech#577

### Changed
- `orchestra` updated to 0.4.0
- `PeerViewChange` sent with high priority and should be processed first
in a queue.
- To count them in tests added tracker to TestSender and TestOverseer.
It acts more like a smoke test though.

### Testing on Versi

The changes were tested on Versi with two objectives:
1. Make sure the node functionality does not change.
2. See how the changes affect performance.

Test setup:
- 2.5 hours for each case
- 100 validators
- 50 parachains
- validatorsPerCore = 2
- neededApprovals = 100
- nDelayTranches = 89
- relayVrfModuloSamples = 50

During the test period, all nodes ran without any crashes, which
satisfies the first objective.

To estimate the change in performance we used ToF charts. The graphs
show that there are no spikes in the top as before. This proves that our
hypothesis is correct.

### Normalized charts with ToF

![image](https://github.com/user-attachments/assets/0d49d0db-8302-4a8c-a557-501856805ff5)
[Before](https://grafana.teleport.parity.io/goto/ZoR53ClSg?orgId=1)


![image](https://github.com/user-attachments/assets/9cc73784-7e45-49d9-8212-152373c05880)
[After](https://grafana.teleport.parity.io/goto/6ux5qC_IR?orgId=1)

### Conclusion

The prioritization of subsystem messages reduces the ToF of the
networking subsystem, which helps faster propagation of gossip messages.
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024
Closes paritytech#577

### Changed
- `orchestra` updated to 0.4.0
- `PeerViewChange` sent with high priority and should be processed first
in a queue.
- To count them in tests added tracker to TestSender and TestOverseer.
It acts more like a smoke test though.

### Testing on Versi

The changes were tested on Versi with two objectives:
1. Make sure the node functionality does not change.
2. See how the changes affect performance.

Test setup:
- 2.5 hours for each case
- 100 validators
- 50 parachains
- validatorsPerCore = 2
- neededApprovals = 100
- nDelayTranches = 89
- relayVrfModuloSamples = 50

During the test period, all nodes ran without any crashes, which
satisfies the first objective.

To estimate the change in performance we used ToF charts. The graphs
show that there are no spikes in the top as before. This proves that our
hypothesis is correct.

### Normalized charts with ToF

![image](https://github.com/user-attachments/assets/0d49d0db-8302-4a8c-a557-501856805ff5)
[Before](https://grafana.teleport.parity.io/goto/ZoR53ClSg?orgId=1)


![image](https://github.com/user-attachments/assets/9cc73784-7e45-49d9-8212-152373c05880)
[After](https://grafana.teleport.parity.io/goto/6ux5qC_IR?orgId=1)

### Conclusion

The prioritization of subsystem messages reduces the ToF of the
networking subsystem, which helps faster propagation of gossip messages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.
Projects
Status: Completed
Development

Successfully merging a pull request may close this issue.

4 participants