p2p performance issues, regression? #6012

Open
rvalle opened this issue Oct 10, 2024 · 5 comments
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@rvalle

rvalle commented Oct 10, 2024

Hi!

We run full archive nodes for our analytics project, typically on constrained resources. Our nodes normally run like clockwork with a 2-peer in/out configuration.

Recently we updated our Polkadot node from Docker image v1.15.1 to v1.16.0, and p2p has started to get stuck, even after doubling the peer count. We reported this in the past, so perhaps there has been some kind of regression.

Here is what we are seeing now:

[Screenshot: Grafana panel, Polkadot Node Monitoring, last 24h, 2024-10-10 15:21]

Prior to this upgrade, p2p would not get stuck at all, or only very rarely and for very few blocks:

[Screenshot: Grafana panel, Polkadot Node Monitoring, v1.15.1, 2024-10-10 15:25]

We reported a similar issue in the past, here is some reference: paritytech/polkadot#6696 (comment)

Back then it was fixed.

@github-actions github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Oct 10, 2024
@lexnv
Contributor

lexnv commented Oct 10, 2024

The issue was fixed in the past with:

That fix has been reverted by:

cc perf issue: #5221

@bkchr
Member

bkchr commented Oct 10, 2024

@lexnv can we close this if there is another issue that tracks it?

@lexnv
Contributor

lexnv commented Oct 10, 2024

We still need to double-check this issue; it might expose some regressions we introduced between 0.15 and 0.16.

The perf issue should already affect 0.15; cc'ed to make sure we don't forget to check protocol performance (which we might have missed with the libp2p update).

@rvalle
Author

rvalle commented Oct 15, 2024

@lexnv yes, 1.15 is also affected: I attempted a rollback and the issue is still there. Also notice that bandwidth usage is on the order of 3x when using low peer numbers. This was also reported before and is most likely related.

Pay attention to the gap between reported and actual bandwidth usage. A 2-peer node can work with less than 1-3 Mb/s (in line with what is reported); it is now averaging 8 Mb/s with peaks of 13 Mb/s, despite reporting much less.
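
To put rough numbers on that gap, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted in this comment (1-3 Mb/s expected for a 2-peer node, 8 Mb/s average and 13 Mb/s peak observed); the function and constant names are purely illustrative and not part of any Polkadot tooling.

```python
def overhead_factor(observed_mbps: float, expected_mbps: float) -> float:
    """Return how many times larger the observed bandwidth is than the expected one."""
    if expected_mbps <= 0:
        raise ValueError("expected_mbps must be positive")
    return observed_mbps / expected_mbps


# Figures quoted in the comment above, in Mb/s.
EXPECTED_RANGE = (1.0, 3.0)  # what a 2-peer node used to need
OBSERVED_AVG = 8.0
OBSERVED_PEAK = 13.0

# Compare against the upper end of the expected range (the most conservative choice).
avg_factor = overhead_factor(OBSERVED_AVG, EXPECTED_RANGE[1])
peak_factor = overhead_factor(OBSERVED_PEAK, EXPECTED_RANGE[1])

print(f"average: {avg_factor:.1f}x, peak: {peak_factor:.1f}x")
# prints "average: 2.7x, peak: 4.3x"
```

Even against the upper end of the expected range, the average comes out at close to 3x, which matches the rough factor mentioned earlier in the thread.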

@rvalle
Author

rvalle commented Oct 15, 2024

@lexnv I have not tested much, but today I switched to the litep2p backend, and it seems to be free from the issue:

[Screenshot: Grafana panel, Polkadot Node Monitoring, 2024-10-15 17:13]

I would say that bandwidth usage has increased compared to the pre-regression version, but not as much as with the default implementation.
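
For anyone who wants to reproduce this setup, below is a minimal Python sketch that assembles a node command line with low peer limits and an explicit network backend. It is an assumption on my part that the backend meant here is selected with `--network-backend litep2p`, and that the peer limits map to `--in-peers` / `--out-peers`; the archive pruning flags and the helper name `build_node_args` are likewise illustrative. Please check `polkadot --help` for your version before relying on any of them.

```python
import shlex
from typing import List


def build_node_args(in_peers: int, out_peers: int, backend: str) -> List[str]:
    """Assemble a hypothetical archive-node command line (flags are assumptions)."""
    if backend not in ("libp2p", "litep2p"):
        raise ValueError(f"unknown network backend: {backend!r}")
    if in_peers < 0 or out_peers < 0:
        raise ValueError("peer counts must be non-negative")
    return [
        "polkadot",
        "--state-pruning", "archive",   # assumed archive-node settings
        "--blocks-pruning", "archive",
        "--in-peers", str(in_peers),    # the 2 in / 2 out setup described in this issue
        "--out-peers", str(out_peers),
        "--network-backend", backend,   # assumed flag for switching away from the default
    ]


args = build_node_args(in_peers=2, out_peers=2, backend="litep2p")
print(shlex.join(args))
```

Running the same node once with `backend="libp2p"` and once with `backend="litep2p"` under identical peer limits would make the bandwidth comparison above easier to repeat.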
