Use maximum allowed response size for request/response protocols #5753

Merged: 5 commits merged into master from AndreiEres/issue5503 on Sep 19, 2024

@AndreiEres AndreiEres commented Sep 18, 2024

Description

Adjust the PoV response size to the default value used in Substrate.
Fixes #5503

Integration

The changes shouldn't impact downstream projects since we are only increasing the limit.

Review Notes

It isn't visible in the diff, but this change affects every protocol that uses the POV_RESPONSE_SIZE constant:

  • Protocol::ChunkFetchingV1
  • Protocol::ChunkFetchingV2
  • Protocol::CollationFetchingV1
  • Protocol::CollationFetchingV2
  • Protocol::PoVFetchingV1
  • Protocol::AvailableDataFetchingV1
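
To illustrate why a one-line change to a constant reaches six protocols, here is a minimal sketch in which every protocol listed above resolves its response limit through the same constant. The simplified `Protocol` enum, the `ALL` array, and the `max_response_size` method are hypothetical stand-ins, not the actual code in `request_response/mod.rs`; the 16 MiB value is the one shown in the first revision of this PR.

```rust
/// Assumed value: 16 MiB, as in the first revision of this PR.
const POV_RESPONSE_SIZE: u64 = 16 * 1024 * 1024;

/// Simplified stand-in for the real enum; only the variants listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protocol {
    ChunkFetchingV1,
    ChunkFetchingV2,
    CollationFetchingV1,
    CollationFetchingV2,
    PoVFetchingV1,
    AvailableDataFetchingV1,
}

impl Protocol {
    /// Every variant of this simplified enum, for iteration.
    pub const ALL: [Protocol; 6] = [
        Protocol::ChunkFetchingV1,
        Protocol::ChunkFetchingV2,
        Protocol::CollationFetchingV1,
        Protocol::CollationFetchingV2,
        Protocol::PoVFetchingV1,
        Protocol::AvailableDataFetchingV1,
    ];

    /// Hypothetical lookup: all PoV-carrying protocols share one limit,
    /// so changing the constant changes all of them at once.
    pub const fn max_response_size(self) -> u64 {
        match self {
            Protocol::ChunkFetchingV1
            | Protocol::ChunkFetchingV2
            | Protocol::CollationFetchingV1
            | Protocol::CollationFetchingV2
            | Protocol::PoVFetchingV1
            | Protocol::AvailableDataFetchingV1 => POV_RESPONSE_SIZE,
        }
    }
}
```

Iterating over `Protocol::ALL` and calling `max_response_size()` returns the same limit for each variant, which is why none of the protocol definitions themselves appear in the diff.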

Increasing timeouts

/// This timeout is based on what seems sensible from a time budget perspective, considering 6
/// second block time. This is going to be tough, if we have multiple forks and large PoVs, but we
/// only have so much time.
const POV_REQUEST_TIMEOUT_CONNECTED: Duration = Duration::from_millis(1200);

I assume the current PoV request timeout is set to 1.2s to handle 5 consecutive requests during a 6s block. This setting does not relate to the PoV response size. I see no reason to change the current timeouts after adjusting the response size.

However, we should consider networking speed limitations if we want to increase the maximum PoV size to 10 MB. With the number of parallel requests set to 10, validators will need the following networking speeds:

  • 5 MB PoV: at least 42 MB/s, ideally 50 MB/s.
  • 10 MB PoV: at least 84 MB/s, ideally 100 MB/s.

The current required speed of 50 MB/s aligns with the 62.5 MB/s specified in the reference hardware requirements. Increasing the PoV size to 10 MB may require a higher networking speed.
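
The bandwidth figures above appear to follow from simple arithmetic: ten parallel requests of one full PoV each, delivered either within the 1.2 s request timeout (the minimum) or within 1 s (the ideal). The sketch below reproduces the numbers under that assumption; the function name is hypothetical and sizes are treated as decimal megabytes.

```rust
use std::time::Duration;

/// Bandwidth in MB/s needed to complete `parallel_requests` downloads of
/// `pov_size_mb` megabytes each within `window`.
/// Assumes all requests run concurrently and every PoV is full.
pub fn required_bandwidth_mb_per_s(
    pov_size_mb: f64,
    parallel_requests: u32,
    window: Duration,
) -> f64 {
    pov_size_mb * f64::from(parallel_requests) / window.as_secs_f64()
}
```

With a 1.2 s window, 5 MB PoVs need about 41.7 MB/s (rounded up to 42) and 10 MB PoVs about 83.3 MB/s (rounded up to 84). With a 1 s window the results are 50 MB/s and 100 MB/s.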

@AndreiEres AndreiEres marked this pull request as ready for review September 18, 2024 13:58
@AndreiEres AndreiEres changed the title from "Update POV_RESPONSE_SIZE" to "Use maximum allowed response size for request/response protocols" Sep 18, 2024

@sandreim sandreim left a comment

PRDoc ?

- /// limit might have more severe effects.
- const POV_RESPONSE_SIZE: u64 = MAX_POV_SIZE as u64 + 10_000;
+ /// Same as what we use in substrate networking.
+ const POV_RESPONSE_SIZE: u64 = 16 * 1024 * 1024;

I think substrate should export a constant and we should use it here.

@AndreiEres AndreiEres added the I9-optimisation (an enhancement to provide better overall performance in terms of time-to-completion for a task) and T8-polkadot (this PR/issue is related to/affects the Polkadot network) labels Sep 19, 2024
- /// limit might have more severe effects.
- const POV_RESPONSE_SIZE: u64 = MAX_POV_SIZE as u64 + 10_000;
+ /// Same as what we use in substrate networking.
+ const POV_RESPONSE_SIZE: u64 = MAX_RESPONSE_SIZE;

Is it safe to not base this value on MAX_POV_SIZE?

@AndreiEres (author):

I think it's okay, since the whole point of this change is to raise the limit.
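
One way to keep the two values tied together even though the new limit no longer derives from `MAX_POV_SIZE` is a compile-time guard. This is only a sketch: the 5 MiB `MAX_POV_SIZE` and the standalone constants below are assumptions made for illustration, the 16 MiB `MAX_RESPONSE_SIZE` is the value from the first revision of this PR, and the 10_000-byte overhead comes from the old definition.

```rust
/// Assumed for illustration: a 5 MiB PoV limit.
const MAX_POV_SIZE: u32 = 5 * 1024 * 1024;
/// 16 MiB, the value used in the first revision of this PR.
const MAX_RESPONSE_SIZE: u64 = 16 * 1024 * 1024;
/// New definition: the generic response limit.
const POV_RESPONSE_SIZE: u64 = MAX_RESPONSE_SIZE;
/// Headroom the old definition added on top of `MAX_POV_SIZE`.
const OLD_OVERHEAD: u64 = 10_000;

// Compile-time guard: the build fails if the PoV limit is ever raised
// past what a single response can carry.
const _: () = assert!(POV_RESPONSE_SIZE >= MAX_POV_SIZE as u64 + OLD_OVERHEAD);

/// Spare bytes between the response limit and the old PoV-based limit.
pub const fn headroom() -> u64 {
    POV_RESPONSE_SIZE - (MAX_POV_SIZE as u64 + OLD_OVERHEAD)
}
```

Under these assumed values the response limit would still cover a 10 MiB PoV plus the old overhead, so the guard would only trip if `MAX_POV_SIZE` grew beyond roughly 16 MiB.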

@sandreim

The current required speed of 50 MB/s aligns with the 62.5 MB/s specified in the reference hardware requirements. Increasing the PoV size to 10 MB may require a higher networking speed.

Yes, this is the worst-case scenario, when all the blocks you need to recover are full. We'll likely have to raise the networking specs.

I did a bit of math in #5334 (comment); it looks like we should be fine.

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
@AndreiEres AndreiEres added this pull request to the merge queue Sep 19, 2024
Merged via the queue into master with commit 0c9d8fe Sep 19, 2024
182 of 207 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/issue5503 branch September 19, 2024 16:38
@AndreiEres AndreiEres added the A4-needs-backport (pull request must be backported to all maintained releases) label Sep 30, 2024
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2407:

github-actions bot pushed a commit that referenced this pull request Sep 30, 2024
# Description

Adjust the PoV response size to the default values used in the
substrate.
Fixes #5503

## Integration

The changes shouldn't impact downstream projects since we are only
increasing the limit.

## Review Notes

You can't see it from the changes, but it affects all protocols that use
the `POV_RESPONSE_SIZE` constant.
- Protocol::ChunkFetchingV1
- Protocol::ChunkFetchingV2
- Protocol::CollationFetchingV1
- Protocol::CollationFetchingV2
- Protocol::PoVFetchingV1
- Protocol::AvailableDataFetchingV1

## Increasing timeouts

https://github.com/paritytech/polkadot-sdk/blob/fae15379cba0c876aa16c77e11809c83d1db8f5c/polkadot/node/network/protocol/src/request_response/mod.rs#L126-L129

I assume the current PoV request timeout is set to 1.2s to handle 5
consecutive requests during a 6s block. This setting does not relate to
the PoV response size. I see no reason to change the current timeouts
after adjusting the response size.

However, we should consider networking speed limitations if we want to
increase the maximum PoV size to 10 MB. With the number of parallel
requests set to 10, validators will need the following networking
speeds:
- 5 MB PoV: at least 42 MB/s, ideally 50 MB/s.
- 10 MB PoV: at least 84 MB/s, ideally 100 MB/s.

The current required speed of 50 MB/s aligns with the 62.5 MB/s
specified [in the reference hardware
requirements](https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware).
Increasing the PoV size to 10 MB may require a higher networking speed.

---------

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
(cherry picked from commit 0c9d8fe)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

github-actions bot pushed a commit that referenced this pull request Sep 30, 2024
eskimor commented Oct 1, 2024

I assume the current PoV request timeout is set to 1.2s to handle 5 consecutive requests during a 6s block. This setting does not relate to the PoV response size. I see no reason to change the current timeouts after adjusting the response size.

This is not correct. It is set to that value because of synchronous backing, where we have a very small overall time budget. This is no longer true for asynchronous backing.

Considering that this is a hard timeout, where we completely drop the response if it is exceeded, it might make sense to think about bumping it a bit. This would add robustness, especially with concurrent requests. @AndreiEres Given that a single fetch should take around 200ms, it is probably still OK. I don't see any harm in bumping it, though. In any case, let's test this with Gluttons on Kusama: lots of 10 MB PoVs, and let's see what happens.

With the number of parallel requests set to 10

Those numbers should add up. If we have 10 parallel requests, then the timeout should be around 2s.
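
The 2s estimate can be checked with a small sketch. It assumes, as a simplification, that parallel requests share the link equally, so ten concurrent fetches all complete after roughly ten times the single-fetch time; the function name is hypothetical, and the 200ms and 1200ms figures are the ones quoted in this thread.

```rust
use std::time::Duration;

/// Current hard timeout, as quoted from the code in this PR.
pub const POV_REQUEST_TIMEOUT_CONNECTED: Duration = Duration::from_millis(1200);

/// Rough completion time when `parallel_requests` fetches share the link
/// equally and one fetch alone takes `single_fetch`.
pub fn worst_case_completion(single_fetch: Duration, parallel_requests: u32) -> Duration {
    single_fetch * parallel_requests
}
```

Under this model, `worst_case_completion(Duration::from_millis(200), 10)` is 2 s, which exceeds the 1.2 s timeout, while up to six concurrent fetches would still fit within it.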

EgorPopelyaev pushed a commit that referenced this pull request Oct 1, 2024
Backport #5753 into `stable2409` from AndreiEres.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Andrei Eres <eresav@me.com>