Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: consistent handling of edge cases for header sync #5837

Merged
merged 11 commits into from
Oct 24, 2023

Conversation

hansieodendaal
Copy link
Contributor

@hansieodendaal hansieodendaal commented Oct 11, 2023

Description

This PR fixes an edge case for header sync where the local node has more headers, but the remote node has better proof-of-work (accumulated difficulty), thus higher actual blockchain height. It also simplifies the attempted header sync logic and error handling when a peer does not return headers and introduces consistent naming, variable re-use and use of information

Supersedes #5756

Motivation and Context

When doing header sync, and determining from which peer to sync, we need to consider our actual blockchain state when comparing chains. It is possible that our node has valid block headers that will translate into a higher proof of work when fully synced than the remote syncing peer, but lacking the blocks to back the block headers, thus the remote chain has an actual higher proof of work blockchain.
With the current implementation, race conditions can exist when determining if the peer is misbehaving, i.e. lying about their proof-of-work or not wanting to supply block headers, due to a mismatch in tip height used for the check.

This PR fixes the race conditions by always using the latest local chain metadata and block headers, ignoring all sync peers that do not exceed our own accumulated difficulty and using consistent information in all the checks.

How Has This Been Tested?

Added integration-level unit tests:

  • test_header_sync_happy_path
  • test_header_sync_with_fork_happy_path
  • test_header_sync_uneven_headers_and_blocks_happy_path
  • test_header_sync_uneven_headers_and_blocks_peer_lies_about_pow_no_ban
  • test_header_sync_even_headers_and_blocks_peer_lies_about_pow_with_ban
  • test_header_sync_even_headers_and_blocks_peer_metadata_improve_with_reorg

What process can a PR reviewer use to test or verify this change?

  • Code walk-through
  • Review unit tests

Breaking Changes

  • None
  • Requires data directory on base node to be deleted
  • Requires hard fork
  • Other - Please specify

@github-actions
Copy link

github-actions bot commented Oct 11, 2023

Test Results (CI)

1 239 tests   1 239 ✔️  9m 28s ⏱️
     39 suites         0 💤
       1 files           0

Results for commit 323e3f2.

♻️ This comment has been updated with latest results.

@ghpbot-tari-project ghpbot-tari-project added P-acks_required Process - Requires more ACKs or utACKs P-reviews_required Process - Requires a review from a lead maintainer to be merged labels Oct 11, 2023
@github-actions
Copy link

github-actions bot commented Oct 11, 2023

Test Results (Integration tests)

  2 files  +  2  11 suites  +11   22m 0s ⏱️ + 22m 0s
34 tests +34  33 ✔️ +33  0 💤 ±0  1 +1 
35 runs  +35  34 ✔️ +34  0 💤 ±0  1 +1 

For more details on these failures, see this check.

Results for commit 323e3f2. ± Comparison against base commit 43b994e.

♻️ This comment has been updated with latest results.

@SWvheerden SWvheerden mentioned this pull request Oct 12, 2023
4 tasks
@hansieodendaal hansieodendaal force-pushed the ho_header_sync branch 7 times, most recently from 495f44a to 97879b3 Compare October 17, 2023 18:09
@hansieodendaal hansieodendaal changed the title [wip] feat: consistent handling of edge cases for header sync feat: consistent handling of edge cases for header sync Oct 17, 2023
@hansieodendaal hansieodendaal force-pushed the ho_header_sync branch 2 times, most recently from b5adc65 to febf997 Compare October 17, 2023 19:22
sdbondi
sdbondi previously approved these changes Oct 18, 2023
Copy link
Member

@sdbondi sdbondi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - utACK

@ghpbot-tari-project ghpbot-tari-project removed the P-reviews_required Process - Requires a review from a lead maintainer to be merged label Oct 18, 2023
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just needs a few commented out lines removed

@ghpbot-tari-project ghpbot-tari-project added P-reviews_required Process - Requires a review from a lead maintainer to be merged labels Oct 18, 2023
{})",
header_tip_height,
metadata.height_of_longest_chain(),
{}), peer has lied about chain metadata or did not want to provide headers",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. We only checked if our best_block < our best_headers

Copy link
Collaborator

@SWvheerden SWvheerden Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its true by deduction.
For us to have entered Header_sync the peer needed tot have a better pow(it claimed it did)
Then we check that out pow did not update and that they still have a claimed higher pow.

We then asked the peer to provide us the chain which had a better pow, but it could not. So we have the same or more headers on the exact same chain as they have. But we can have the headers, not the block, so if we check if we have the same blocks than headers, we must have the same or more blocks than they. Its not possible to have blocks and not the headers.

So the only conclusion we can make is one of the following 3:
They lied about their pow
They did not want to provide us with a higher chain
The swapped to a lower-difficulty pow chain.

So we are only left with 1 and 2 realisticly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I provided better data for a better decision

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a link to the new code or more detail about what you did, I can't find it among the other changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditions are now tested separately and both will be ban-able offences.
See

        // Basic sanity check about the peer's claimed chain metadata
        if chain_split_result.peer_accumulated_difficulty < sync_peer.claimed_chain_metadata().accumulated_difficulty()

and

        match header_sync_status.clone() {
            HeaderSyncStatus::InSyncOrAhead => {
                // Our POW is less than the peer's POW, as verified before the attempted header sync, therefore, if the
                // peer did not supply any headers and we know we are behind based on the peer's claimed metadata, then
                // we can ban the peer.
                if best_header.height() == best_block.height() {

@@ -105,6 +108,7 @@ pub struct BaseNodeBuilder {
mempool_config: Option<MempoolConfig>,
mempool_service_config: Option<MempoolServiceConfig>,
liveness_service_config: Option<LivenessConfig>,
p2p_config: Option<P2pConfig>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but this struct should really be called "TestBaseNodeBuilder"

@@ -135,7 +135,7 @@ pub async fn initialize_local_test_comms<P: AsRef<Path>>(
discovery_request_timeout: Duration,
seed_peers: Vec<Peer>,
shutdown_signal: ShutdownSignal,
) -> Result<(CommsNode, Dht, MessagingEventSender), CommsInitializationError> {
) -> Result<(UnspawnedCommsNode, Dht, MessagingEventSender), CommsInitializationError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: This test method should also be moved into a file that shows it's a test method. Because the method is so long, it is difficult to see that it's test code

@hansieodendaal hansieodendaal force-pushed the ho_header_sync branch 2 times, most recently from 6326491 to 8e46cc5 Compare October 19, 2023 02:42
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding in a new field to the proto for the peer to also send is meaningless.
They already sent us one that they claim is higher than ours. All we have to do now is verify it, we should not ask for a new one.

base_layer/core/src/base_node/proto/rpc.proto Outdated Show resolved Hide resolved
@hansieodendaal hansieodendaal force-pushed the ho_header_sync branch 2 times, most recently from b400759 to 2535e04 Compare October 23, 2023 08:15
dependabot bot and others added 7 commits October 23, 2023 11:04
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.23 to 0.37.26.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](bytecodealliance/rustix@v0.37.23...v0.37.26)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
This PR fixes an edge case for header sync where the local node has a higher chain
header, but the remote node has a higher actual blockchain height. It also simplifies
the attempted header sync logic and error handling when a peer does not return headers
and introduces consistent naming, variable re-use and use of information.

When doing header sync, and determining from which peer to sync, we need to consider
our actual blockchain state when comparing chains. It is possible that our node has
valid block headers that will translate into a higher proof of work when fully synced
than the remote syncing peer, but lacking the blocks to back the block headers, thus
the remote chain has an actual higher proof of work blockchain.

With the current implementation, race conditions can exist when determining if the
peer is misbehaving, i.e. lying about their proof-of-work or not wanting to supply
block headers, due to a mismatch in tip height used for the check.

This PR fixes the race conditions by always using the latest local chain metadata
and block headers, ignoring all sync peers that do not exceed our own accumulated
difficulty and using consistent information in all the checks.

Added a header sync tests integration unit test.
@SWvheerden
Copy link
Collaborator

Fixes: #5816

Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

base_layer/core/src/base_node/sync/header_sync/error.rs Outdated Show resolved Hide resolved
base_layer/core/tests/helpers/sync.rs Outdated Show resolved Hide resolved
base_layer/core/tests/tests/header_sync.rs Outdated Show resolved Hide resolved
base_layer/core/tests/tests/header_sync.rs Outdated Show resolved Hide resolved
@SWvheerden SWvheerden merged commit 3e1ec1f into tari-project:development Oct 24, 2023
12 of 13 checks passed
@hansieodendaal hansieodendaal deleted the ho_header_sync branch October 25, 2023 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-acks_required Process - Requires more ACKs or utACKs P-reviews_required Process - Requires a review from a lead maintainer to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants