Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replay: do not start leader for a block we already have shreds for #2416

Merged

Conversation

AshwinSekar
Copy link

Problem

See solana-labs#32679

In certain scenarios where the first leader block is not produced, however the second (or later) leader block is produced we can end up reproducing this block after resetting to a previous block.

Summary of Changes

When poh_recorder checks for leader slot, additionally check blockstore to see if shreds have already been inserted.

core/src/replay_stage.rs Outdated Show resolved Hide resolved
poh/src/poh_recorder.rs Outdated Show resolved Hide resolved
@AshwinSekar
Copy link
Author

I'll address leader_schedule_cache::next_leader_slot in a follow up PR once we decide on a direction. It sounds like we want this blockstore check here regardless.

carllin
carllin previously approved these changes Aug 5, 2024
@AshwinSekar AshwinSekar merged commit 15dbe7f into anza-xyz:master Aug 6, 2024
40 checks passed
@AshwinSekar AshwinSekar deleted the start-leader-check-blockstore branch August 6, 2024 21:37
@AshwinSekar AshwinSekar added the v2.0 Backport to v2.0 branch label Aug 8, 2024
Copy link

mergify bot commented Aug 8, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Aug 8, 2024
…2416)

* replay: do not start leader for a block we already have shreds for

* pr feedback: comment, move existing check to blockstore fn

* move blockstore read after tick height check

* pr feedback: resuse blockstore fn in next_leader_slot

(cherry picked from commit 15dbe7f)

# Conflicts:
#	poh/src/poh_recorder.rs
@jstarry jstarry self-requested a review August 8, 2024 06:05
Comment on lines +582 to +584
// as it was not part of the rooted fork. If this slot is not the first slot for this leader,
// and the first slot was previously ticked over, the check in `leader_schedule_cache::next_leader_slot`
// will not suffice, as it only checks if there are shreds for the first slot.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly enough fn next_leader_slot returns (start_slot, last_slot) and last_slot is used to calculate leader_last_tick_height in PohRecorder but we only use leader_last_tick_height inside would_be_leader. Maybe reached_leader_tick needs to use it as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be an option, it would allow us to remove the check here for when we tick into the next leader:

agave/core/src/replay_stage.rs

Lines 2073 to 2075 in c99095d

// I guess I missed my slot
if next_leader != *my_pubkey {
return false;

However I think if we use it in reached_leader_tick, we should recompute our next leader window. Otherwise if we don't reset in time, we could end up missing our next leader window, which the current code prevents.

AshwinSekar added a commit that referenced this pull request Aug 15, 2024
…for (backport of #2416) (#2484)

* replay: do not start leader for a block we already have shreds for (#2416)

* replay: do not start leader for a block we already have shreds for

* pr feedback: comment, move existing check to blockstore fn

* move blockstore read after tick height check

* pr feedback: resuse blockstore fn in next_leader_slot

(cherry picked from commit 15dbe7f)

# Conflicts:
#	poh/src/poh_recorder.rs

* fix conflicts

---------

Co-authored-by: Ashwin Sekar <ashwin@anza.xyz>
Co-authored-by: Ashwin Sekar <ashwin@solana.com>
ray-kast pushed a commit to abklabs/agave that referenced this pull request Nov 27, 2024
…nza-xyz#2416)

* replay: do not start leader for a block we already have shreds for

* pr feedback: comment, move existing check to blockstore fn

* move blockstore read after tick height check

* pr feedback: resuse blockstore fn in next_leader_slot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.0 Backport to v2.0 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants