
Add unit test for ReplayStage::maybe_start_leader() #32679

Closed

Conversation

@steviez (Contributor) commented Aug 1, 2023

Problem

This PR introduces a unit test that demonstrates a series of events that could lead to the panic described in #32466.

Summary of Changes

Right now, this PR only contains a unit test to demonstrate the problem. There is currently a check that bails if the slot is already present in BankForks:

if bank_forks.read().unwrap().get(poh_slot).is_some() {
    warn!("{} already have bank in forks at {}?", my_pubkey, poh_slot);
    return;
}

Proposed Solutions

  1. Similar to the BankForks check above, explicitly check the blockstore to see if there is already a shred for the new leader slot (a rough sketch appears after the notes below).
  2. A critical part of the sequence is ReplayStage::reset_poh_recorder() resetting to an older slot. We could add logic to detect when we reset to an older slot, i.e. a boolean like reset_to_older. We could then pass reset_to_older into ReplayStage::maybe_start_leader(), and only check the blockstore if reset_to_older was true.
  3. When a new bank is created, an entry is inserted for each slot into this common structure in AccountsDb:
    /// Insert a default bank hash stats for `slot`
    ///
    /// This fn is called when creating a new bank from parent.
    pub(crate) fn insert_default_bank_hash_stats(&self, slot: Slot, parent_slot: Slot) {
        let mut bank_hash_stats = self.bank_hash_stats.lock().unwrap();
        if bank_hash_stats.get(&slot).is_some() {
            error!("set_hash: already exists; multiple forks with shared slot {slot} as child (parent: {parent_slot})!?");
            return;
        }
        bank_hash_stats.insert(slot, BankHashStats::default());
    }

    In the debugging described in the linked issue, the error statement in the if case was observed. This is because the Bank was purged from BankForks, but the slot was still newer than the latest root, so it hadn't been fully cleaned yet. Thus, this entry in the data structure somewhat serves as a tombstone indicating that we had a Bank for this slot, and it could be used to detect whether we should bail on this slot. That is, call the below function and skip the leader slot if the result is Some(...):
    /// Get the bank hash stats for `slot` in the `bank_hash_stats` map
    pub fn get_bank_hash_stats(&self, slot: Slot) -> Option<BankHashStats> {
        self.bank_hash_stats.lock().unwrap().get(&slot).cloned()
    }

Some notes about the options:

  • 1 would hit the blockstore fairly often, which is obviously not ideal
  • 2 is an optimization of 1 that hits the blockstore less frequently
  • 3 would seemingly work with minimal perf impact; however, I'm not sure if this would be a "good use" of AccountsDb, or if we're abusing something / liable to hit a race condition / etc. I will follow up with someone more familiar with AccountsDb to figure this out.
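
For reference, a rough sketch of what option 1 could look like inside ReplayStage::maybe_start_leader(), mirroring the BankForks check above. This assumes a blockstore handle is available there; the log message and error handling are illustrative only, not a final implementation:

// Sketch of option 1: bail if blockstore already holds shreds for the
// upcoming leader slot, analogous to the BankForks check above.
if let Ok(Some(meta)) = blockstore.meta(poh_slot) {
    if meta.received > 0 {
        warn!("{} already have shreds in blockstore at {}, skipping leader slot", my_pubkey, poh_slot);
        return;
    }
}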

Fixes #32466 (StandardBroadcastRun panicked on pop net cluster)

@steviez (Contributor, Author) commented Aug 14, 2023

Hi @brooksprumo - I wanted to get your opinion on a possible solution for a bug this PR demonstrates. Namely, I'm curious about your thoughts on proposed solution 3 above. I had the following concern that I hope you might be able to speak to:

3 would seemingly work with minimal perf impact; however, I'm not sure if this would be a "good use" of AccountsDb, or if we're abusing something / liable to hit a race condition / etc. I will follow up with someone more familiar with AccountsDb to figure this out.

@steviez (Contributor, Author) commented Aug 14, 2023

@carllin - If the AccountsDb folks say we shouldn't do 3, any thoughts on 1 vs. 2 in the proposed solutions? 1 is simpler, but 2 would likely be more performant; given that this is a critical path, I'm thinking we'd want to go with 2, but I'm curious to hear your thoughts and whether there are other solutions I haven't thought of.

@carllin (Contributor) commented Aug 15, 2023

Adding some context on the order of events in PohRecorder that causes this. @steviez, it might be good to add this to the problem description:

  1. reset_poh_recorder in ReplayStage is called, which finds the next leader slot
    let next_leader_slot = leader_schedule_cache.next_leader_slot(
        my_pubkey,
        slot,
        &bank,
        Some(blockstore),
        GRACE_TICKS_FACTOR * MAX_GRACE_SLOTS,
    );
    and skips leader blocks for which a shred already exists in blockstore.
  2. However, this check for existing shreds in next_leader_slot
    .skip_while(|slot| {
        match blockstore {
            None => false,
            // Skip slots we have already sent a shred for.
            Some(blockstore) => match blockstore.meta(*slot).unwrap() {
                Some(meta) => meta.received > 0,
                None => false,
            },
        }
    })
    only extends to the first slot of the 4-slot leader rotation. Thus, if a leader is scheduled for slots 1-4, slot 1 doesn't have a shred, but slot 2 does, this check does not catch that and still returns slot 1 as a valid next leader slot (a small illustration follows this list).
  3. Then, in the reached_leader_slot() check:
    let (poh_slot, parent_slot) = match poh_recorder.read().unwrap().reached_leader_slot() {
    if the leader's PoH ticked past slot 1, for which it was scheduled (for instance, if replay was being slow), and hits slot 2, for which it had already made a shred, it will make another block for slot 2.
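
To make the skip_while behavior in step 2 concrete, here is a tiny self-contained illustration with hypothetical slot numbers and a stubbed shred check (not real Blockstore calls):

fn main() {
    // The leader is scheduled for slots 1-4; assume only slot 2 already has a shred.
    let leader_slots = [1u64, 2, 3, 4];
    let has_shred = |slot: u64| slot == 2;

    // skip_while only skips a leading run of slots that already have shreds.
    // Slot 1 has no shred, so skipping stops immediately and slot 1 is returned,
    // even though slot 2, later in the same rotation, already has a shred.
    let next = leader_slots.iter().copied().skip_while(|s| has_shred(*s)).next();
    assert_eq!(next, Some(1));
}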

@carllin (Contributor) commented Aug 15, 2023

I think 1 is the right fix; we can cache the result of the check in PohRecorder so we can fail earlier in PohRecorder::reached_leader_slot(). The cache could live in PohRecorder so reached_leader_slot can check it, and get cleared every time we call PohRecorder::reset().
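
A minimal, self-contained model of that caching idea (the struct, field, and helper names below are hypothetical stand-ins, not the real PohRecorder API):

use std::collections::HashSet;

// Minimal model of the proposed cache: remember which leader slots already
// have shreds, consult the cache in reached_leader_slot(), and clear it on
// every reset().
struct PohRecorderModel {
    slots_with_shreds: HashSet<u64>,
}

impl PohRecorderModel {
    fn new() -> Self {
        Self { slots_with_shreds: HashSet::new() }
    }

    // Populated when the blockstore check observes existing shreds for a slot.
    fn record_slot_with_shreds(&mut self, slot: u64) {
        self.slots_with_shreds.insert(slot);
    }

    // Stand-in for reached_leader_slot(): refuse to hand out a leader slot we
    // already produced shreds for.
    fn reached_leader_slot(&self, poh_slot: u64) -> Option<u64> {
        if self.slots_with_shreds.contains(&poh_slot) {
            None
        } else {
            Some(poh_slot)
        }
    }

    // Stand-in for reset(): the cache is cleared every time PoH is reset.
    fn reset(&mut self) {
        self.slots_with_shreds.clear();
    }
}

fn main() {
    let mut poh = PohRecorderModel::new();
    poh.record_slot_with_shreds(2);
    assert_eq!(poh.reached_leader_slot(2), None); // would-be duplicate slot is skipped
    assert_eq!(poh.reached_leader_slot(1), Some(1));
    poh.reset();
    assert_eq!(poh.reached_leader_slot(2), Some(2)); // cache cleared on reset
}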

@brooksprumo (Contributor) commented:

  Regarding proposed solution 3 above (inserting an entry into AccountsDb's bank_hash_stats when a new bank is created, and treating a leftover entry as a tombstone for a purged Bank):

We can (i.e. should) purge all slot information from AccountsDb for any bank that is purged from BankForks, root or not. So if we've purged a bank from BankForks but there is still information (bank_hash_stats) left over in AccountsDb, I would think that would be a bug. Is that what's happening?

@AshwinSekar (Contributor) commented Sep 7, 2023

It's interesting that a shred from our upcoming leader slot would be present in blockstore but missing from bank forks.

This would not be possible through the duplicate block pathway because we exit if we ever dump our leader bank from bank forks:

// Should not dump slots for which we were the leader
if Some(*my_pubkey) == leader_schedule_cache.slot_leader_at(*duplicate_slot, None) {
    panic!(
        "We are attempting to dump a block that we produced. \
         This indicates that we are producing duplicate blocks, \
         or that there is a bug in our runtime/replay code which \
         causes us to compute different bank hashes than the rest of the cluster. \
         We froze slot {duplicate_slot} with hash {frozen_hash:?} while the cluster hash is {correct_hash}"
    );
}

Similarly, turbine/repair should run through sigverify before inserting any shreds into blockstore, which drops shreds for our own leader slots:

// Discard the shred if the slot leader is the node itself.
leader_schedule_cache
    .slot_leader_at(slot, Some(bank))
    .filter(|leader| leader != self_pubkey)

It's unlikely that it was cleaned up in bank forks due to a new root being set, as afterwards we reset onto a fork before that slot in consensus.

@AshwinSekar (Contributor) commented:
Perhaps related: we saw the opposite problem, 1 (non-leader) block in blockstore generating 2 banks in bank forks, here: #33149 (comment).

It triggered the same error message in AccountsDb:

error!("set_hash: already exists; multiple forks with shared slot {slot} as child (parent: {parent_slot})!?"); 

@brooksprumo (Contributor) commented Sep 7, 2023

error!("set_hash: already exists; multiple forks with shared slot {slot} as child (parent: {parent_slot})!?"); 

I think my/this comment is related-ish? I tried to turn this error! into an assert! a few months ago, but either tests or running against mnb hit it. So... is it an invariant, or is it something that's recoverable? Currently we treat it as recoverable. It would be interesting if y'all's work identifies something else.

@steviez (Contributor, Author) commented Sep 7, 2023

It's unlikely that it was cleaned up in bank forks due to a new root being set, as afterwards we reset onto a fork before that slot in consensus.

This is actually what happened: the new root is NOT an ancestor of the duplicate slot.

@AshwinSekar (Contributor) commented:
the new root is NOT an ancestor of the duplicate.

I thought you were running master in the pop cluster. No one should be submitting duplicate blocks, right? Are there logs somewhere I can look into?

@carllin (Contributor) commented Sep 9, 2023

@AshwinSekar it's a bug in PohRecorder. The block is made on a minor fork, gets pruned when a root is made, then gets recreated later because of the PohRecorder bug. See the post above: #32679 (comment)

github-actions bot closed this Feb 20, 2024