Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: orphan chunk state witness pool #10613

Merged
merged 37 commits into from
Feb 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
8dd0c4e
Remove witness retrying logic on missing block
jancionear Feb 13, 2024
456a821
Add verify_chunk_state_witness_signature_in_epoch
jancionear Feb 14, 2024
16828b3
Add OrphanStateWitnessPool
jancionear Feb 14, 2024
d3dc400
Split process_chunk_state_witness into two parts
jancionear Feb 14, 2024
5da1fea
Add possible_epochs_of_height_around_tip
jancionear Feb 21, 2024
9cb7df9
Add orphan state witnesses to the orphan pool
jancionear Feb 14, 2024
5a76c7c
Process orphaned witnesses when a new block is accepted
jancionear Feb 14, 2024
97a0482
Allow to configure orphan pool size from config.js
jancionear Feb 16, 2024
f9b9c52
Add metrics to improve observability of OrphanStateWitnessPool
jancionear Feb 16, 2024
42af070
Add unit tests for OrphanStateWitnessPool
jancionear Feb 16, 2024
6592ace
Add TestEnv::get_client_index()
jancionear Feb 20, 2024
a29b8df
Add TestEnv::wait_for_chunk_endorsement
jancionear Feb 20, 2024
22a2b15
Expose some items that are needed for the integration tests
jancionear Feb 20, 2024
9974f09
Move get_block_producer to TestEnv
jancionear Feb 21, 2024
dc19727
Add TestEnv::get_chunk_producer_at_offset
jancionear Feb 21, 2024
0183075
Add integration tests
jancionear Feb 21, 2024
aedb405
Use ByteSize
jancionear Feb 22, 2024
198a428
Add comment on process_ready_orphan_chunk_state_witnesses
jancionear Feb 22, 2024
49bea32
Use _metrics_tracker instead of #[allow(dead_code)]
jancionear Feb 22, 2024
36b7391
use tracing::span to reduce repetition
jancionear Feb 22, 2024
f4cd1e7
Merge branch 'master' into orphan-witness
jancionear Feb 22, 2024
3b3f8c4
Remove extra newline
jancionear Feb 22, 2024
db14aa8
Fix cargo format, VSCode forgot to format a file o_0
jancionear Feb 22, 2024
3c045e2
Apply review comments
jancionear Feb 22, 2024
3bcd5c5
Use ChunkStateWitness::new_dummy in another test
jancionear Feb 22, 2024
0b126b8
Remove comment which no longer applies
jancionear Feb 22, 2024
a6858eb
Increase default orphan pool size to 25
jancionear Feb 22, 2024
0edf57a
I'm bad at math
jancionear Feb 22, 2024
f61f34c
fix log message when a witness is ejected
jancionear Feb 23, 2024
bb41b37
Use tracing for logs in test_possible_epochs_of_height_around_tip
jancionear Feb 23, 2024
33248ac
Use naive implementation of OrphanStateWitnessPool to reduce complexity
jancionear Feb 23, 2024
4e6711d
Merge branch 'master' into orphan-witness
jancionear Feb 23, 2024
24b25f5
Fix the import of near-o11y
jancionear Feb 23, 2024
58be57b
Fix formatting of epoch-manager/Cargo.toml
jancionear Feb 23, 2024
056393d
Merge branch 'master' into orphan-witness
jancionear Feb 23, 2024
5cae3c3
fix: Don't remove the handler for ChunkEndorsementMessage
jancionear Feb 23, 2024
6e9b143
Revert "fix: Don't remove the handler for ChunkEndorsementMessage"
jancionear Feb 23, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 17 additions & 0 deletions chain/chain/src/test_utils/kv_runtime.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ use near_epoch_manager::types::BlockHeaderInfo;
use near_epoch_manager::{EpochManagerAdapter, RngSeed};
use near_pool::types::TransactionGroupIterator;
use near_primitives::account::{AccessKey, Account};
use near_primitives::block::Tip;
use near_primitives::block_header::{Approval, ApprovalInner};
use near_primitives::epoch_manager::block_info::BlockInfo;
use near_primitives::epoch_manager::epoch_info::EpochInfo;
Expand Down Expand Up @@ -954,6 +955,14 @@ impl EpochManagerAdapter for MockEpochManager {
Ok(true)
}

fn verify_chunk_state_witness_signature_in_epoch(
&self,
_state_witness: &ChunkStateWitness,
_epoch_id: &EpochId,
) -> Result<bool, Error> {
Ok(true)
}

fn cares_about_shard_from_prev_block(
&self,
parent_hash: &CryptoHash,
Expand Down Expand Up @@ -1002,6 +1011,14 @@ impl EpochManagerAdapter for MockEpochManager {
Ok(shard_layout != next_shard_layout)
}

fn possible_epochs_of_height_around_tip(
&self,
_tip: &Tip,
_height: BlockHeight,
) -> Result<Vec<EpochId>, EpochError> {
unimplemented!();
}

#[cfg(feature = "new_epoch_sync")]
fn get_all_epoch_hashes(
&self,
Expand Down
1 change: 1 addition & 0 deletions chain/client/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ thiserror.workspace = true
tokio.workspace = true
tracing.workspace = true
yansi.workspace = true
bytesize.workspace = true

near-async.workspace = true
near-cache.workspace = true
Expand Down
3 changes: 3 additions & 0 deletions chain/client/src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -351,6 +351,7 @@ impl Client {
network_adapter.clone().into_sender(),
runtime_adapter.clone(),
chunk_endorsement_tracker.clone(),
config.orphan_state_witness_pool_size,
);
let chunk_distribution_network = ChunkDistributionNetwork::from_config(&config);
Ok(Self {
Expand Down Expand Up @@ -1640,6 +1641,8 @@ impl Client {

self.shards_manager_adapter
.send(ShardsManagerRequestFromClient::CheckIncompleteChunks(*block.hash()));

self.process_ready_orphan_chunk_state_witnesses(&block);
wacban marked this conversation as resolved.
Show resolved Hide resolved
}

/// Reconcile the transaction pool after processing a block.
Expand Down
36 changes: 7 additions & 29 deletions chain/client/src/client_actions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1824,35 +1824,13 @@ impl ClientActionHandler<SyncMessage> for ClientActions {
}
}

impl ClientActions {
pub fn handle_state_witness_message(
&mut self,
msg: ChunkStateWitnessMessage,
ctx: &mut dyn DelayedActionRunner<Self>,
) {
let peer_id = msg.peer_id.clone();
let attempts_remaining = msg.attempts_remaining;
match self.client.process_chunk_state_witness(msg.witness, msg.peer_id, None) {
Err(err) => {
tracing::error!(target: "client", ?err, "Error processing chunk state witness");
}
Ok(Some(witness)) => {
if attempts_remaining > 0 {
ctx.run_later(Duration::from_millis(100), move |actions, ctx| {
actions.handle_state_witness_message(
ChunkStateWitnessMessage {
witness,
peer_id,
attempts_remaining: attempts_remaining - 1,
},
ctx,
);
});
} else {
tracing::error!(target: "client", "Failed to process chunk state witness even after 5 tries due to missing parent block");
}
}
Ok(None) => {}
impl ClientActionHandler<ChunkStateWitnessMessage> for ClientActions {
type Result = ();

#[perf]
fn handle(&mut self, msg: ChunkStateWitnessMessage) -> Self::Result {
if let Err(err) = self.client.process_chunk_state_witness(msg.witness, msg.peer_id, None) {
tracing::error!(target: "client", ?err, "Error processing chunk state witness");
}
}
}
Expand Down
13 changes: 0 additions & 13 deletions chain/client/src/client_actor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@ use near_chunks::adapter::ShardsManagerRequestFromClient;
use near_client_primitives::types::Error;
use near_epoch_manager::shard_tracker::ShardTracker;
use near_epoch_manager::{EpochManagerAdapter, RngSeed};
use near_network::client::ChunkStateWitnessMessage;
use near_network::types::PeerManagerAdapter;
use near_o11y::{handler_debug_span, WithSpanContext};
use near_primitives::network::PeerId;
Expand Down Expand Up @@ -151,18 +150,6 @@ where
}
}

// This one requires the context for further scheduling of messages, so
// we can't use the generic wrapper above.
impl Handler<WithSpanContext<ChunkStateWitnessMessage>> for ClientActor {
type Result = ();

fn handle(&mut self, msg: WithSpanContext<ChunkStateWitnessMessage>, ctx: &mut Context<Self>) {
self.wrap(msg, ctx, "ChunkStateWitnessMessage", |this, msg, ctx| {
this.actions.handle_state_witness_message(msg, ctx)
})
}
}

/// Returns random seed sampled from the current thread
pub fn random_seed_from_thread() -> RngSeed {
let mut rng_seed: RngSeed = [0; 32];
Expand Down
4 changes: 4 additions & 0 deletions chain/client/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,16 @@ pub use crate::client::{Client, ProduceChunkResult};
pub use crate::client_actions::NetworkAdversarialMessage;
pub use crate::client_actor::{start_client, ClientActor};
pub use crate::config_updater::ConfigUpdater;
pub use crate::stateless_validation::chunk_validator::orphan_witness_handling::{
HandleOrphanWitnessOutcome, MAX_ORPHAN_WITNESS_SIZE,
};
pub use crate::sync::adapter::{SyncAdapter, SyncMessage};
pub use crate::view_client::{start_view_client, ViewClientActor};
pub use near_client_primitives::debug::DebugStatus;
pub use near_network::client::{
BlockApproval, BlockResponse, ProcessTxRequest, ProcessTxResponse, SetNetworkInfo,
};
pub use stateless_validation::processing_tracker::{ProcessingDoneTracker, ProcessingDoneWaiter};

pub mod adapter;
pub mod adversarial;
Expand Down
28 changes: 28 additions & 0 deletions chain/client/src/metrics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -583,6 +583,34 @@ pub(crate) static CHUNK_STATE_WITNESS_TOTAL_SIZE: Lazy<HistogramVec> = Lazy::new
.unwrap()
});

pub(crate) static ORPHAN_CHUNK_STATE_WITNESSES_TOTAL_COUNT: Lazy<IntCounterVec> = Lazy::new(|| {
try_create_int_counter_vec(
"near_orphan_chunk_state_witness_total_count",
"Total number of orphaned chunk state witnesses that were saved for later processing",
&["shard_id"],
)
.unwrap()
});

pub(crate) static ORPHAN_CHUNK_STATE_WITNESS_POOL_SIZE: Lazy<IntGaugeVec> = Lazy::new(|| {
try_create_int_gauge_vec(
"near_orphan_chunk_state_witness_pool_size",
"Number of orphaned witnesses kept in OrphanStateWitnessPool (by shard_id)",
&["shard_id"],
)
.unwrap()
});

pub(crate) static ORPHAN_CHUNK_STATE_WITNESS_POOL_MEMORY_USED: Lazy<IntGaugeVec> =
Lazy::new(|| {
try_create_int_gauge_vec(
"near_orphan_chunk_state_witness_pool_memory_used",
"Memory in bytes consumed by the OrphanStateWitnessPool (by shard_id)",
&["shard_id"],
)
.unwrap()
});

pub(crate) static BLOCK_PRODUCER_ENDORSED_STAKE_RATIO: Lazy<HistogramVec> = Lazy::new(|| {
try_create_histogram_vec(
"near_block_producer_endorsed_stake_ratio",
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
pub mod orphan_witness_handling;
pub mod orphan_witness_pool;

use super::processing_tracker::ProcessingDoneTracker;
use crate::stateless_validation::chunk_endorsement_tracker::ChunkEndorsementTracker;
use crate::{metrics, Client};
Expand Down Expand Up @@ -31,6 +34,7 @@ use near_primitives::types::chunk_extra::ChunkExtra;
use near_primitives::types::ShardId;
use near_primitives::validator_signer::ValidatorSigner;
use near_store::PartialStorage;
use orphan_witness_pool::OrphanStateWitnessPool;
use std::collections::HashMap;
use std::sync::Arc;

Expand All @@ -53,6 +57,7 @@ pub struct ChunkValidator {
network_sender: Sender<PeerManagerMessageRequest>,
runtime_adapter: Arc<dyn RuntimeAdapter>,
chunk_endorsement_tracker: Arc<ChunkEndorsementTracker>,
orphan_witness_pool: OrphanStateWitnessPool,
}

impl ChunkValidator {
Expand All @@ -62,13 +67,15 @@ impl ChunkValidator {
network_sender: Sender<PeerManagerMessageRequest>,
runtime_adapter: Arc<dyn RuntimeAdapter>,
chunk_endorsement_tracker: Arc<ChunkEndorsementTracker>,
orphan_witness_pool_size: usize,
) -> Self {
Self {
my_signer,
epoch_manager,
network_sender,
runtime_adapter,
chunk_endorsement_tracker,
orphan_witness_pool: OrphanStateWitnessPool::new(orphan_witness_pool_size),
}
}

Expand Down Expand Up @@ -639,14 +646,40 @@ impl Client {
witness: ChunkStateWitness,
peer_id: PeerId,
processing_done_tracker: Option<ProcessingDoneTracker>,
) -> Result<Option<ChunkStateWitness>, Error> {
) -> Result<(), Error> {
let prev_block_hash = witness.inner.chunk_header.prev_block_hash();
if self.chain.get_block(prev_block_hash).is_err() {
return Ok(Some(witness));
let prev_block = match self.chain.get_block(prev_block_hash) {
Ok(block) => block,
Err(Error::DBNotFoundErr(_)) => {
// Previous block isn't available at the moment, add this witness to the orphan pool.
self.handle_orphan_state_witness(witness)?;
return Ok(());
}
Err(err) => return Err(err),
};
self.process_chunk_state_witness_with_prev_block(
witness,
peer_id,
&prev_block,
processing_done_tracker,
)
}

pub fn process_chunk_state_witness_with_prev_block(
&mut self,
witness: ChunkStateWitness,
peer_id: PeerId,
prev_block: &Block,
processing_done_tracker: Option<ProcessingDoneTracker>,
) -> Result<(), Error> {
if witness.inner.chunk_header.prev_block_hash() != prev_block.hash() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one example where I think asserting is more appropriate than returning an error

  • the contract of this function already assumes block to correspond to witness.inner.chunk_header.prev_block_hash(), so violating that is a programmatic error, not an "invalid input" kind of error
  • having error handling here might give an impression that this could actually happened under some input, which is not true
  • returning error might mask an underlying issue and result in more time spent on debugging or completely hide it which is even worse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I'm really scared of introducing code that could lead to DoS vulnerability. It might not be a big threat to normal services, but on a blockchain I imagine that a DoS means that it'd be possible to kick out validators and cause all sorts of mayhem.
IMO passing the wrong block to process_chunk_state_witness is not a fatal error - it's invalid input, for which the function can safely fail.
I added it because I know that there are plans to process witnesses before the block is applied, and I was worried that someone might accidentally use it incorrectly once the complexity of things increases.

I don't see a reason to risk a panic here. This error will show up in the logs with an ERROR log-level, and I think anyone debugging an issue looks for those, so the visibility is there. Maybe we could introduce a BUG log-level, that would be used for non-critical bugs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to merge the PR as it is and we can discuss this on protocol core weekly sync

return Err(Error::Other(format!(
"process_chunk_state_witness_with_prev_block - prev_block doesn't match ({} != {})",
witness.inner.chunk_header.prev_block_hash(),
prev_block.hash()
)));
}

// TODO(#10265): If the previous block does not exist, we should
// queue this (similar to orphans) to retry later.
let result = self.chunk_validator.start_validating_chunk(
witness,
&self.chain,
Expand All @@ -661,6 +694,6 @@ impl Client {
},
));
}
result.map(|_| None)
result
}
}
Loading
Loading