Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Do not leak active head data in statement distribution #3519

Merged
merged 1 commit into from
Jul 26, 2021

Conversation

koute
Copy link
Contributor

@koute koute commented Jul 26, 2021

Fixes #3514.

The main problem here was that this code essentially did an early return through the ? operator on the first error it encountered, effectively ignoring everything else in activated, and also left deactivated completely untouched which resulted in permanently leaking items which were added to active_heads in the past.

I'm not entirely sure if this is the way we're supposed to handle those errors; previously the handle_subsystem_message returned an Err when get_session_index failed; now I've made it to just continue. It's not obvious to me what the right answer is here: should the whole function return an Err if it fails for only one element of activated? I'd appreciate it if someone more experienced with this code could chime in.

@koute koute requested review from eskimor and ordian July 26, 2021 08:26
@koute koute added C1-low PR touches the given topic and has a low impact on builders. A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes I3-bug Fails to follow expected behavior. I9-footprint An enhancement to provide a smaller (system load, memory, network or disk) footprint. labels Jul 26, 2021
Comment on lines 1889 to 1896
for deactivated in deactivated {
if active_heads.remove(&deactivated).is_some() {
tracing::trace!(
target: LOG_TARGET,
hash = ?deactivated,
"Deactivating leaf",
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify the fix by moving this section above the activated loop. This is equivalent, because we never activate and deactivate a leaf at the same time.

We can revert the rest of the changes, which makes your question obsolete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revert the rest of the changes, which makes your question obsolete.

Actually, no, let's collect all errors to a vec instead.

I'm not entirely sure if this is the way we're supposed to handle those errors;

The logic of handling errors is on the caller side depending on whether it's Fatal or not:

match result {
Ok(true) => break,
Ok(false) => {}
Err(Error(Fault::Fatal(f))) => return Err(f),
Err(Error(Fault::Err(error))) =>
tracing::debug!(target: LOG_TARGET, ?error)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revert the rest of the changes, which makes your question obsolete.

Well, not necessarily, since there's still the problem of the rest of elements of activated essentially being ignored if only one's missing. The memory leak here was definitely not intended, and it was a direct result of the early return, so it would make sense that the other elements of activated being ignored were also a mistake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no, let's collect all errors to a vec instead.

Sounds good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not necessarily, since there's still the problem of the rest of elements of activated essentially being ignored if only one's missing.

I just realized that activated actually only consists of t most 1 head except for the startup. So the simplest fix would be just moving deactivation handling up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not necessarily, since there's still the problem of the rest of elements of activated essentially being ignored if only one's missing.

I just realized that activated actually only consists of t most 1 head except for the startup. So the simplest fix would be just moving deactivation handling up.

So just for my own reference - are you saying that it's fine to ignore rest of the elements here, or that there won't ever be any extra elements?

(If it's the latter then it'd be nice to add a debug_assert! here.)

Copy link
Member

@ordian ordian Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we create active leaves only on block import (1 activated head):

async fn block_imported(&mut self, block: BlockInfo) -> SubsystemResult<()> {
match self.active_leaves.entry(block.hash) {
hash_map::Entry::Vacant(entry) => entry.insert(block.number),
hash_map::Entry::Occupied(entry) => {
debug_assert_eq!(*entry.get(), block.number);
return Ok(());
}
};
let mut update = match self.on_head_activated(&block.hash, Some(block.parent_hash)) {
Some((span, status)) => ActiveLeavesUpdate::start_work(ActivatedLeaf {
hash: block.hash,
number: block.number,
status,
span
}),
None => ActiveLeavesUpdate::default(),
};
if let Some(number) = self.active_leaves.remove(&block.parent_hash) {
debug_assert_eq!(block.number.saturating_sub(1), number);
update.deactivated.push(block.parent_hash);
self.on_head_deactivated(&block.parent_hash);
}
self.clean_up_external_listeners();
if !update.is_empty() {
self.broadcast_signal(OverseerSignal::ActiveLeaves(update)).await?;
}
Ok(())
}
and on startup only once
let active_leaves = futures::executor::block_on(
active_leaves(&select_chain, &*client)
)?;

(If it's the latter then it'd be nice to add a debug_assert! here.)

We can change the signature to of activated instead to be a single entry. (But that's out of scope of the PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just for my own reference - are you saying that it's fine to ignore rest of the elements here, or that there won't ever be any extra elements?

It's fine to ignore extra elements on startup for statement-distribution and there won't ever be any extra elements after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've moved the deallocation to the top.

I'm still not exactly comfortable with leaving the loop below as-is; if there's going to be only one element then that definitely should be enforced by the type system, but as you've said that is out of the scope of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll handle that as part of #2812.

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for identifying the issue and bearing with me!

@ordian ordian merged commit 6dfdeab into paritytech:master Jul 26, 2021
ordian added a commit that referenced this pull request Jul 26, 2021
* master:
  enable disputes (#3478)
  Do not leak active head data in statement distribution (#3519)
  Fix TransactAsset Implementation (#3345)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. I3-bug Fails to follow expected behavior. I9-footprint An enhancement to provide a smaller (system load, memory, network or disk) footprint.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in StatementDistribution::handle_subsystem_message
2 participants