fix: stop background threads between estimations #7689

Closed
wants to merge 5 commits into from

jakmeier
Contributor

Explicitly stop the prefetching background threads and wait for them to
terminate when a testbed is dropped. This prevents estimations from being
influenced by background threads left over from previous estimations,
which we have observed since merging #7661.

@jakmeier jakmeier requested a review from a team as a code owner September 26, 2022 12:30
@jakmeier jakmeier added the A-params-estimator Area: runtime params estimator label Sep 26, 2022

impl<'a> Drop for Testbed<'a> {
    fn drop(&mut self) {
        self.inner.stop_prefetching_threads();

Contributor

Let's propagate this drop to ShardTriesInner? Joining threads on drop should be the responsibility of the entity that spawns them. As a rule of thumb, every spawned thread should have its join.

Perhaps also use something like

struct ShardTriesInner {
    /// Prefetcher state, such as IO threads, per shard.
    prefetchers: RwLock<HashMap<ShardUId, (PrefetchApi, Vec<JoinHandle<()>>)>>,
}

?

The Clone impl is funky; it would be better to separate Prefetcher and PrefetcherHandle at the type level.
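A minimal sketch of what such a type-level split could look like, using only the standard library. The names `Prefetcher`, `PrefetcherHandle`, `spawn_prefetcher`, and `prefetch` are hypothetical, `std::sync::mpsc` stands in for the crossbeam channel, and the "IO" is reduced to counting requests. This is not the actual nearcore code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, SendError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Owning side (hypothetical name): not `Clone`, joins what it spawned.
pub struct Prefetcher {
    threads: Vec<JoinHandle<()>>,
}

/// Cloneable side (hypothetical name): can only submit requests.
#[derive(Clone)]
pub struct PrefetcherHandle {
    tx: Sender<u64>,
}

impl PrefetcherHandle {
    pub fn prefetch(&self, key: u64) -> Result<(), SendError<u64>> {
        self.tx.send(key)
    }
}

/// Spawns one IO thread that counts the requests it receives in `done`.
pub fn spawn_prefetcher(done: Arc<AtomicUsize>) -> (Prefetcher, PrefetcherHandle) {
    let (tx, rx) = channel::<u64>();
    let thread = thread::spawn(move || {
        // `recv` fails only after every sender has been dropped.
        while rx.recv().is_ok() {
            done.fetch_add(1, Ordering::SeqCst);
        }
    });
    (Prefetcher { threads: vec![thread] }, PrefetcherHandle { tx })
}

impl Drop for Prefetcher {
    fn drop(&mut self) {
        // Every spawn has its join. Note that this blocks until all
        // `PrefetcherHandle` clones have been dropped.
        for thread in self.threads.drain(..) {
            let _ = thread.join();
        }
    }
}
```

In this sketch the owner can only finish joining once every handle is gone, so it relies on the owner being dropped last.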

Contributor Author

Yeah, 100% agree that this clone thing is too funky, thanks for pointing it out.

However, ShardTriesInner::drop seems like the wrong place to me. At least I failed in my attempt to make it work.

The problem is that we really must close the crossbeam channel first and only then join the threads. Joining when dropping Testbed only works because the testbed itself outlives all places that could hold a clone of a channel sender, which is not true for ShardTriesInner.

The sender is stored inside PrefetchApi, which in turn is created by ShardTriesInner. A clone of PrefetchApi goes into every instance of Trie, which itself lives in various other structs, such as TrieUpdate. Unlike Testbed, ShardTriesInner is not guaranteed to outlive all of those. Thus, joining when ShardTriesInner is dropped results in deadlocks, as the background threads can still be waiting on an open channel.
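The ordering constraint (close the channel first, join second) can be illustrated with a small standalone sketch. It assumes `std::sync::mpsc` as a stand-in for crossbeam, and the function name `process_then_join` is made up for this example.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Sends `jobs` to `n_threads` workers and returns how many were processed.
pub fn process_then_join(n_threads: usize, jobs: Vec<u64>) -> usize {
    let (tx, rx) = channel::<u64>();
    // std receivers are not cloneable, so the workers share one behind a mutex.
    let rx: Arc<Mutex<Receiver<u64>>> = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..n_threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut processed = 0usize;
                loop {
                    // The lock guard is released at the end of this statement.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(_) => processed += 1,
                        // All senders are gone and the queue is empty.
                        Err(_) => break,
                    }
                }
                processed
            })
        })
        .collect();

    for job in jobs {
        tx.send(job).unwrap();
    }

    // Step 1: close the channel by dropping the last sender.
    drop(tx);
    // Step 2: only now is it safe to join. With a sender still alive,
    // the workers would block in `recv` and the join would never return.
    workers.into_iter().map(|w| w.join().unwrap()).sum()
}
```

Swapping the two steps, or keeping a clone of `tx` alive somewhere else, reproduces the kind of deadlock described above.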

I have now attempted to apply your suggestion in a slightly different way. Definitive ownership of the channel sender AND the join handles is now given to the clonable struct WorkQueue, which itself is stored inside PrefetchApi. This way, every clone of the sender also clones the Arc<Vec<JoinHandle>>. When the last instance of that combination is dropped, it is safe to join the threads.

I am still not 100% happy with my implementation, though. It is still gimmicky and there are too many nested structs for my taste. But it's the truest representation of ownership that I could come up with right now.^^ Ideas for improvements are welcome. :)

The number of clones of `PrefetchApi` is unknown to its initial creator,
the `ShardTriesInner` instance. But the last channel sender must be
dropped before joining the threads. Therefore, it is tricky to find the
right place to join background threads.

To solve it locally inside
`core/store/src/trie/prefetching_trie_storage.rs` we use a helper struct
`JoinGuard`. Dropping the join guard joins all threads.
It is stored inside a reference-counted pointer right after
the crossbeam sender, such that they are always cloned together.
This ensures the join guard outlives the last sender to the channel.
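A rough sketch of the join-guard idea follows. It is not the code from this PR: `std::sync::mpsc` replaces the crossbeam channel, the layout of `WorkQueue` and its `new`/`send` methods are assumptions, and the worker merely counts requests. The point it illustrates is that Rust drops struct fields in declaration order, so each clone releases its sender before its reference to the guard.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, SendError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Joins all threads when dropped.
struct JoinGuard(Vec<JoinHandle<()>>);

impl Drop for JoinGuard {
    fn drop(&mut self) {
        for handle in self.0.drain(..) {
            let _ = handle.join();
        }
    }
}

/// Hypothetical layout. Field order matters: `tx` is declared first, so a
/// clone always drops its sender before its `Arc<JoinGuard>`.
#[derive(Clone)]
pub struct WorkQueue {
    tx: Sender<u64>,
    _guard: Arc<JoinGuard>,
}

impl WorkQueue {
    /// Spawns one worker that counts received requests in `done`.
    pub fn new(done: Arc<AtomicUsize>) -> Self {
        let (tx, rx) = channel::<u64>();
        let worker = thread::spawn(move || {
            // Exits once the last sender is gone and the queue is drained.
            while rx.recv().is_ok() {
                done.fetch_add(1, Ordering::SeqCst);
            }
        });
        WorkQueue { tx, _guard: Arc::new(JoinGuard(vec![worker])) }
    }

    pub fn send(&self, key: u64) -> Result<(), SendError<u64>> {
        self.tx.send(key)
    }
}
```

When the last `WorkQueue` clone goes away, its sender is dropped first, which closes the channel, and only then does the reference count of the guard reach zero and trigger the join. One caveat of this sketch: the last clone must not be dropped on a worker thread, since a thread cannot join itself.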
@jakmeier
Contributor Author

I had a call with matklad just now. The plan of action is to create a second channel of some kind for actively shutting down the IO threads, and to use it in the impl Drop of ShardTries. Anyone still holding on to PrefetchApi at that point will no longer be able to send prefetch requests, which seems okay.
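One possible shape of that plan, sketched with the standard library only. All names (`IoThreadOwner`, `spawn_io_thread`, `io_loop`) are hypothetical, and the polling loop with `recv_timeout` is a simplification chosen to stay within `std`. With crossbeam, one would more likely block on both channels at once using its `select!` macro.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical owner of the IO thread; plays the role of `ShardTries`.
pub struct IoThreadOwner {
    stop_tx: Option<Sender<()>>,
    thread: Option<JoinHandle<()>>,
}

/// Returns the owner and a cloneable work sender (the `PrefetchApi` role).
pub fn spawn_io_thread() -> (IoThreadOwner, Sender<u64>) {
    let (work_tx, work_rx) = channel::<u64>();
    let (stop_tx, stop_rx) = channel::<()>();
    let thread = thread::spawn(move || io_loop(work_rx, stop_rx));
    (IoThreadOwner { stop_tx: Some(stop_tx), thread: Some(thread) }, work_tx)
}

fn io_loop(work_rx: Receiver<u64>, stop_rx: Receiver<()>) {
    loop {
        // A message on, or the closing of, the shutdown channel means "stop".
        match stop_rx.try_recv() {
            Err(TryRecvError::Empty) => {}
            _ => break,
        }
        // Poll with a timeout so the shutdown channel is checked regularly.
        match work_rx.recv_timeout(Duration::from_millis(5)) {
            Ok(_key) => { /* a real IO thread would prefetch here */ }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}

impl Drop for IoThreadOwner {
    fn drop(&mut self) {
        // Closing the shutdown channel wakes the thread even while work
        // senders are still alive, so the join below cannot deadlock.
        drop(self.stop_tx.take());
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

After the owner is dropped, the IO thread has exited and released its receiver, so a `send` on any remaining work sender returns an error instead of queueing a request.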

@jakmeier jakmeier closed this Sep 28, 2022
@jakmeier jakmeier deleted the fix-estimator-bkg-threads branch September 28, 2022 12:58