
cluster: fix reactor stalls during shutdown #5151

Merged

jcsp merged 3 commits into redpanda-data:dev from controller-stalls on Jul 4, 2022

Conversation

@jcsp (Contributor) commented Jun 16, 2022

Cover letter

These objects are all potentially very large, so:

  • Must not destruct them in one shot (the overhead
    of all the item destructors is enough to cause
    an issue)
  • Must not use ss::parallel_for_each; it's unsafe
    on large collections.

Release notes

  • none

-    co_await ss::parallel_for_each(
-      partitions, [this](auto& e) { return do_shutdown(e.second); });
+    co_await ss::max_concurrent_for_each(
+      partitions, 1024, [this](auto& e) { return do_shutdown(e.second); });
@jcsp (Contributor, Author) commented:

1024 is an arbitrary number, but this feels like something that isn't worth creating a full configuration property for.

A Contributor commented:

this is hot! i like this. cc: @travisdowns

@jcsp marked this pull request as ready for review June 17, 2022 15:41
{
    auto current = _raft_table.begin();
    while (current != _raft_table.end()) {
        current = _raft_table.erase(current, ++current);
A Member commented:

Is this safe?

erase(current, ++current) both modifies current and uses it in another argument, which I believe is indeterminately sequenced (this changed in C++17 from being UB to "indeterminately sequenced", but that stronger guarantee isn't very useful here). That is, you might end up with (current + 1), (current + 1) (an empty range) or (current), (current + 1) (what you want).

In any case, why use this range-erase overload at all over simply _raft_table.erase(current++)?
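
For illustration, a sequencing-safe variant of the per-element erase (a sketch, not code from this PR; it assumes the surrounding coroutine context of stop_partitions and std::next from <iterator>):

    auto current = _raft_table.begin();
    while (current != _raft_table.end()) {
        // std::next(current) is computed as an ordinary value before the
        // call, so `current` itself is never modified in the argument list.
        current = _raft_table.erase(current, std::next(current));
        co_await ss::coroutine::maybe_yield();
    }

Since the arguments no longer mutate current, the order in which they are evaluated cannot change the erased range.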

@@ -162,10 +163,26 @@ ss::future<> partition_manager::stop_partitions() {
    co_await _gate.close();
    // prevent partitions from being accessed
    auto partitions = std::exchange(_ntp_table, {});
    _raft_table.clear();

    {
A Member commented:

The idea here is to break up the destruction work by putting a yield point between destroying each element, right?

We already use this pattern twice here, and maybe elsewhere; could it be worth wrapping up in a utility function?

auto current = _raft_table.begin();
while (current != _raft_table.end()) {
    current = _raft_table.erase(current, ++current);
    co_await ss::coroutine::maybe_yield();
A Member commented:

I wonder what the order-of-magnitude cost of calling this for every element is. If we had a helper, it could process N elements before yielding (though you'd need an estimate from the caller of how expensive each deletion is to get N right).

Comment on lines 180 to 184
auto current = partitions.begin();
while (current != partitions.end()) {
    current = partitions.erase(current, ++current);
    co_await ss::coroutine::maybe_yield();
}
A Member commented:

Thinking about current, ++current, and the erase return value seems overly complicated? How about:

while (!partitions.empty()) {
  partitions.erase(partitions.begin());
  co_await ss::coroutine::maybe_yield();
}

@jcsp force-pushed the controller-stalls branch from ebbb2ee to 49fb8d4 on July 4, 2022 12:29
@jcsp requested review from LenaAn and BenPope as code owners July 4, 2022 12:29
@jcsp (Contributor, Author) commented Jul 4, 2022

Revised this to create an async_clear helper (for flat_hash_map) that avoids the repetition -- this is basically revisiting #4860 now that we have a few more usage sites, whereas at the time of that PR we were only using the helper in one place.

Calling maybe_yield every iteration wasn't very expensive (it only checks a boolean for whether any other tasks are waiting), but we can call it less often by batching our erases into ranges and only calling maybe_yield once per range we erase. This also gives the underlying container a chance to apply any efficiencies it may have for bulk erases vs. single-element erases.
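
To make that concrete, here is a minimal sketch of what such a batched helper might look like (a hypothetical shape and name; the actual ssx helper in this PR may differ in signature, batch size, and supported containers):

    #include <seastar/core/coroutine.hh>
    #include <seastar/coroutine/maybe_yield.hh>

    namespace ss = seastar;

    // Erase in ranges of batch_size elements, yielding once per range so
    // the destructor work done per reactor task stays bounded.
    template<typename Container>
    ss::future<> async_clear_sketch(Container& c, size_t batch_size = 1000) {
        while (!c.empty()) {
            auto range_end = c.begin();
            for (size_t i = 0; i < batch_size && range_end != c.end(); ++i) {
                ++range_end;
            }
            c.erase(c.begin(), range_end);
            co_await ss::coroutine::maybe_yield();
        }
    }

The range-erase call also leaves room for containers such as absl::flat_hash_map to apply whatever bulk-erase efficiencies they have, rather than paying per-element call overhead.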

src/v/ssx/async-clear.h: two review threads, resolved (outdated)
jcsp added 3 commits July 4, 2022 16:02
  • This is for clearing large containers without
    causing reactor stalls.
  • These objects are all potentially very large, so:
    - Must not destruct them in one shot (the overhead
      of all the item destructors is enough to cause
      an issue)
    - Must not use ss::parallel_for_each; it's unsafe
      on large collections.
@jcsp force-pushed the controller-stalls branch from 49fb8d4 to 3f34d6e on July 4, 2022 15:03
@jcsp (Contributor, Author) commented Jul 4, 2022

This ran into the LLVM templates+coroutines bug llvm/llvm-project#49689, so I've wrapped the async_clear helper into a class to work around it (this seems to work when compiling locally, at least).
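
For illustration, the earlier sketch wrapped in a class (an assumed shape; moving the coroutine from a free function template into a member function of a class template is the workaround described above, and the real helper may differ):

    #include <seastar/core/coroutine.hh>
    #include <seastar/coroutine/maybe_yield.hh>

    namespace ss = seastar;

    // The coroutine body lives in a member function of a class template
    // rather than in a function template, sidestepping the miscompile in
    // llvm/llvm-project#49689.
    template<typename Container>
    class async_clear {
    public:
        explicit async_clear(Container& c)
          : _container(c) {}

        ss::future<> operator()(size_t batch_size = 1000) {
            while (!_container.empty()) {
                auto range_end = _container.begin();
                for (size_t i = 0;
                     i < batch_size && range_end != _container.end();
                     ++i) {
                    ++range_end;
                }
                _container.erase(_container.begin(), range_end);
                co_await ss::coroutine::maybe_yield();
            }
        }

    private:
        Container& _container;
    };

Usage might then look something like co_await async_clear{partitions}(), with the container type deduced from the argument.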

@BenPope (Member) left a comment:

LGTM

@jcsp merged commit 7d762f7 into redpanda-data:dev Jul 4, 2022
@jcsp deleted the controller-stalls branch July 4, 2022 20:00
@dotnwat (Member) left a comment:

lgtm

@mmedenjak added the kind/enhance (New feature or request) and performance labels Jul 12, 2022