-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ssx::sharded_abort_source
using std::shared_ptr
#16534
base: dev
Are you sure you want to change the base?
Conversation
src/v/ssx/abort_source.h
Outdated
} | ||
|
||
// Subscribes to the parent abort_source. | ||
ss::future<> start(ss::abort_source& parent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: this method can return void
now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(so it could probably all be the constructor, which avoids other problems like calling start twice)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was keeping it as ss::future<>
to reduce the number of code changes introduced by this PR. Though being able to move everything to the constructor would be nice and avoid some pitfalls. Let me give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the start
method and moved everything to the constructor.
std::optional<ss::abort_source::subscription> _sub; | ||
struct sharded_abort_source_internal { | ||
ss::shard_id orig_shard; | ||
std::vector<ss::abort_source> as; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought much yet about the general strategy of this PR.
But if we end up going with this general approach, would it be worth allocating these abort sources so that they avoid false sharing, which presumably is happening each core is indexing into its own slot in this vector?
Taking things further you could probably roll your own reference counting that either avoided the atomics (using x-core messages) or used some relaxed / sloppy counting techniques, but it might be overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which presumably is happening each core is indexing into its own slot in this vector?
False sharing requires mutation: if two (or more) threads access objects on the same cache line but those accesses are all only reads the cache line is just shared out among all the cores which is an efficient and common pattern.
So there could be some false sharing here, but only when the source is actually aborted, which is already a fairly expensive operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but only when the source is actually aborted
ahh, thanks. of course!
src/v/ssx/abort_source.h
Outdated
* A sharded version of `ss::abort_source` that allows any shard to request an | ||
* abort on all other shards. | ||
* | ||
* Note that is class's members are wrapped in a shared_ptr. So the object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to comment on this but I think it could be a bit clearer, maybe something along the lines of:
Sharing one instance of this class across threads raises difficult lifetime and thread safety issues, so the intended usage pattern for this class is to create it on one shard, then make a copy of this original object on every shard where it will be used. All internal state is wrapped in a shared pointer so these copies reference the same underlying state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, switched to the recommended comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oblig comment
The previous implementation of `ssx::sharded_abort_source` had two potential race conditions; - If it is stopped then `.local()` is no longer safe to call since `.stop()` will delete all local objects. However, other shards do not have a way of knowing if the abort_source was stopped, hence no way of knowing if `.local()` is safe to call or not. The new implementation avoids this issue by using a shared_ptr to delete the local objects rather than `.stop()`. - If the parent abort_source aborts then the subscription will make cross shard calls to every shard-local abort_source. However, on the original shard the `ssx::sharded_abort_source` may be stopped and the shard-local abort_source's deleted before the cross shard calls finish. Resulting in a use-after-free violation. The new implementation fixes this in the same way as it did the last issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Any reason not to use ss::optimized_optional
?
--- a/src/v/ssx/abort_source.h
+++ b/src/v/ssx/abort_source.h
@@ -39,15 +39,12 @@ public:
std::vector<ss::abort_source>(ss::smp::count),
std::nullopt)) {
auto dex = parent.get_default_exception();
- auto sub = parent.subscribe(
+ _internal->sub = parent.subscribe(
[as = *this,
dex](std::optional<std::exception_ptr> const& ex) mutable noexcept {
dex = ex.value_or(dex);
ssx::background = as.request_abort_ex(dex);
});
- if (sub) {
- _internal->sub.emplace(std::move(*sub));
- }
}
// Returns a reference to an abort_source local to the calling shard
@@ -102,7 +99,7 @@ public:
ss::this_shard_id() == _internal->orig_shard,
"sharded_abort_source must be stopped on its original shard");
- _internal->sub.reset();
+ _internal->sub = std::nullopt;
return request_abort();
}
@@ -110,7 +107,7 @@ private:
struct sharded_abort_source_internal {
ss::shard_id orig_shard;
std::vector<ss::abort_source> as;
- std::optional<ss::abort_source::subscription> sub;
+ ss::optimized_optional<ss::abort_source::subscription> sub;
};
std::shared_ptr<sharded_abort_source_internal> _internal;```
The current implementation of
ssx::sharded_abort_source
has two potential race conditions;If it is stopped then
.local()
is no longer safe to call since.stop()
will delete all local objects. However, other shards do not have a way of knowing if the abort_source was stopped, hence no way of knowing if.local()
is safe to call or not. The new implementation avoids this issue by using a shared_ptr to delete the local objects rather than.stop()
.If the parent abort_source aborts then the subscription will make cross shard calls to every shard-local abort_source. However, on the original shard the
ssx::sharded_abort_source
may be stopped and the shard-local abort_source's deleted before the cross shard calls finish. Resulting in a use-after-free violation. The new implementation fixes this in the same way as it did the last issue.May fix #14149
Backports Required
Release Notes