Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ssx::sharded_abort_source using std::shared_ptr #16534

Draft
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

ballard26
Copy link
Contributor

@ballard26 ballard26 commented Feb 8, 2024

The current implementation of ssx::sharded_abort_source has two potential race conditions;

  • If it is stopped then .local() is no longer safe to call since .stop() will delete all local objects. However, other shards do not have a way of knowing if the abort_source was stopped, hence no way of knowing if .local() is safe to call or not. The new implementation avoids this issue by using a shared_ptr to delete the local objects rather than .stop().

  • If the parent abort_source aborts then the subscription will make cross shard calls to every shard-local abort_source. However, on the original shard the ssx::sharded_abort_source may be stopped and the shard-local abort_source's deleted before the cross shard calls finish. Resulting in a use-after-free violation. The new implementation fixes this in the same way as it did the last issue.

May fix #14149

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

}

// Subscribes to the parent abort_source.
ss::future<> start(ss::abort_source& parent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: this method can return void now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(so it could probably all be the constructor, which avoids other problems like calling start twice)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was keeping it as ss::future<> to reduce the number of code changes introduced by this PR. Though being able to move everything to the constructor would be nice and avoid some pitfalls. Let me give it a shot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the start method and moved everything to the constructor.

std::optional<ss::abort_source::subscription> _sub;
struct sharded_abort_source_internal {
ss::shard_id orig_shard;
std::vector<ss::abort_source> as;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought much yet about the general strategy of this PR.

But if we end up going with this general approach, would it be worth allocating these abort sources so that they avoid false sharing, which presumably is happening each core is indexing into its own slot in this vector?

Taking things further you could probably roll your own reference counting that either avoided the atomics (using x-core messages) or used some relaxed / sloppy counting techniques, but it might be overkill.

Copy link
Member

@travisdowns travisdowns Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which presumably is happening each core is indexing into its own slot in this vector?

False sharing requires mutation: if two (or more) threads access objects on the same cache line but those accesses are all only reads the cache line is just shared out among all the cores which is an efficient and common pattern.

So there could be some false sharing here, but only when the source is actually aborted, which is already a fairly expensive operation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but only when the source is actually aborted

ahh, thanks. of course!

* A sharded version of `ss::abort_source` that allows any shard to request an
* abort on all other shards.
*
* Note that is class's members are wrapped in a shared_ptr. So the object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to comment on this but I think it could be a bit clearer, maybe something along the lines of:

Sharing one instance of this class across threads raises difficult lifetime and thread safety issues, so the intended usage pattern for this class is to create it on one shard, then make a copy of this original object on every shard where it will be used. All internal state is wrapped in a shared pointer so these copies reference the same underlying state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, switched to the recommended comment.

Copy link
Member

@travisdowns travisdowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oblig comment

The previous implementation of `ssx::sharded_abort_source` had two
potential race conditions;

- If it is stopped then `.local()` is no longer safe to call since
  `.stop()` will delete all local objects. However, other shards do not
have a way of knowing if the abort_source was stopped, hence no way of
knowing if `.local()` is safe to call or not. The new implementation
avoids this issue by using a shared_ptr to delete the local objects
rather than `.stop()`.

- If the parent abort_source aborts then the subscription will make
  cross shard calls to every shard-local abort_source. However, on the
original shard the `ssx::sharded_abort_source` may be stopped and the
shard-local abort_source's deleted before the cross shard calls finish.
Resulting in a use-after-free violation. The new implementation fixes
this in the same way as it did the last issue.
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Any reason not to use ss::optimized_optional?

--- a/src/v/ssx/abort_source.h
+++ b/src/v/ssx/abort_source.h
@@ -39,15 +39,12 @@ public:
         std::vector<ss::abort_source>(ss::smp::count),
         std::nullopt)) {
         auto dex = parent.get_default_exception();
-        auto sub = parent.subscribe(
+        _internal->sub = parent.subscribe(
           [as = *this,
            dex](std::optional<std::exception_ptr> const& ex) mutable noexcept {
               dex = ex.value_or(dex);
               ssx::background = as.request_abort_ex(dex);
           });
-        if (sub) {
-            _internal->sub.emplace(std::move(*sub));
-        }
     }
 
     // Returns a reference to an abort_source local to the calling shard
@@ -102,7 +99,7 @@ public:
           ss::this_shard_id() == _internal->orig_shard,
           "sharded_abort_source must be stopped on its original shard");
 
-        _internal->sub.reset();
+        _internal->sub = std::nullopt;
         return request_abort();
     }
 
@@ -110,7 +107,7 @@ private:
     struct sharded_abort_source_internal {
         ss::shard_id orig_shard;
         std::vector<ss::abort_source> as;
-        std::optional<ss::abort_source::subscription> sub;
+        ss::optimized_optional<ss::abort_source::subscription> sub;
     };
 
     std::shared_ptr<sharded_abort_source_internal> _internal;```

@ballard26 ballard26 marked this pull request as draft July 19, 2024 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants