-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster: invoke config_frontend methods on controller shard #17088
Conversation
I am not sure why this test started failing now, because I can't see any recent changes that would explain it. But based on the code it seems to me that this is how we should fix the failing test. |
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46174#018e3ca9-0fed-49d5-bd21-1f96d0cc2ed3 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46190#018e3d59-3145-421a-9c59-388bd910662d |
dfc93a2
to
79fd14d
Compare
Various places in the code were calling do_patch directly without regard to the requirement that do_patch has to be called on the controller shard. This caused a fixture test to fail because it tried to invoke do_patch on all shards and this violates the assertion in do_patch of config_frontend.cc, causing it to fail with the error message "Must be called on version_shard". This fixes it by changing config_fronter::patch() to invoke do_patch on the controller shard, and moving all calls of do_patch to call patch instead.
79fd14d
to
28ed6b7
Compare
Just like patch, we also have to call do_set_next_version on the same shard because config_frontend::set_next_version might be called from a background fiber.
28ed6b7
to
ed04fc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Maybe @dotnwat has an opinion, too?
It might be worth backporting this, the change in the metrics reporter might not be inconsequential.
Makes sense. I've updated the description now to make this a backport + added a bug-fix description. |
co_return co_await do_patch(std::move(update), timeout); | ||
co_return co_await container().invoke_on( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we switching cores here to invoke do_patch
rather than requiring the caller to invoke patch
on the correct core, like in cluster::service::config_update
?
to be clear, i'm not saying it should be one way or another. but i do think it should be consistent. i'd probably choose whichever pattern aligned with a majority of the callers, and then add a comment expressing the expectation on callers (if there are any), and an assertion to check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It clearly has been used incorrectly; I can't imagine a downside of making do_patch
private and dispatching to the correct core within patch
, but I may have missed something.
The primary advantage is that it makes the API harder to misuse.
As far as I can tell, requiring the caller to know which core to call it on is error prone and has no advantage.
I may be missing something, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with what Ben said above.
This also improves consistency locally, by making all of the public methods of config_frontend
callable from any shard, not just set_status
.
Globally across *_frontend.h
public methods, there is an inconsistency where some methods are callable from any shard while some aren't. I would think that since invoke_on
seems cheap, we should standardise making it the *_frontend
classes' responsibility to delegate work to the correct core. But that's a larger undertaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, allowing the method to be invoked on any core is great.
Internally we should also be consistent:
auto leader = _leaders.local().get_leader(model::controller_ntp);
if (!leader) {
co_return patch_result{
.errc = errc::no_leader_controller, .version = config_version_unset};
}
if (leader == _self) {
co_return co_await do_patch(std::move(update), timeout);
co_return co_await container().invoke_on(
here we are combining state from different cores. it's benign in this case, but in general, it should be consistent.
/backport v23.3.x |
/backport v23.2.x |
Failed to create a backport PR to v23.2.x branch. I tried:
|
Various places in the code were calling do_patch directly without regard
to the requirement that do_patch has to be called on the controller
shard.
This caused a fixture test to fail because it tried to invoke do_patch
on all shards and this violates the assertion in do_patch of
config_frontend.cc, causing it to fail with the error message "Must be
called on version_shard".
This fixes it by changing config_fronter::patch() to invoke do_patch on
the controller shard, and moving all calls of do_patch to call patch
instead.
Backports Required
Release Notes
Bug Fixes