-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use general cluster state batching mechanism for shard failures #15016
Conversation
@bleskes I'll rebase this pull request on master when #14899 is reintegrated there. The salient commit for this review is thus ccc89c3666780d0bf7b2425f9e8c76dbe6316187 pending #14899 (all the changes for that commit are in ShardStateAction.java and some minor modifications in AllocationService.java). |
This commit modifies the handling of shard failure cluster state updates to use the general cluster state batching mechanism. An advantage of this approach is we now get correct per-listener notification on failures.
try { | ||
RoutingAllocation.Result result = allocationService.applyFailedShard( | ||
currentState, | ||
new FailedRerouteAllocation.FailedShard(task.shardRouting, task.message, task.failure)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we loose the batch application of shard failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was to get task failures only for the shards for which we were unsuccessful in marking as failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we share some logic when we do it in a batch, most notably the reroute. I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)
That's what I'm thinking now, but it should be in a separate issue on the #14252 work I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 413688b to apply the failures in a single batch. For now, we will not get task-specific failures but that will come in a follow up.
looking good. Left some comments. |
BatchResult.Builder<ShardRoutingEntry> builder = BatchResult.builder(); | ||
List<FailedRerouteAllocation.FailedShard> shardRoutingsToBeApplied = new ArrayList<>(tasks.size()); | ||
for (ShardRoutingEntry task : tasks) { | ||
task.processed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this processed with this change..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in b58d82f.
LGTM. Left trivial comments. |
Use general cluster state batching mechanism for shard failures
One note - the comment from the shard started pr about error reporting holds for this one as well. Is the plan to do a followup? As with the shard started, we can just let the exception bubble up |
@bleskes Yeah. :) |
I opened #15428. |
This commit modifies the handling of shard failure cluster state updates
to use the general cluster state batching mechanism. An advantage of
this approach is we now get correct per-listener notification on
failures.
Relates #14899, relates #14725