Fix Large Shard Count Scalability Issues #77466
Labels:
- >bug
- :Data Management/ILM+SLM (Index and Snapshot lifecycle management)
- :Distributed Indexing/Distributed (a catch-all label for anything in the Distributed area; please avoid if you can)
- Meta
- release highlight
- :Security/Authorization (Roles, Privileges, DLS/FLS, RBAC/ABAC)
- Team:Data Management (meta label for the data-management team)
- Team:Distributed (Obsolete) (meta label for the distributed team; replaced by :Distributed Indexing/Coordination)
- Team:Security (meta label for the security team)
Comments

original-brownbear added the >bug, Meta, :Distributed Indexing/Distributed, :Security/Authorization, and :Data Management/Other labels (Sep 9, 2021)
elasticmachine added the Team:Security, Team:Distributed (Obsolete), and Team:Data Management labels (Sep 9, 2021)
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/es-security (Team:Security)
Pinging @elastic/es-data-management (Team:Data Management)
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue (Sep 16, 2021):
This change improves the parsing of LifecycleExecutionState from IndexMetadata custom data by avoiding a containsKey(...) call and, when there is no custom data, returning a blank LifecycleExecutionState instance. Relates to elastic#77466
martijnvg added a commit that referenced this issue (Sep 16, 2021), with the same message (Relates to #77466)
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue (Sep 16, 2021), with the same message
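The optimization in this commit can be illustrated with a minimal sketch (hypothetical names and map-based types; the real `LifecycleExecutionState` parsing lives in the ILM codebase): a single `get(...)` with a null check replaces the `containsKey(...)` + `get(...)` pair, and a shared blank instance is returned when the index carries no ILM custom data, saving a second hash lookup per call.

```java
import java.util.Map;

// Hypothetical sketch of the parsing change: one map lookup instead of two,
// and a shared blank instance when the index has no ILM custom data.
class LifecycleStateSketch {
    static final Map<String, String> EMPTY_STATE = Map.of(); // shared blank instance

    // Before: containsKey(...) followed by get(...) performs two hash lookups.
    static Map<String, String> parseBefore(Map<String, Map<String, String>> customData) {
        if (customData.containsKey("ilm")) {
            return customData.get("ilm");
        }
        return EMPTY_STATE;
    }

    // After: a single get(...) with a null check, falling back to the blank state.
    static Map<String, String> parseAfter(Map<String, Map<String, String>> customData) {
        Map<String, String> ilm = customData.get("ilm");
        return ilm == null ? EMPTY_STATE : ilm;
    }
}
```

With tens of thousands of indices, halving the lookups on this hot path is worthwhile even though each individual call is cheap.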
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue (Sep 16, 2021):
…hen running a policy. Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState` instance and can simply provide it. Relates to elastic#77466
elasticsearchmachine pushed a commit that referenced this issue (Sep 16, 2021)
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue (Sep 16, 2021):
…hen running a policy. (elastic#77863) Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState` instance and can simply provide it. Relates to elastic#77466
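The reuse these commits describe can be sketched as an overload (hypothetical signatures and a hypothetical `ExecutionState` record, not the actual ILM API): callers that already hold a parsed execution state pass it in directly, so the expensive parse is not repeated.

```java
import java.util.Map;

// Hypothetical sketch: a caller that already parsed the execution state hands
// it to currentStep(...) instead of forcing a second parse of the metadata.
class CurrentStepSketch {
    record ExecutionState(String step) {}

    // Stand-in for the real, allocation-heavy parsing of custom metadata.
    static ExecutionState parse(Map<String, String> custom) {
        return new ExecutionState(custom.getOrDefault("step", "init"));
    }

    // Before: callers passed raw metadata, re-parsing on every call.
    static String currentStep(Map<String, String> custom) {
        return currentStep(parse(custom));
    }

    // After: an overload accepting the already-parsed state directly.
    static String currentStep(ExecutionState state) {
        return state.step();
    }
}
```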
original-brownbear added a commit that referenced this issue (Sep 29, 2021):
Prevent duplicate ILM tasks from being enqueued, to fix the most immediate issues around #78246. The ILM logic should be further improved, though. I did not include `MoveToErrorStepUpdateTask` in this change yet, as I wasn't entirely sure how valid/safe hashing and comparing arbitrary `Exception`s would be; that could be looked into in a follow-up as well. Relates #77466. Closes #78246
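The deduplication idea can be sketched as follows (hypothetical task type and queue; the real change touches ILM's cluster-state update tasks): tasks get value-based equality, and the queue refuses to enqueue a task equal to one already pending. The reservation about `MoveToErrorStepUpdateTask` is exactly the hard part of this approach, since equality over arbitrary `Exception`s is not obviously well-defined.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: a queue that drops tasks equal to one already pending.
class DedupTaskQueue {
    // A record gives value-based equals/hashCode, which the dedup relies on.
    record IlmTask(String index, String policy, String step) {}

    private final Queue<IlmTask> queue = new ArrayDeque<>();
    private final Set<IlmTask> pending = new HashSet<>();

    // Returns true if enqueued, false if an equal task is already pending.
    synchronized boolean offer(IlmTask task) {
        if (pending.add(task) == false) {
            return false; // duplicate: skip the redundant cluster-state update
        }
        queue.add(task);
        return true;
    }

    synchronized IlmTask poll() {
        IlmTask task = queue.poll();
        if (task != null) {
            pending.remove(task);
        }
        return task;
    }
}
```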
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue (Sep 29, 2021), with the same message (…ic#78390; Relates elastic#77466, Closes elastic#78246)
original-brownbear added a commit that referenced this issue (Oct 5, 2022):
This can take O(10s) for tens of thousands of shards; we have to fork it. Relates #77466
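Forking as described here can be sketched with a plain executor (hypothetical names; Elasticsearch uses its own ThreadPool abstraction and listener machinery rather than `CompletableFuture`): instead of running an O(shards) computation inline on the transport thread, the work is submitted to a dedicated pool and the response is completed when it finishes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: move O(shards) work off the transport thread onto a
// dedicated pool, completing the caller's future when the computation is done.
class ForkedComputation {
    static CompletableFuture<Integer> computeForked(
            ExecutorService managementPool, int shardCount, IntUnaryOperator perShardWork) {
        CompletableFuture<Integer> response = new CompletableFuture<>();
        managementPool.execute(() -> { // the transport thread returns immediately
            int total = 0;
            for (int shard = 0; shard < shardCount; shard++) {
                total += perShardWork.applyAsInt(shard); // O(shards) loop, now off-thread
            }
            response.complete(total);
        });
        return response;
    }
}
```

Keeping multi-second loops off transport threads matters because those threads also service all other network traffic on the node.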
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue (Oct 5, 2022), with the same message (…ic#90651; relates elastic#77466)
elasticsearchmachine pushed a commit that referenced this issue (Oct 5, 2022)
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue (Jan 6, 2023):
BroadcastReplicationAction derivatives (`POST /<indices>/_refresh` and `POST /<indices>/_flush`) are pretty inefficient when targeting high shard counts due to how `TransportBroadcastReplicationAction` works:
- It computes the list of all target shards up front on the calling (transport) thread.
- It eagerly sends one request for every target shard in a tight loop on the calling (transport) thread.
- It accumulates responses in a `CopyOnWriteArrayList`, which takes quadratic work to populate, even though nothing reads this list until it is fully populated.
- It then mostly discards the accumulated responses, keeping only the total number of shards, the number of successful shards, and a list of any failures.
- Each failure is wrapped in a `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end to be re-wrapped in a `DefaultShardOperationFailedException`.
This commit fixes all of this:
- It avoids allocating a list of all target shards, instead iterating over the target indices and generating shard IDs on the fly.
- The computation of the list of shards, and the sending of the per-shard requests, now happen on the relevant threadpool (`REFRESH` or `FLUSH`) rather than a transport thread.
- The per-shard requests are now throttled, with a meaningful yet fairly generous concurrency limit of `#(data nodes) * 10`.
- Rather than accumulating the full responses for later processing, we track the counts and failures directly.
- The failures are tracked in a regular `ArrayList`, avoiding the accidentally quadratic complexity.
- The failures are tracked in their final form, skipping the unwrap-and-rewrap step at the end.
Relates elastic#77466
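The throttling described above can be sketched with a semaphore (a blocking sketch under hypothetical names; the real implementation is fully asynchronous via listeners): at most `#(data nodes) * 10` per-shard requests are in flight at once, and shard IDs are generated on the fly rather than materialized as one big list.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap in-flight per-shard requests at dataNodes * 10,
// generating shard IDs lazily instead of building the full list up front.
class ThrottledBroadcast {
    static int broadcast(ExecutorService sender, int dataNodes, int totalShards)
            throws InterruptedException {
        Semaphore inFlight = new Semaphore(dataNodes * 10); // concurrency limit
        CountDownLatch done = new CountDownLatch(totalShards);
        AtomicInteger successes = new AtomicInteger();
        for (int shardId = 0; shardId < totalShards; shardId++) { // IDs on the fly
            inFlight.acquire(); // blocks once dataNodes * 10 requests are in flight
            sender.execute(() -> { // stand-in for the async per-shard request
                successes.incrementAndGet();
                inFlight.release(); // the "response" frees a slot
                done.countDown();
            });
        }
        done.await();
        return successes.get();
    }
}
```

Scaling the limit with the data-node count keeps every node busy without flooding any single node's queues when the cluster has tens of thousands of shards.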
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue (Jan 13, 2023):
BroadcastReplicationAction derivatives (`POST /<indices>/_refresh` and `POST /<indices>/_flush`) are pretty inefficient when targeting high shard counts due to how `TransportBroadcastReplicationAction` works:
- It computes the list of all target shards up front on the calling (transport) thread.
- It accumulates responses in a `CopyOnWriteArrayList`, which takes quadratic work to populate, even though nothing reads this list until it is fully populated.
- It then mostly discards the accumulated responses, keeping only the total number of shards, the number of successful shards, and a list of any failures.
- Each failure is wrapped in a `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end to be re-wrapped in a `DefaultShardOperationFailedException`.
This commit fixes all of this:
- The computation of the list of shards, and the sending of the per-shard requests, now happen on the relevant threadpool (`REFRESH` or `FLUSH`) rather than a transport thread.
- The failures are tracked in a regular `ArrayList`, avoiding the accidentally quadratic complexity.
- Rather than accumulating the full responses for later processing, we track the counts and failures directly.
- The failures are tracked in their final form, skipping the unwrap-and-rewrap step at the end.
Relates elastic#77466. Relates elastic#92729
elasticsearchmachine pushed a commit that referenced this issue (Jan 13, 2023), with the same message (Relates #77466, Relates #92729)
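The response-handling bullets can be sketched as a small accumulator (hypothetical types): rather than collecting every per-shard response in a `CopyOnWriteArrayList` (each `add` copies the whole backing array, so N adds cost O(N^2)) and post-processing the list, the counts are tracked directly and failures go into a plain `ArrayList` already in their final form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: track totals and failures directly instead of
// accumulating full per-shard responses for later post-processing.
class BroadcastResult {
    private int totalShards;
    private int successfulShards;
    private final List<String> failures = new ArrayList<>(); // amortized O(1) add

    synchronized void onShardSuccess() {
        totalShards++;
        successfulShards++;
    }

    // The failure is stored in its final form: no unwrap-and-rewrap at the end.
    synchronized void onShardFailure(String failure) {
        totalShards++;
        failures.add(failure);
    }

    synchronized String summary() {
        return successfulShards + "/" + totalShards + " successful, "
                + failures.size() + " failed";
    }
}
```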
dakrone added the :Data Management/ILM+SLM label and removed the :Data Management/Other label (Nov 16, 2023)
This meta issue tracks known issues with scaling clusters to large numbers of shards.

Security

General
- RestClusterGetSettingsAction Requests the Full Metadata from Master #82342
- ClusterStateHealth#indices when unnecessary #90631
- TransportMasterNodeReadAction #101805

Snapshots + SLM
- SnapshotsInProgress diffable #88732

Metrics

ILM + Allocation
- LifecycleExecutionState is parsed when running a policy #77863
- DiscoveryNode#trimTier Result #80179
- MetadataIndexAliasesService submits unbatched tasks at URGENT priority #89924

Search

Network