Improve scalability of BroadcastReplicationActions #92902

Conversation

DaveCTurner (Contributor)

`BroadcastReplicationAction` derivatives (`POST /<indices>/_refresh` and `POST /<indices>/_flush`) are pretty inefficient when targeting high shard counts due to how `TransportBroadcastReplicationAction` works:

  • It computes the list of all target shards up-front on the calling (transport) thread.

  • It accumulates responses in a `CopyOnWriteArrayList`, which takes quadratic work to populate even though nothing reads this list until it's fully populated (see the sketch after this list).

  • It then mostly discards the accumulated responses, keeping only the total number of shards, the number of successful shards, and a list of any failures.

  • Each failure is wrapped up in a `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end to be re-wrapped in a `DefaultShardOperationFailedException`.
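
To see why the accumulation is accidentally quadratic: `CopyOnWriteArrayList.add` copies the whole backing array on every call, so collecting N per-shard responses one at a time does on the order of N²/2 element copies, and the copy-on-write semantics buy nothing here because the list is only read once it is complete. A minimal, self-contained timing sketch (plain Java, not the Elasticsearch code) of the difference:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class QuadraticAccumulation {
    public static void main(String[] args) {
        int shardCount = 50_000; // a "high shard count" target

        // CopyOnWriteArrayList copies its entire backing array on every add,
        // so populating it element-by-element is O(N^2) overall.
        long start = System.nanoTime();
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            copyOnWrite.add(i);
        }
        System.out.printf("CopyOnWriteArrayList: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // ArrayList amortizes its growth, so the same work is linear.
        start = System.nanoTime();
        List<Integer> plain = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            plain.add(i);
        }
        System.out.printf("ArrayList:            %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```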

This commit fixes all this:

  • The computation of the list of shards, and the sending of the per-shard requests, now happens on the relevant threadpool (`REFRESH` or `FLUSH`) rather than a transport thread.

  • The failures are tracked in a regular `ArrayList`, avoiding the accidentally-quadratic complexity.

  • Rather than accumulating the full responses for later processing, we track the counts and failures directly (a simplified sketch of this follows the list).

  • The failures are tracked in their final form, skipping the unwrap-and-rewrap step at the end.
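
As a rough illustration of the "track the counts and failures directly" approach, here is a simplified, standalone sketch; the class and member names are illustrative, not the actual `TransportBroadcastReplicationAction` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative accumulator: keep running counts plus a plain ArrayList of
// failures, instead of retaining every per-shard response until the end.
class ShardResultAccumulator {
    private final AtomicInteger pendingShardRequests;
    private final AtomicInteger totalShards = new AtomicInteger();
    private final AtomicInteger successfulShards = new AtomicInteger();
    private final List<Exception> shardFailures = new ArrayList<>(); // guarded by itself
    private final Runnable onCompletion; // run once after the last per-shard response arrives

    ShardResultAccumulator(int shardRequests, Runnable onCompletion) {
        this.pendingShardRequests = new AtomicInteger(shardRequests);
        this.onCompletion = onCompletion;
    }

    // Called once per shard-level request, from whichever thread completes it.
    void onShardResponse(int total, int successful, List<Exception> failures) {
        totalShards.addAndGet(total);
        successfulShards.addAndGet(successful);
        if (failures.isEmpty() == false) {
            synchronized (shardFailures) {
                shardFailures.addAll(failures);
            }
        }
        if (pendingShardRequests.decrementAndGet() == 0) {
            onCompletion.run();
        }
    }
}
```

The final response can then be assembled from the counters and the failure list, with no intermediate per-shard response objects to build up and discard.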

Relates #77466
Relates #92729

DaveCTurner added the :Distributed Indexing/Distributed, >refactoring, and v8.7.0 labels on Jan 13, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine added the Team:Distributed (Obsolete) label on Jan 13, 2023
DaveCTurner requested a review from tlrx on January 13, 2023 at 12:13
tlrx (Member) left a comment

LGTM

addShardResponse(numCopies, 0, createSyntheticFailures(numCopies, e));
}

private List<DefaultShardOperationFailedException> createSyntheticFailures(int numCopies, Exception e) {

tlrx (Member)

I'm not sure it really deserves a dedicated method; we can probably include this in `onFailure()`.

DaveCTurner (Contributor, Author)

Yes, this was better at an earlier point in the process, but indeed it's no longer needed.

DaveCTurner added the auto-merge-without-approval label on Jan 13, 2023
DaveCTurner (Contributor, Author)

@elasticmachine please run elasticsearch-ci/bwc

elasticsearchmachine merged commit 4aa4a0d into elastic:main on Jan 13, 2023
DaveCTurner deleted the 2023-01-13-TransportBroadcastReplicationAction branch on January 13, 2023 at 16:53