Improve scalability of BroadcastReplicationActions #92902

Conversation

DaveCTurner (Contributor)

`BroadcastReplicationAction` derivatives (`POST /<indices>/_refresh` and `POST /<indices>/_flush`) are pretty inefficient when targeting high shard counts due to how `TransportBroadcastReplicationAction` works:

  • It computes the list of all target shards up-front on the calling (transport) thread.

  • It accumulates responses in a `CopyOnWriteArrayList`, which takes quadratic work to populate even though nothing reads this list until it's fully populated (see the sketch after this list).

  • It then mostly discards the accumulated responses, keeping only the total number of shards, the number of successful shards, and a list of any failures.

  • Each failure is wrapped up in a `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end to be re-wrapped in a `DefaultShardOperationFailedException`.
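
To see why the accumulation is accidentally quadratic: `CopyOnWriteArrayList.add` copies the whole backing array on every call, so collecting N per-shard responses one at a time does on the order of N²/2 element copies, and the copy-on-write semantics buy nothing here because the list is only read once it is complete. A minimal, self-contained timing sketch (plain Java, not the Elasticsearch code) of the difference:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class QuadraticAccumulation {
    public static void main(String[] args) {
        int shardCount = 50_000; // a "high shard count" target

        // CopyOnWriteArrayList copies its entire backing array on every add,
        // so populating it element-by-element is O(N^2) overall.
        long start = System.nanoTime();
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            copyOnWrite.add(i);
        }
        System.out.printf("CopyOnWriteArrayList: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // ArrayList amortizes its growth, so the same work is linear.
        start = System.nanoTime();
        List<Integer> plain = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            plain.add(i);
        }
        System.out.printf("ArrayList:            %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```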

This commit fixes all this:

  • The computation of the list of shards, and the sending of the per-shard requests, now happens on the relevant threadpool (`REFRESH` or `FLUSH`) rather than a transport thread.

  • The failures are tracked in a regular `ArrayList`, avoiding the accidentally-quadratic complexity.

  • Rather than accumulating the full responses for later processing, we track the counts and failures directly (a simplified sketch of this follows the list).

  • The failures are tracked in their final form, skipping the unwrap-and-rewrap step at the end.
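
As a rough illustration of the "track the counts and failures directly" approach, here is a simplified, standalone sketch; the class and member names are illustrative, not the actual `TransportBroadcastReplicationAction` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative accumulator: keep running counts plus a plain ArrayList of
// failures, instead of retaining every per-shard response until the end.
class ShardResultAccumulator {
    private final AtomicInteger pendingShardRequests;
    private final AtomicInteger totalShards = new AtomicInteger();
    private final AtomicInteger successfulShards = new AtomicInteger();
    private final List<Exception> shardFailures = new ArrayList<>(); // guarded by itself
    private final Runnable onCompletion; // run once after the last per-shard response arrives

    ShardResultAccumulator(int shardRequests, Runnable onCompletion) {
        this.pendingShardRequests = new AtomicInteger(shardRequests);
        this.onCompletion = onCompletion;
    }

    // Called once per shard-level request, from whichever thread completes it.
    void onShardResponse(int total, int successful, List<Exception> failures) {
        totalShards.addAndGet(total);
        successfulShards.addAndGet(successful);
        if (failures.isEmpty() == false) {
            synchronized (shardFailures) {
                shardFailures.addAll(failures);
            }
        }
        if (pendingShardRequests.decrementAndGet() == 0) {
            onCompletion.run();
        }
    }
}
```

The final response can then be assembled from the counters and the failure list, with no intermediate per-shard response objects to build up and discard.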

Relates #77466
Relates #92729

DaveCTurner added the :Distributed Indexing/Distributed, >refactoring, and v8.7.0 labels on Jan 13, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine added the Team:Distributed (Obsolete) label on Jan 13, 2023
DaveCTurner requested a review from tlrx on January 13, 2023 at 12:13
tlrx (Member) left a comment

LGTM

addShardResponse(numCopies, 0, createSyntheticFailures(numCopies, e));
}

private List<DefaultShardOperationFailedException> createSyntheticFailures(int numCopies, Exception e) {

tlrx (Member)

I'm not sure it really deserves a dedicated method; we can probably include this in `onFailure()`.

DaveCTurner (Contributor, Author)

Yes, this was better at an earlier point in the process, but indeed it's no longer needed.

DaveCTurner added the auto-merge-without-approval label on Jan 13, 2023
DaveCTurner (Contributor, Author)

@elasticmachine please run elasticsearch-ci/bwc

elasticsearchmachine merged commit 4aa4a0d into elastic:main on Jan 13, 2023
DaveCTurner deleted the 2023-01-13-TransportBroadcastReplicationAction branch on January 13, 2023 at 16:53