Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Searchable Snapshot] [Design Proposal] Shard Allocator for Remote Search Shards #4759

Closed
Tracked by #2919
kotwanikunal opened this issue Oct 12, 2022 · 1 comment
Closed
Tracked by #2919
Assignees
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request feature New feature or request v2.4.0 'Issues and PRs related to version v2.4.0'

Comments

@kotwanikunal
Copy link
Member

kotwanikunal commented Oct 12, 2022

This document outlines a proposal for implementing a new allocator to enable searchable snapshots using remote search shards. Please feel free to leave comments below with any thoughts or suggestions!

Overview

Searchable Snapshots for OpenSearch will introduce support for remote search shards which will enable search for snapshots within the repository without downloading all the data onto the nodes. These remote search shards will be assigned to a node which has the new search role - which provides the capability to perform data caching on configured storage when remotely searching the snapshots.

Problem Statement

Currently, OpenSearch allocation and initialization works solely for shards which are located on the node’s filesystem. The shard placement strategy can be broken into two smaller subproblems: which shard to act on, and which target node to place it at. The default OpenSearch implementation, BalancedShardsAllocator, divides its responsibilities into three major code paths:

  • allocate unassigned shards,
  • move shards
  • rebalance shards

The allocation mechanism takes into picture the shard size, disk thresholds and other attributes which contribute to the weighted balancing of shards and index across the nodes within the cluster.

The remote search shards for searchable snapshots will rely on the data cache which will be configured as a part of search nodes (nodes with the role search assigned to them). The allocation process should take into account these remote shards while assigning, moving or balancing them across clusters such that they are only assigned to nodes with the specific search role.
Additionally, in order to incorporate the remote nature of shards, the balancing logic has to be updated for both the remote and local shards since the requirements in terms of disk, shard count and other variables that are used for weight based calculation will differ for these shards.

Requirements

  1. Provide the mechanism to allocate remote search shards for searchable snapshots to nodes with a search role.
  2. Incorporate remote search shard balancing and shard movement functionality into the cluster management for traditional local shard management while minimizing any deviations to the existing functionality.

Terminology

  1. ShardsAllocator: The interface which acts as entry point for shard allocation on nodes in the cluster.
  2. BalancedShardsAllocator: The default implementation of the ShardsAllocator interface which re-balances the nodes allocations within a cluster based on a weight function.
  3. Weight Function: The mathematical function based on parameters like index balance and shard balance which helps with the balancing decision of shards amongst nodes within the cluster.
  4. AllocationDecider: An abstract base class that allows to make dynamic index-wide or cluster-wide allocation decisions on a per-node basis.
  5. RoutingAllocation: An object which keeps the state of current allocation of shards and holds the AllocationDeciders for decision making.
  6. Balancer: The class which utilizes the WeightFunction and the current allocation to assign any unassigned shards, move incompatible shards and rebalance the cluster to the list of available RoutingNodes within the cluster.
  7. Remote Searcher Node: A node with the search role capability.
  8. Local Shard: A traditional type of a shard which is stored on the file-system of the node within the cluster.
  9. Remote Search Shard: A new type of read-only shard which is part-fetched at query run time, where the parts are a part of a larger, shared data cache mechanism on the node.

Current State

High Level Overview

A high level overview of the current allocation process, assuming the default OpenSearch implementation, is outlined below.

  • AllocationService receives a reroute request through one of it’s multiple method signatures, which in case of searchable snapshots, happens after the metadata restore is complete.. This request is eventually forwarded to a ShardsAllocator implementation - in the default scenario - the BalancedShardsAllocator with the current RoutingAllocation, which is mutable.
  • The request is handled by the ShardsAllocator::allocate method implementation. BalancedShardsAllocator initializes a WeightFunction as well as a Balancer and takes care of three things -
    • Allocating unassigned nodes
    • Ensuring ineligible shard movement
    • Balancing the cluster
  • The Balancer consumes the weight function and allocation as the input and exposes three methods to perform the above steps -
    • Balancer::allocateUnassigned()
    • Balancer::moveShards()
    • Balancer::balance()
  • These methods utilize the weight function and comparators to determined the best nodes for unassigned shards, moving ineligible shards and balancing out the cluster to an optimal state.
  • During this allocation process, the set of eligible nodes and affected shards are found using AllocationDeciders which contains a collection of AllocationDecider implementations. The decider helps the balancer by answering questions like canAllocate, canRebalance, canRemain, canMoveAway for shard-level and index-level decisions on a per-node basis.

ShardsAllocator-Page-2

Proposed Solution

High Level Overview

The proposed solution will involve introducing a completely independent balancer which will handle the management for remote search shards. Following is a high level overview of the solution -

  • The default implementation of ShardsAllocator - BalancedShardsAllocator will continue to function as-is as the default handler for the allocation operations within OpenSearch, including both the local and remote search shard operations.
  • A new ShardsBalancer interface will be exposed with the following signature -
    ShardsAllocator-Page-1
public interface ShardsBalancer {

    void allocateUnassigned();

    void moveShards();

    void balance();
}
  • The current implementation of Balancer within BalancedShardsAllocator will implement the above signature and will be refactored into a separate, new class called LocalShardsBalancer
    • This refactored class will have the same functionality as the current Balancer class for local shards
    • The only changes to the implementation will include filtering out remote shards before performing any operations to ensure independent operation
  • The ShardsBalancer interface described above will be used to implement another balancer - RemoteShardsBalancer
    • RemoteShardsBalancer will be responsible solely for handling management for remote search shards, initially for the searchable snapshot operations
    • This implementation will filter out local shards and process remote indices, similar to the filtering used by the proposed LocalShardsBalancer and perform the defined operations (allocateUnassigned, moveShards, balance) on the remote nodes.
  • Both this balancers will be instantiated within BalancedShardsAllocator and called in sequence. For example -
void allocate() {
    ...
    localShardsBalancer.allocateUnassigned()
    localShardsBalancer.move()
    localShardsBalancer.balance()
    remoteShardsBalancer.allocateUnassigned()
    remoteShardsBalancer.move()
    remoteShardsBalancer.balance()
    ...
}
  • To enable correct decision making within the balancers, the implementations for the AllocationDecider(s) will be updated to accommodate for the new remote search shard type. New AllocationDecider implementations will also be added to check the node capabilities which can decide if a remote search shard can be assigned to a particular node.

ShardsAllocator-Page-3

Pros:

  • Keeps it simple - easier to manage, evolve the experimental code.
  • Independent and isolated mode of operation for for the balancers reducing risk to existing behavior.
  • Dedicated search nodes will minimize any skewing or imbalance.

Cons:

  • For nodes with both remote search shards and local shards, independent balancers can cause shard skewing and resource utilization imbalance given that the two balancers operate in isolation.

Future Work:

  • Introduce a new hybrid balancer which can accommodate for both the shard types when making balancing and assignment decisions while also optimizing the shard iteration process between the balancers.

Other Approaches

Update the existing weight function and balancer

  • The current weight function and balancers are designed to addresses only for local shards present on the nodes within the cluster.
  • We can look at updating the existing balancer and weight functions to also accommodate for the new remote search shards by changing mathematical equation used.
  • Pros:
    • Remote search shards and local shards will be handled by the same balancer.
  • Cons:
    • Difficult to isolate the workings of the balancer for the experimental release.
    • Backward compatibility issues as the complex logic might break with unintended consequences.

Backward Compatibility

The proposed solution described for proposed solution will maintain backwards compatibility across previous major versions even with the feature flag turned on.

@anasalkouz
Copy link
Member

@kotwanikunal Can we close this one? since the design already finalized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request feature New feature or request v2.4.0 'Issues and PRs related to version v2.4.0'
Projects
Status: Done
Development

No branches or pull requests

2 participants