[Searchable Snapshots] Add a new node role for remote search capabilities #4652

Closed
Tracked by #2919
kotwanikunal opened this issue Sep 30, 2022 · 9 comments · Fixed by #4689
Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search v2.4.0 'Issues and PRs related to version v2.4.0'

Comments

Member

kotwanikunal commented Sep 30, 2022

Is your feature request related to a problem? Please describe.

  • Searchable snapshots will utilize a remote shard mechanism in which the entire shard will not be downloaded onto the node
  • To suit this use case, we would like to have a set of nodes which can provide this capability. It will be enabled by adding support for a new REMOTE_SEARCHER node role

Describe the solution you'd like

  • Currently, OpenSearch supports a variety of node roles, such as cluster_manager, data, and ingest (listed here).
  • For searchable snapshots as well as future phases of the storage roadmap, we would like to add a new role which can provide querying capabilities for a remote shard.
  • This role should be configured as a part of the configuration file for the node.
  • A reference implementation can be found here: Add support for remote snapshot allocation kotwanikunal/OpenSearch#4

Additional context

@kotwanikunal kotwanikunal added enhancement Enhancement or improvement to existing feature or request Indexing & Search labels Sep 30, 2022
Member

dblock commented Sep 30, 2022

Is this an important static capability, or is it an optimization that can be modeled with dynamic node role that doesn't always exist and is simply preferred? Meaning if no node was "remote searcher", could you fall back to a "data" node? If that's the case then you can use #3436 with no code changes.

Member Author

kotwanikunal commented Sep 30, 2022

> Is this an important static capability, or is it an optimization that can be modeled with dynamic node role that doesn't always exist and is simply preferred? Meaning if no node was "remote searcher", could you fall back to a "data" node? If that's the case then you can use #3436 with no code changes.

This is a required capability: a node needs additional configuration to act as a remote searcher. A fallback scenario would have to be designed for specifically, and it will not work with the current design.
@andrross

Member

andrross commented Oct 3, 2022

My intuition is that we want this as an important static capability. I believe it's possible to design the system so that a regular data node can fall back to acting as a remote searcher, but that is generally a sub-optimal setup. The static role would require a user to be intentional about such a setup by applying both roles to a given node.

Having typed all of that, I suspect that is true about any of the dynamic roles that are used to select "preferred" nodes, so I'm open to being convinced otherwise.

Member

dblock commented Oct 5, 2022

@kotwanikunal What are things that would go into the additional node configuration?

@andrross Maybe my question implied fallback too much. We can also use the dynamic node capability without fallback (fail as fallback).

Reading the issue, the tl;dr difference between a remote search node and a regular data node is that the entire shard will not be downloaded onto the node, correct? Is there more to this node? Does it need to be a first class node type?

Finally, is there a better name than "remote searcher"? Is "search" a better name for this that can in the future collect other search-only capabilities?

Member

andrross commented Oct 5, 2022

@dblock

> Reading the issue, the tl;dr difference between a remote search node and a regular data node is that the entire shard will not be downloaded onto the node, correct? Is there more to this node? Does it need to be a first class node type?

The intent is for the role to serve two purposes:

  1. Requires that part of the local disk is reserved for caching the remote index data (this is the additional configuration)
  2. Ensures that "remote" shards are only allocated to these nodes
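A minimal sketch of the second point, written as an illustration rather than as the actual OpenSearch allocation logic. The role strings, the `Node` and `Shard` types, and the `eligible_nodes` helper are all hypothetical, and routing non-remote shards to "data" nodes is an assumption made only to give the example a contrasting case.

```python
from dataclasses import dataclass

# Hypothetical role names, used only for this illustration.
REMOTE_SEARCHER = "remote_searcher"
DATA = "data"


@dataclass(frozen=True)
class Node:
    name: str
    roles: frozenset


@dataclass(frozen=True)
class Shard:
    index: str
    remote: bool  # True when the shard is backed by remote (snapshot) data


def eligible_nodes(shard, nodes):
    """Return the names of nodes that may host the given shard.

    Remote shards are restricted to nodes carrying the remote searcher role.
    Sending all other shards to data nodes is an assumption of this sketch.
    """
    required = REMOTE_SEARCHER if shard.remote else DATA
    return [node.name for node in nodes if required in node.roles]


nodes = [
    Node("node-1", frozenset({DATA})),
    Node("node-2", frozenset({REMOTE_SEARCHER})),
    Node("node-3", frozenset({DATA, REMOTE_SEARCHER})),
]

print(eligible_nodes(Shard("logs-snapshot", remote=True), nodes))
print(eligible_nodes(Shard("logs", remote=False), nodes))
```

With this rule, a node that should serve both kinds of shards has to be given both roles explicitly, which matches the "be intentional" argument made earlier in the thread.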

Member

dblock commented Oct 6, 2022

@andrross Thanks, makes sense.

I do want to try to explain why I am suggesting not introducing a "remote search" role: It's a kind of "search node". I suspect that in the grand scheme of things users want to separate and independently scale indexing from searching. It would be real simple to think of these as "index" and "search" roles, and OpenSearch making decisions such as "a node is both index and search therefore shards are downloaded to the node" vs. "a node is just search and therefore only remote shards are allocated to the node and part of disk is reserved for caching". Am I over-simplifying this? When we've implemented all known storage and search ideas that are already discussed out there, what will this picture look like, and will a "remote search" node still make sense?

Member

andrross commented Oct 6, 2022

@dblock

I do expect there to be separate "index" and "search" nodes as has been discussed. It's a fair question whether the "remote" aspect of it should be baked into the role. The purpose of the "remote" part is to ensure the requisite cache configuration has been supplied, but we could either define reasonable defaults or fail at runtime if the required configuration is not present. I do like the simplicity of "index" and "search".

(Note that there may well be a remote variant of indexers as well, i.e. indexers that index to local disk and replicate via seg rep versus indexers that write directly to a remote store and searchers access the remote data. If we go with the "remote search" role then we're potentially looking at 4 distinct roles.)

Member

dblock commented Oct 7, 2022

I would be much more comfortable with a "search" role vs. "remote search", as a node optimized for search. Other parameters, such as how many MB of disk are reserved for cache, would become configuration parameters that are independent of the role (but may have some semantics like "cannot be enabled on a node that's not 'search'").
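One way to picture this proposal is a validation step that treats the cache size as an ordinary setting but rejects it on nodes without the "search" role. This is only a sketch under stated assumptions: the function name, the `cache_size_mb` parameter, and the error messages are hypothetical and do not correspond to real OpenSearch settings.

```python
SEARCH_ROLE = "search"


def validate_cache_setting(roles, cache_size_mb=0):
    """Validate a hypothetical cache-size setting against a node's roles.

    The setting is independent of the role (it defaults to 0, i.e. disabled),
    but a non-zero value is only accepted on nodes that have the search role.
    """
    if cache_size_mb < 0:
        raise ValueError("cache size must not be negative")
    if cache_size_mb > 0 and SEARCH_ROLE not in roles:
        raise ValueError("cache can only be enabled on a node with the search role")
    return cache_size_mb


# A search node may reserve disk for the cache.
print(validate_cache_setting({"search"}, cache_size_mb=512))

# A data-only node is fine as long as the cache stays disabled.
print(validate_cache_setting({"data"}))

# Enabling the cache on a data-only node is rejected.
try:
    validate_cache_setting({"data"}, cache_size_mb=512)
except ValueError as err:
    print(f"rejected: {err}")
```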

Collaborator

tlfeng commented Oct 11, 2022

A new node role, "search", dedicated to search operations has been added.

In main branch (version 3.0): PR #4689 / commit c1272c1.
In 2.x branch (version 2.4.0): PR #4739 / commit 747aa97.
