Wildcard cluster names for cross cluster search #23893

Closed
clintongormley opened this issue Apr 4, 2017 · 10 comments

@clintongormley
Contributor

With cross cluster search today, each cluster has to be specified separately, eg

GET one:*,two:*,three:*/_search

It would be good to be able to support wildcards:

GET *:*/_search

By extension, (because it is how it works everywhere else), we should support pattern matching like:

GET t*:*/_search

One complication is that : is an allowed character in index names today (although we have plans to deprecate #23892). We can work around this with the following logic:

  • if remote clusters are not configured, treat the specified name as an index name
  • if remote clusters ARE configured, treat everything before the first : as a cluster name
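
The two rules above can be sketched as a small function. This is only an illustration in Python, not the Elasticsearch implementation: the function name `split_cluster_and_index` and the `remote_clusters_configured` flag are invented for the example.

```python
def split_cluster_and_index(expression, remote_clusters_configured):
    """Return (cluster, index); cluster is None for a local index.

    Hypothetical helper that mirrors the two rules in the list above.
    """
    # Rule 1: without remote clusters, ":" is just part of the index name.
    if not remote_clusters_configured or ":" not in expression:
        return None, expression
    # Rule 2: everything before the FIRST ":" is the cluster name.
    cluster, index = expression.split(":", 1)
    return cluster, index
```

Splitting on the first `:` only means that an index name which itself contains a colon still comes through intact after the cluster prefix.
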
@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates :Search/Search Search-related issues that do not fall into other categories >enhancement labels Apr 4, 2017
@spalger
Contributor

spalger commented Apr 4, 2017

Do you think there is a reason Kibana needs to know which requests should get the *: cluster selector? If we executed all queries against *:{index}, do you think that would be too costly or error prone?

At some point we want to give users more control in this regard, maybe making it an index pattern level setting, but out of the gate I'm thinking Kibana might just add *: before every index in requests to es...

@Tim-Brooks
Contributor

Tim-Brooks commented Apr 7, 2017

I've nearly completed a basic PR for this. However, as I worked on it a number of questions came up for me.

1. Should this support any options like index wildcards do?

https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html

- expand_wildcards
- ignore_unavailable
- allow_no_indices

Those options do not necessarily make sense in context of cross cluster search. I guess I'm just asking if we want any options.

For example, what do we do when a wildcard is used but no cluster matches? Which brings me to number 2:

2. What do we do if a wildcard is used but no cluster matches?

Do we treat that as a local index? Even though a * cannot be in an index name? Do we throw an error? Currently I believe that if a cluster name does not match we treat it as a local index. So "nonexistentcluster:index" would be treated as index on local node: "nonexistentcluster:index".

3. Should this change support + and - wildcard options?

Right now for indices we support them.

https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html

cluster1:+test*,cluster1:-test3

I guess with cluster names this might look something like:

+cluster*:+test*,-cluster3:+test*,+cluster*:-test3

That obviously can get kind of complicated when you're using both + and - options for both clusters and indices.

4. Do wildcards for cluster names reference the local cluster?

So does "*:test" search the local cluster? Or do you need to do "*:test,test"? And if it does search the local cluster, do we use "cluster.name" as the local cluster name to match against the wildcard? I think this diverges from the remote clusters, where we use the alias defined locally as opposed to the remote cluster.name setting.

@clintongormley
Contributor Author

  1. Should this support any options like index wildcards do?

No, those options apply to the underlying indices (and any wildcard expansion of those)

  2. What do we do if a wildcard is used but no cluster matches?

Same as today, ignore them

  3. Should this change support + and - wildcard options?

No, let's keep this simple

  4. Do wildcards for cluster names reference the local cluster?

This is tricky... Today, you can define an alias for the local cluster, in which case *:foo would work just fine. I think that we will probably end up defining _local_ or something similar as a default alias in the future. I'd say for now, let the wildcard apply only to defined aliases. If the user doesn't specify a local alias then it won't match local indices.

@Tim-Brooks
Contributor

Tim-Brooks commented Apr 11, 2017

This is tricky... Today, you can define an alias for the local cluster, in which case *:foo would work just fine.

@clintongormley I was looking through the documentation today and was not able to figure out exactly what you're referencing. Are you saying that you would do something like this in the config:

search.remote.local_cluster_alias.seeds: "127.0.0.1:9300"

@clintongormley
Contributor Author

@tbrooks8 yes you can do that today, but actually @s1monw wants to remove this and to replace it with a _local_ predefined alias, used just for cluster name matching

Tim-Brooks added a commit that referenced this issue Apr 11, 2017
This is related to #23893. This commit allows users to use wildcards for
cluster names when executing a cross cluster search.

So instead of defining every cluster such as:

GET one:*,two:*,three:*/_search

A user could just search:

GET *:*/_search

As ":" characters are currently allowed in index names, if the text
up to the first ":" does not match a defined cluster name, the entire
string is treated as an index name.
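
The behaviour described in this commit message could be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: `group_indices` and the `LOCAL` key are hypothetical names, and Python's `fnmatchcase` stands in for the cluster-name matching (it also understands `?` and `[...]`, which goes beyond the `*` wildcard discussed in this issue).

```python
from fnmatch import fnmatchcase

LOCAL = ""  # hypothetical key used here for indices on the local cluster


def group_indices(expressions, remote_aliases):
    """Group index expressions by the remote cluster alias they match."""
    groups = {}
    for expression in expressions:
        # Split at the first ":" only; sep is "" when there is no colon.
        prefix, sep, index = expression.partition(":")
        matched = []
        if sep:
            matched = [a for a in remote_aliases if fnmatchcase(a, prefix)]
        if matched:
            for alias in matched:
                groups.setdefault(alias, []).append(index)
        else:
            # No defined cluster matches: the entire string is an index name.
            groups.setdefault(LOCAL, []).append(expression)
    return groups
```

With aliases `one`, `two` and `three` defined, `*:*` would fan out to all three clusters, `t*:logs` to `two` and `three` only, and `nonexistentcluster:index` would stay a local index name.
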
@djschny
Contributor

djschny commented Apr 14, 2017

Very happy to see this as this was the very first thing I tried to do when 5.3.0 was released and I was testing out the behavior. Look forward to it coming.

However, I'm confused and trying to understand why it is not desirable for *:logstash-*/_search to match indices on the cluster of the coordinating node of the request. IMO this would be the default behavior the majority of folks would want.

Instead it sounds like _local_,*:logstash-*/_search would be needed?

@Tim-Brooks
Contributor

@djschny I can give you the development perspective and then maybe @clintongormley can add anything if he wants.

Currently you have to manually define aliases for remote clusters either in the config or with settings requests. Those aliases are what we use for matching.

The problem is that we do not have a predefined alias for the local cluster. So while we could add behavior to always search the local cluster with *:logstash-*/_search, we would not know whether to search the local cluster if you used a slightly more complicated search such as r*:logstash-*/_search.

The current solution is to add the local cluster to your config for cross cluster search. Once you have a defined alias, everything works as it does for any other cluster.

Currently, if you have not defined the local cluster for cross cluster search and you want to search both then you do logstash-*,*:logstash-*/_search.
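
As a small illustration of that workaround, a client could build the combined target like this. This is a hedged Python sketch; `local_and_remote_target` is a hypothetical helper name, not part of any Elasticsearch client.

```python
def local_and_remote_target(index_pattern):
    """Build a target that covers the local cluster and every remote alias.

    The bare pattern addresses local indices; the "*:" prefix addresses
    all configured remote cluster aliases.
    """
    return f"{index_pattern},*:{index_pattern}"


# The resulting string is what goes in front of /_search in the request path.
search_path = "/" + local_and_remote_target("logstash-*") + "/_search"
```
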

On the other hand, I think that the _local_ idea is that in the future we would have a permanent alias for the local cluster. One in which you would not need to configure anything. In that world, *:logstash-*/_search WOULD search both local and remote clusters. And _local_ would be the alias used for more complicated wildcard matches.

@djschny
Contributor

djschny commented Apr 14, 2017

Thanks for the clarification @tbrooks8, I misinterpreted and didn't realize that the adding of localhost in the remote clusters list was more a temporary workaround until the longer term _local_ is available.

Thanks!

@javanna
Member

javanna commented Apr 18, 2017

@tbrooks8 should this issue be closed?

@clintongormley
Contributor Author

Closed by #23985
