
Issue with sliced scroll using routing #27550

Closed
alissonsales opened this issue Nov 27, 2017 · 5 comments · Fixed by #29533
Assignees: jimczi
Labels: >bug, :Search/Search

Comments


alissonsales commented Nov 27, 2017

Hi all, first a quick disclaimer: I'm not entirely sure whether the following is a bug or a documentation issue. After reading the sliced scroll section of the scroll API docs, I got the impression that sliced scroll is supposed to work when targeting a single shard.

Elasticsearch version (bin/elasticsearch --version): 5.5.2, but I've also tested with Elasticsearch 6 and reproduced the same behaviour.

Plugins installed:

curl localhost:9200/_cat/plugins
o2qKP9T ingest-geoip      5.5.2
o2qKP9T ingest-user-agent 5.5.2
o2qKP9T x-pack            5.5.2

JVM version (java -version):

$ java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)

OS version (uname -a if on a Unix-like system): I'm using the official Elasticsearch Docker image,
docker.elastic.co/elasticsearch/elasticsearch:5.5.2.

Description of the problem including expected versus actual behavior:

I'm trying to perform a sliced scroll targeting only one shard through routing, and Elasticsearch returns all the results in only one of the two slices.

I expect Elasticsearch to split the query/results across all slices, even when targeting only one shard.

Steps to reproduce:

I have created a small bash script to reproduce the problem; please find it here.
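The linked script is not reproduced here; the following is a minimal sketch of a reproduction along the same lines (the index name, routing value, and document count are assumptions chosen to match the output below, not necessarily the actual script):

#!/usr/bin/env bash
# Usage: bash sliced_scroll.sh <number_of_shards>
SHARDS=${1:-2}
ES=localhost:9200

# Create an index with the requested number of primary shards.
curl -s -XPUT "$ES/slice_test" -H 'Content-Type: application/json' \
  -d "{\"settings\": {\"number_of_shards\": $SHARDS, \"number_of_replicas\": 0}}"
echo

# Index 9 documents, all with the same routing value so they land on a single shard.
for i in $(seq 1 9); do
  curl -s -XPUT "$ES/slice_test/doc/$i?routing=user1" \
    -H 'Content-Type: application/json' -d "{\"field\": $i}" > /dev/null
done
curl -s -XPOST "$ES/slice_test/_refresh" > /dev/null

# Open a sliced scroll (2 slices) restricted to that routing value and count hits per slice.
for SLICE in 0 1; do
  echo "slice id $SLICE search"
  curl -s "$ES/slice_test/_search?scroll=1m&routing=user1&size=100" \
    -H 'Content-Type: application/json' \
    -d "{\"slice\": {\"id\": $SLICE, \"max\": 2}, \"query\": {\"match_all\": {}}}" \
    | grep -o '"_id"' | wc -l
done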

Here are my results when I run the script using 1 and 2 shards.

Using 1 shard

$ bash sliced_scroll.sh 1
ES version
{
  "name" : "o2qKP9T",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "bhLcjiBTTBaWlq6OuVC-Mg",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "b2f0c09",
    "build_date" : "2017-08-14T12:33:14.154Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
Create index
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 48

{"acknowledged":true,"shards_acknowledged":true}Adding docs...
slice id 0 search
4
slice id 1 search
5

Elasticsearch returns 2 slices, splitting the query/results as expected.

Using 2 shards

$ bash sliced_scroll.sh 2
ES version
{
  "name" : "o2qKP9T",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "bhLcjiBTTBaWlq6OuVC-Mg",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "b2f0c09",
    "build_date" : "2017-08-14T12:33:14.154Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
Create index
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 48

{"acknowledged":true,"shards_acknowledged":true}Adding docs...
slice id 0 search
9
slice id 1 search
0

Elasticsearch returns 2 slices, but doesn't split the query/results, returning all results in only one slice.

I hope this covers all the details required to reproduce the issue. I apologise in case this is the expected behaviour and I'm missing something.

Regards,
Alisson Sales


s1monw commented Nov 27, 2017

I think it's expected as of today, given how it's implemented, but we should really try to fix it so that it also works when there is more than one shard and routing is used. I agree it looks like a bug! Thanks for opening this issue.


jimczi commented Nov 28, 2017

Yes, this is expected because only the total number of shards per index is used to perform the slicing.
We could take the routing into account, but we have multiple ways to filter/route searches based on the sharding. For the simple routing case where a single shard is selected per index this is easy, since we just need to pass this information to the shard request (the slices are resolved on the shard directly). It is more complicated to handle routing index partitions (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html#routing-index-partition) and _shards preferences (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html), since we don't pass this information to shard requests and we would need more than a boolean that indicates whether a single shard is requested or not.
I need to think more about this, but I agree with @s1monw that we should try to fix it. I'd just add that if we fix it, it should work for all types of routing.
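To make the failure mode concrete, here is a rough sketch of the behaviour described above, assuming a modulo-style slice-to-shard assignment computed from the total shard count (an assumption for illustration; the exact logic lives in SliceBuilder and may differ in detail):

num_shards=2       # slicing is computed from the total shard count of the index
max_slices=2
routed_shard=0     # the only shard the routed search actually reaches

for slice_id in 0 1; do
  target_shard=$(( slice_id % num_shards ))   # the shard this slice is pinned to
  if [ "$target_shard" -eq "$routed_shard" ]; then
    echo "slice $slice_id -> shard $target_shard (visited): returns all matching docs"
  else
    echo "slice $slice_id -> shard $target_shard (not visited by the routed search): returns 0 docs"
  fi
done

With 2 shards and 2 slices, one slice ends up pinned to the shard the routed search never reaches, which matches the 9/0 split reported above.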

@jimczi self-assigned this Nov 28, 2017
@clintongormley added the :Search/Search label and removed the :Scroll label Feb 14, 2018
@talevy added and removed the :Search/Search label Mar 26, 2018
@elasticmachine

Pinging @elastic/es-search-aggs

@talevy added the >bug label Mar 26, 2018
@bevans88

I've recently come across the same problem (at least I believe it's the same issue: https://discuss.elastic.co/t/empty-slices-with-scan-scroll/127255). Is this issue being actively looked at, or is it seen as low priority?


jimczi commented Apr 10, 2018

It's on my todo list. It's not high priority, but I'll try to find some time in the coming days to work on a fix.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Apr 16, 2018
This commit adds two new methods to ShardSearchRequest:
 * #numberOfShardsIndex(), which returns the number of shards of this index
   that participate in the request.
 * #remapShardId(), which returns the remapped shard id of this shard for this request.
   The remapped shard id is the id of the requested shard among all shards
   of this index that are part of the request. Note that the remapped shard id
   is equal to the original shard id if all shards of this index are part of the request.

This information is useful when the _search is executed with a preference (or a routing) that
restricts the number of shards requested for an index.
This change fixes a bug in sliced scrolls executed with a preference (or a routing):
instead of computing the slice query from the total number of shards in the index, it
computes this number from the number of shards per index that participate in the request.

Fixes elastic#27550
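As a toy illustration of the remapping this commit message describes (the participating-shard set below is invented for the example; the real implementation lives in ShardSearchRequest):

# Hypothetical example: a routed search on a 4-shard index that only reaches shards 1 and 3.
participating_shards=(1 3)

echo "numberOfShardsIndex = ${#participating_shards[@]}"
for i in "${!participating_shards[@]}"; do
  echo "original shard ${participating_shards[$i]} -> remapped shard id $i"
done

# The slice query is then computed against the 2 participating shards instead of all 4.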
jimczi added commits that referenced this issue Apr 26, 2018
This commit propagates the preference and routing of the original SearchRequest in the ShardSearchRequest.
This information is then used to fix a bug in sliced scrolls executed with a preference (or a routing).
Instead of computing the slice query from the total number of shards in the index, this commit computes this number from the number of shards per index that participate in the request.

Fixes #27550