CCS: sibling pipeline aggregations don't work when minimizing roundtrips #40059

Closed
chrisronline opened this issue Mar 14, 2019 · 6 comments · Fixed by #40177
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@chrisronline
Contributor

chrisronline commented Mar 14, 2019

Elasticsearch version (bin/elasticsearch --version): latest master snapshot (hash: 8839a72)

Plugins installed: []

JVM version (java -version): java version "11.0.1" 2018-10-16 LTS

OS version (uname -a if on a Unix-like system): Darwin Kernel Version 18.2.0

Description of the problem including expected versus actual behavior:

When performing a CCS search request, an error occurs if there is a pipeline aggregation in the query.

The exact error is:

   │      java.lang.UnsupportedOperationException: Not supported
   │      	at org.elasticsearch.search.aggregations.pipeline.InternalSimpleValue.doReduce(InternalSimpleValue.java:80) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.search.aggregations.pipeline.InternalSimpleValue.doReduce(InternalSimpleValue.java:34) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:135) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:90) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.action.search.SearchResponseMerger.getMergedResponse(SearchResponseMerger.java:193) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.action.search.TransportSearchAction$3.createFinalResponse(TransportSearchAction.java:389) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.action.search.TransportSearchAction$3.createFinalResponse(TransportSearchAction.java:379) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]

Steps to reproduce:

  1. Start two ES instances, ensuring they have unique cluster.names
  2. Bulk index documents into both:
POST _bulk
{ "index": { "_index": "sales" } }
{ "price": 10, "payment_type": "credit_card" }
{ "index": { "_index": "sales" } }
{ "price": 20, "payment_type": "cash" }
  3. Configure one ES instance to talk to the other through CCS
POST _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "other": {
          "seeds": [
            "{other_es_instance_address}"
          ]
        }
      }
    }
  }
}
  4. Perform a CCS search that uses a pipeline aggregation from the same ES instance
POST *:sales,sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_type": {
      "terms": {
        "field": "payment_type.keyword"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "max_per_type": {
      "max_bucket": {
        "buckets_path": "sales_per_type>sales"
      }
    }
  }
}
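
For anyone scripting the reproduction, the bulk request from step 2 can be generated rather than typed. This is only a minimal sketch, assuming Python and the standard library; `bulk_body` and `SALES_DOCS` are hypothetical names, not part of any Elasticsearch client. It shows the one detail that is easy to get wrong: the `_bulk` body is newline-delimited JSON and must end with a newline.

```python
import json


def bulk_body(index, docs):
    """Build a newline-delimited _bulk body that indexes each doc into `index`."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# The two documents from the steps above.
SALES_DOCS = [
    {"price": 10, "payment_type": "credit_card"},
    {"price": 20, "payment_type": "cash"},
]

if __name__ == "__main__":
    print(bulk_body("sales", SALES_DOCS), end="")
```

The same body would be sent to both instances so that each cluster holds a `sales` index with identical documents.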

It's worth noting that the same query does not incur an error when used in either of these contexts:
POST sales/_search
POST *:sales/_search

The error only occurs when the local and the remote index are searched in the same request: POST *:sales,sales/_search.

It's also worth noting that the query does not incur an error when you remove the pipeline aggregation:

POST *:sales,sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_type": {
      "terms": {
        "field": "payment_type.keyword"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

This currently breaks the Kibana Stack Monitoring app, because it runs this kind of pipeline aggregation, which fails when CCS is enabled.

The work done in this PR might be related, as its description suggests that it touches the code around this.

@cachedout
Contributor

cachedout commented Mar 14, 2019

@javanna and @jimczi This is a pretty serious bug for @elastic/stack-monitoring and is potentially impacting customers (edit: we found this in master but aren't sure which versions it affects). We'd love some help tracking this down ASAP if you have a little bit of time. Thanks.

jimczi added the >bug and :Search/Search (Search-related issues that do not fall into other categories) labels Mar 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@cachedout
Contributor

Potentially related: #40067

@javanna
Member

javanna commented Mar 14, 2019

This manifests when trying to minimize roundtrips (see #32125). I have tested pipeline aggregations and they worked fine in my tests, but this one is clearly a bug that was missed which needs fixing. I can see how a bunch of pipeline aggs are affected by this though. Until this gets fixed, please provide the minimize_roundtrips parameter set to false to the search request.

javanna self-assigned this Mar 14, 2019
@chrisronline
Contributor Author

@javanna I think the parameter is named ccs_minimize_roundtrips

@javanna
Member

javanna commented Mar 14, 2019

yes it is thanks ;)
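
As a sketch of the workaround discussed above, the snippet below builds a search URL that sets `ccs_minimize_roundtrips` to false explicitly. It assumes Python with only the standard library; `ccs_search_url` is a hypothetical helper and `http://localhost:9200` is an assumed address, so adjust both to your setup.

```python
from urllib.parse import quote, urlencode


def ccs_search_url(base_url, indices, minimize_roundtrips=False):
    """Return a _search URL with ccs_minimize_roundtrips set explicitly."""
    # Keep "*" and ":" unescaped, since CCS index expressions such as
    # "*:sales" use them.
    target = ",".join(quote(name, safe="*:") for name in indices)
    flag = "true" if minimize_roundtrips else "false"
    query = urlencode({"ccs_minimize_roundtrips": flag})
    return f"{base_url.rstrip('/')}/{target}/_search?{query}"


# Assumed local address; the index expressions are the ones from the report.
URL = ccs_search_url("http://localhost:9200", ["*:sales", "sales"])
```

The request body stays the same as in the reproduction; only the query string changes.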

javanna changed the title from "Pipeline aggregations not working with CCS" to "CCS: sibling pipeline aggregations don't work when minimizing roundtrips" Mar 14, 2019
javanna added a commit to javanna/elasticsearch that referenced this issue Mar 15, 2019
Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever it reduces aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips: each cluster ends up reducing its own pipeline aggregators locally, although that should only be done later by the CCS coordinating node. Once reduced, pipeline aggs cannot be reduced further, yet that is what CCS then attempts, causing errors like "java.lang.UnsupportedOperationException: Not supported".

Each coordinating node should instead honour the reduce context flag that
indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone.

Note that this bug affects only pipeline aggs that don't have a parent in
the aggs tree, while all the others work well.

Relates to elastic#40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.
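
The failure mode and the role of the final-reduce flag can be illustrated with a toy model. This is not the Elasticsearch implementation: it is a minimal Python sketch in which `SimpleValue`, `merge_buckets` and `reduce_aggs` are hypothetical names, buckets are plain dicts, and Python's `NotImplementedError` stands in for the Java `UnsupportedOperationException`.

```python
class SimpleValue:
    """Toy stand-in for an already-reduced sibling pipeline agg result."""

    def __init__(self, value):
        self.value = value

    def reduce(self, others):
        # Mirrors the behaviour described above: a reduced pipeline agg
        # cannot be reduced again.
        raise NotImplementedError("Not supported")


def merge_buckets(partials):
    """Sum per-key bucket values coming from several shards or clusters."""
    merged = {}
    for buckets in partials:
        for key, value in buckets.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def reduce_aggs(partials, final_reduce):
    """Reduce partial results; run the sibling pipeline agg only when final."""
    already_reduced = [p["max_per_type"] for p in partials if "max_per_type" in p]
    if already_reduced:
        # A cluster already ran its pipeline agg locally: merging those fails.
        already_reduced[0].reduce(already_reduced[1:])
    result = {"sales_per_type": merge_buckets(p["sales_per_type"] for p in partials)}
    if final_reduce:
        result["max_per_type"] = SimpleValue(max(result["sales_per_type"].values()))
    return result


shards = [{"sales_per_type": {"credit_card": 10, "cash": 20}}]

# Honouring the flag: clusters reduce non-finally, the CCS coordinator finally.
local = reduce_aggs(shards, final_reduce=False)
remote = reduce_aggs(shards, final_reduce=False)
merged = reduce_aggs([local, remote], final_reduce=True)

# Ignoring the flag: each cluster reduces finally, so the coordinator fails.
try:
    reduce_aggs([reduce_aggs(shards, True), reduce_aggs(shards, True)], True)
    failed = False
except NotImplementedError:
    failed = True
```

In this model the pipeline value is only computed once all buckets have been merged, which is why the non-final reductions have to leave it alone.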
javanna added a commit that referenced this issue Mar 18, 2019
…40101)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 18, 2019
We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive at the coordinating node as part of QuerySearchResult
objects from the shards and, although we may reduce aggs
incrementally (hence some non-final reductions followed by the
final one), all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With elastic#40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes elastic#40059
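
The idea of carrying unapplied pipeline aggregators along with each cluster's partial result can be sketched as follows. This is a toy model under stated assumptions, not the actual wire format or the `InternalAggregations` API: `cluster_response`, `final_reduce` and the `pending_pipeline_aggs` field are hypothetical, JSON stands in for the transport serialization, and `buckets_path` is simplified to the name of the bucket aggregation.

```python
import json


def cluster_response(buckets, pipeline_specs):
    """Serialize one cluster's non-final result with its unapplied pipeline aggs."""
    return json.dumps({
        "aggs": {"sales_per_type": buckets},
        "pending_pipeline_aggs": pipeline_specs,
    })


def final_reduce(serialized_responses):
    """Merge the clusters' buckets, then apply the pipeline aggs exactly once."""
    responses = [json.loads(raw) for raw in serialized_responses]
    merged = {}
    for response in responses:
        for key, value in response["aggs"]["sales_per_type"].items():
            merged[key] = merged.get(key, 0) + value
    result = {"sales_per_type": merged}
    # Every cluster carries the same specs, so the first response is enough.
    for spec in responses[0]["pending_pipeline_aggs"]:
        if spec["type"] == "max_bucket":
            source = result[spec["buckets_path"]]
            top = max(source.values())
            result[spec["name"]] = {
                "value": top,
                "keys": sorted(k for k, v in source.items() if v == top),
            }
    return result


SPECS = [{"name": "max_per_type", "type": "max_bucket", "buckets_path": "sales_per_type"}]
BUCKETS = {"credit_card": 10, "cash": 20}
RESULT = final_reduce([cluster_response(BUCKETS, SPECS), cluster_response(BUCKETS, SPECS)])
```

If the specs were dropped during the non-final reduction, the final step would have nothing to apply and `max_per_type` would be missing from the result.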
javanna added a commit that referenced this issue Mar 19, 2019
…0177)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…lastic#40101)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…astic#40177)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…lastic#40101)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…lastic#40101)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…astic#40177)

javanna added a commit to javanna/elasticsearch that referenced this issue Mar 19, 2019
…astic#40177)

javanna added a commit that referenced this issue Mar 19, 2019
…40101)

javanna added a commit that referenced this issue Mar 19, 2019
…0177)

javanna added a commit that referenced this issue Mar 19, 2019
…40101)

javanna added a commit that referenced this issue Mar 19, 2019
…0177)

javanna added a commit that referenced this issue Mar 19, 2019
…40101)

javanna added a commit that referenced this issue Mar 19, 2019
…0177)

5 participants