Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: async search responses have CCS metadata while searches are running #117265

Merged

Conversation

quux00
Copy link
Contributor

@quux00 quux00 commented Nov 21, 2024

ES|QL async search responses now include CCS metadata while the query is still running.
The CCS metadata will be present only if a remote cluster is queried and the user requested
it with the include_ccs_metadata: true setting on the original request to POST /_query/async.
The setting cannot be modified in the query to GET /_query/async/:id.

The core change is that the EsqlExecutionInfo object is set on the EsqlQueryTask, which is used
for async ES|QL queries, so that calls to GET /_query/async/:id have access to the same
EsqlExecutionInfo object that is being updated as the planning and query progress.

Example response showing in progress state of the query:

GET _query/async/FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA
{
  "id": "FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA==",
  "is_running": true,
  "columns": [],
  "values": [],
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 2,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "running",
        "indices": "web_traffic",
        "_shards": {
          "total": 2,
          "skipped": 0
        }
      },
      "remote1": {
        "status": "running",
        "indices": "web_traffic"
      },
      "remote2": {
        "status": "successful",
        "indices": "web_traffic",
        "took": 180,
        "_shards": {
          "total": 2,
          "successful": 2,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}

@quux00 quux00 added >enhancement auto-backport Automatically create backport pull requests when merged v9.0.0 v8.18.0 labels Nov 21, 2024
@quux00 quux00 force-pushed the esql/async-responses-have-progress-metadata branch 3 times, most recently from 065f5a5 to 47756d7 Compare November 25, 2024 17:45
…sportEsqlQueryAction, so metadata is now shown in async response while search is running
…ter query, not after can-match.

Coordinator only queries set shard counts before the compute phase in order to allow asserts to pass
in ComputeListener that need to be enforced for data node based queries.
…usAndShardCounts, since skipped==successful

Only setting took time in XContent when status != RUNNING, to avoid showing intermediate took times that
get set as each node finishes.

Adjusted tests to match new behavior - now passing:
• EsqlQueryResponseTests.java
• ComputeListenerTests.java
@quux00 quux00 force-pushed the esql/async-responses-have-progress-metadata branch from 47756d7 to d2151b1 Compare November 25, 2024 21:31
@quux00
Copy link
Contributor Author

quux00 commented Nov 26, 2024

buildkite test this

@quux00 quux00 added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Nov 26, 2024
@quux00 quux00 marked this pull request as ready for review November 26, 2024 15:46
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @quux00, I've created a changelog YAML for you.

}
}
delegate.onResponse(result);
}, e -> delegate.onFailure(failureCollector.getFailure())));
}

private static void setFinalStatusAndShardCounts(String clusterAlias, EsqlExecutionInfo executionInfo) {
executionInfo.swapCluster(clusterAlias, (k, v) -> {
if (v.getStatus() != EsqlExecutionInfo.Cluster.Status.SKIPPED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about PARTIAL status here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out. Currently in the the code, there is no place where PARTIAL is ever set. Yours will be the first, so I put in a TODO to handle that once we start adding support for partial. I don't have any context or reason to add it right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll need to remember this and if this patch merges first (likely) to update mine for this case.

@@ -112,6 +112,7 @@ private ComputeListener(
if (runningOnRemoteCluster()) {
// for remote executions - this ComputeResponse is created on the remote cluster/node and will be serialized and
// received by the acquireCompute method callback on the coordinating cluster
setFinalStatusAndShardCounts(clusterAlias, executionInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 105 states that clusterAlias and executionInfo should both be null or non-null. If I assume that they both are null, won't setFinalStatusAndShardCounts() be dereferencing null? Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two if-guards of runningOnRemoteCluster() and coordinatingClusterIsSearchedInCCS() ensure that they are not null. clusterAlias and executionInfo are both null when handling data node operations rather than coordinator level operations and CCS metadata is not tracked/updated during data node operations, only coordinator operations.

…arch is running ("took so far").

Addressed a few other minor aspects based on PR feedback.
Copy link
Contributor

@pawankartik-elastic pawankartik-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more comments from my side; I've the required context.

@quux00 quux00 merged commit e33e1a0 into elastic:main Nov 27, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

quux00 added a commit to quux00/elasticsearch that referenced this pull request Nov 27, 2024
…ning (elastic#117265)

ES|QL async search responses now include CCS metadata while the query is still running.
The CCS metadata will be present only if a remote cluster is queried and the user requested
it with the `include_ccs_metadata: true` setting on the original request to `POST /_query/async`.
The setting cannot be modified in the query to `GET /_query/async/:id`.

The core change is that the EsqlExecutionInfo object is set on the EsqlQueryTask, which is used
for async ES|QL queries, so that calls to `GET /_query/async/:id` have access to the same
EsqlExecutionInfo object that is being updated as the planning and query progress.

Secondly, the overall `took` time is now always present on ES|QL responses, even for
async-searches while the query is still running. The took time shows a "took-so-far" value
and will change upon refresh until the query has finished. This is present regardless of
the `include_ccs_metadata` setting.

Example response showing in progress state of the query:

```
GET _query/async/FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA
```

```json
{
  "id": "FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA==",
  "is_running": true,
  "took": 2032,
  "columns": [],
  "values": [],
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 2,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "running",
        "indices": "web_traffic",
        "_shards": {
          "total": 2,
          "skipped": 0
        }
      },
      "remote1": {
        "status": "running",
        "indices": "web_traffic"
      },
      "remote2": {
        "status": "successful",
        "indices": "web_traffic",
        "took": 180,
        "_shards": {
          "total": 2,
          "successful": 2,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
quux00 added a commit to quux00/elasticsearch that referenced this pull request Nov 28, 2024
…ning (elastic#117265)

ES|QL async search responses now include CCS metadata while the query is still running.
The CCS metadata will be present only if a remote cluster is queried and the user requested
it with the `include_ccs_metadata: true` setting on the original request to `POST /_query/async`.
The setting cannot be modified in the query to `GET /_query/async/:id`.

The core change is that the EsqlExecutionInfo object is set on the EsqlQueryTask, which is used
for async ES|QL queries, so that calls to `GET /_query/async/:id` have access to the same
EsqlExecutionInfo object that is being updated as the planning and query progress.

Secondly, the overall `took` time is now always present on ES|QL responses, even for
async-searches while the query is still running. The took time shows a "took-so-far" value
and will change upon refresh until the query has finished. This is present regardless of
the `include_ccs_metadata` setting.

Example response showing in progress state of the query:

```
GET _query/async/FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA
```

```json
{
  "id": "FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA==",
  "is_running": true,
  "took": 2032,
  "columns": [],
  "values": [],
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 2,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "running",
        "indices": "web_traffic",
        "_shards": {
          "total": 2,
          "skipped": 0
        }
      },
      "remote1": {
        "status": "running",
        "indices": "web_traffic"
      },
      "remote2": {
        "status": "successful",
        "indices": "web_traffic",
        "took": 180,
        "_shards": {
          "total": 2,
          "successful": 2,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
elasticsearchmachine pushed a commit that referenced this pull request Nov 28, 2024
…ning (#117265) (#117664)

ES|QL async search responses now include CCS metadata while the query is still running.
The CCS metadata will be present only if a remote cluster is queried and the user requested
it with the `include_ccs_metadata: true` setting on the original request to `POST /_query/async`.
The setting cannot be modified in the query to `GET /_query/async/:id`.

The core change is that the EsqlExecutionInfo object is set on the EsqlQueryTask, which is used
for async ES|QL queries, so that calls to `GET /_query/async/:id` have access to the same
EsqlExecutionInfo object that is being updated as the planning and query progress.

Secondly, the overall `took` time is now always present on ES|QL responses, even for
async-searches while the query is still running. The took time shows a "took-so-far" value
and will change upon refresh until the query has finished. This is present regardless of
the `include_ccs_metadata` setting.

Example response showing in progress state of the query:

```
GET _query/async/FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA
```

```json
{
  "id": "FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA==",
  "is_running": true,
  "took": 2032,
  "columns": [],
  "values": [],
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 2,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "running",
        "indices": "web_traffic",
        "_shards": {
          "total": 2,
          "skipped": 0
        }
      },
      "remote1": {
        "status": "running",
        "indices": "web_traffic"
      },
      "remote2": {
        "status": "successful",
        "indices": "web_traffic",
        "took": 180,
        "_shards": {
          "total": 2,
          "successful": 2,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…ning (elastic#117265)

ES|QL async search responses now include CCS metadata while the query is still running.
The CCS metadata will be present only if a remote cluster is queried and the user requested
it with the `include_ccs_metadata: true` setting on the original request to `POST /_query/async`.
The setting cannot be modified in the query to `GET /_query/async/:id`.

The core change is that the EsqlExecutionInfo object is set on the EsqlQueryTask, which is used
for async ES|QL queries, so that calls to `GET /_query/async/:id` have access to the same
EsqlExecutionInfo object that is being updated as the planning and query progress.

Secondly, the overall `took` time is now always present on ES|QL responses, even for
async-searches while the query is still running. The took time shows a "took-so-far" value
and will change upon refresh until the query has finished. This is present regardless of
the `include_ccs_metadata` setting.

Example response showing in progress state of the query:

```
GET _query/async/FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA
```

```json
{
  "id": "FlhaeTBxUU0yU2xhVzM2TlRLY3F1eXcceWlSWWZlRDhUVTJEUGFfZUROaDdtUTo0MDQwNA==",
  "is_running": true,
  "took": 2032,
  "columns": [],
  "values": [],
  "_clusters": {
    "total": 3,
    "successful": 1,
    "running": 2,
    "skipped": 0,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "running",
        "indices": "web_traffic",
        "_shards": {
          "total": 2,
          "skipped": 0
        }
      },
      "remote1": {
        "status": "running",
        "indices": "web_traffic"
      },
      "remote2": {
        "status": "successful",
        "indices": "web_traffic",
        "took": 180,
        "_shards": {
          "total": 2,
          "successful": 2,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants