Execute Enrich policy task with wait_for_completion=false does not retain task status after completion #70554

askids · 2021-03-18T12:40:36Z

Elasticsearch version (bin/elasticsearch --version): 7.8.1
OpenJDK 64-Bit Server VM warning: Ignoring option UseConcMarkSweepGC; support was removed in 14.0
OpenJDK 64-Bit Server VM warning: Ignoring option CMSInitiatingOccupancyFraction; support was removed in 14.0
OpenJDK 64-Bit Server VM warning: Ignoring option UseCMSInitiatingOccupancyOnly; support was removed in 14.0
Version: 7.8.1, Build: unknown/unknown/b5ca9c58fb664ca8bf9e4057fc229b3396bf3a89/2020-07-21T16:40:44.668009Z, JVM: 14.0.1

Plugins installed: [readonlyrest - 1.28.0]

JVM version (java -version):
openjdk 14.0.1 2020-04-14
openJDK Runtime Environment AdoptOpenJDK <build 14.0.1+7>
openJDK 64-Bit Server VM AdoptOpenJDK <build 14.0.1+7, mixed mode, sharing>

OS version (uname -a if on a Unix-like system): Windows 2012 R2

Description of the problem including expected versus actual behavior:
When we execute an enrich policy with the parameter wait_for_completion=false, we get the task id back, but we are not able to consistently query the status of the task via the GET _tasks/ endpoint. If we request the status immediately, it shows completed as false along with the task status, but subsequent attempts return different kinds of errors depending on how long after the execution the GET request is made.

The expected behavior is that GET _tasks returns the proper status even after the task has completed. Without being able to see that the task completed, we won't be able to implement a reliable polling process to verify that the enrich policy execution finished successfully. We have a requirement to update the enrich index daily to pick up updated data from the source index, so we need to be able to get the task status reliably after executing the policy.

Steps to reproduce:

1. Create an enrich policy.
2. Execute the enrich policy with the parameter wait_for_completion=false.
3. Perform GET _tasks/<task_id> with the returned task id.

Provide logs (if relevant):

POST /_enrich/policy/my_enrich_policy_name/_execute?wait_for_completion=false

GET _tasks/oFKHJq8iSi69dXLxKh7EMA:4907254

{
  "completed" : false,
  "task" : {
    "node" : "oFKHJq8iSi69dXLxKh7EMA",
    "id" : 4907254,
    "type" : "enrich",
    "action" : "policy_execution",
    "status" : {
      "phase" : "RUNNING"
    },
    "description" : "my_enrich_policy_name",
    "start_time_in_millis" : 1616054637089,
    "running_time_in_nanos" : 7782110814,
    "cancellable" : false,
    "parent_task_id" : "oFKHJq8iSi69dXLxKh7EMA:4907195",
    "headers" : { }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "transport_serialization_exception",
        "reason" : "Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]"
      }
    ],
    "type" : "transport_serialization_exception",
    "reason" : "Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Unknown NamedWriteable [org.elasticsearch.tasks.Task$Status][enrich-policy-execution]"
    }
  },
  "status" : 500
}


{
  "error" : {
    "root_cause" : [
      {
        "type" : "resource_not_found_exception",
        "reason" : "task [oFKHJq8iSi69dXLxKh7EMA:4907254] isn't running and hasn't stored its results"
      }
    ],
    "type" : "resource_not_found_exception",
    "reason" : "task [oFKHJq8iSi69dXLxKh7EMA:4907254] isn't running and hasn't stored its results"
  },
  "status" : 404
}
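The three responses above can be told apart programmatically. The following is a minimal sketch, not part of the original report: the function name and the state labels are illustrative, and it assumes the caller has already obtained the HTTP status code and the parsed JSON body of a GET _tasks/<task_id> call.

```python
from typing import Any, Dict


def classify_task_response(http_status: int, body: Dict[str, Any]) -> str:
    """Map a GET _tasks/<task_id> response to a coarse state label.

    The labels ("running", "completed", "gone", ...) are hypothetical names
    chosen for this sketch; they are not Elasticsearch terminology.
    """
    if http_status == 200:
        # A 200 body carries a boolean "completed" flag next to the task details.
        return "completed" if body.get("completed") else "running"

    error = body.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None

    if http_status == 404 and error_type == "resource_not_found_exception":
        # "isn't running and hasn't stored its results": the outcome is unknown.
        return "gone"
    if error_type == "transport_serialization_exception":
        return "serialization_error"
    return "other_error"
```

Note that "gone" only says the task is no longer running and left no stored result; it cannot be read as either success or failure, which is the gap this issue is about.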

Thanks!

elasticmachine · 2021-03-23T10:17:40Z

Pinging @elastic/es-core-features (Team:Core/Features)

martijnvg · 2021-03-24T09:22:20Z

There are two parts to this request:

There is a serialization error. The ExecuteEnrichPolicyTask isn't registered correctly. (this needs to be fixed)
The node task that performs the policy execution needs to be stored in the .tasks index, otherwise the status can't be known after a node task has been executed.

Referencing #51628, since that will redefine how the task APIs should be used. In light of that, perhaps we should have a dedicated api to query the status of async policy executions (instead of the above second bullet point).

When executing the enrich execute policy api without waiting for completion, querying for the task via the task list api can result in a serialization error. Relates to elastic#70554

martijnvg · 2021-05-04T10:59:19Z

> There is a serialization error. The ExecuteEnrichPolicyTask isn't registered correctly. (this needs to be fixed)

Actually this is already fixed via #62364 and the fix is available from version 7.10. So upgrading should fix that serialisation error in your case.

martijnvg · 2021-05-04T15:05:24Z

Instead of checking the tasks api for the status of a policy that is executing in the background, I think it is easier to use the enrich stats api: GET /_enrich/_stats. This includes details about policies that are currently executing.

Which returns something like this:

{
    "executing_policies": [
        {
            "name": "my-policy",
            "task": {
                "node": "mYT-5C6tRTm9_v6q5GF22w",
                "id": 5190,
                "type": "enrich",
                "action": "policy_execution",
                "status": {
                    "phase": "RUNNING"
                },
                "description": "my-policy",
                "start_time_in_millis": 1620140016776,
                "running_time_in_nanos": 1266045350,
                "cancellable": false,
                "parent_task_id": "mYT-5C6tRTm9_v6q5GF22w:5189",
                "headers": {}
            }
        }
    ],
    "coordinator_stats": [
       ...
    ]
}

This is also more useful, since it returns the task information on a per-policy basis (by name), so it is easier to look up and there is no need to record the task id that the execute policy api returns.
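A polling loop built on that response shape could look like the sketch below. It is an illustration under assumptions, not an official client: `fetch_stats` is a hypothetical callable that performs GET /_enrich/_stats and returns the parsed JSON, and the function names, interval and timeout are made up for the example. A policy missing from executing_policies only means it is not running right now; it does not say whether the last run succeeded.

```python
import time
from typing import Any, Callable, Dict


def is_policy_executing(stats: Dict[str, Any], policy_name: str) -> bool:
    """Return True if the named policy appears under "executing_policies"."""
    return any(
        entry.get("name") == policy_name
        for entry in stats.get("executing_policies", [])
    )


def wait_until_policy_idle(
    fetch_stats: Callable[[], Dict[str, Any]],  # hypothetical: wraps GET /_enrich/_stats
    policy_name: str,
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the policy is no longer listed as executing.

    Returns True once the policy is absent from the list, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if not is_policy_executing(fetch_stats(), policy_name):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.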

@askids If this api would also return the task information from past executions (the last execution for each policy) then would this allow you to consistently fetch the status of a policy execution?

askids · 2021-05-04T17:36:22Z

> @askids If this api would also return the task information from past executions (the last execution for each policy) then would this allow you to consistently fetch the status of a policy execution?

Yes @martijnvg, that can work if it shows the last execution of each policy along with its status. But currently, if there are no executing policies, it won't show anything. So we wouldn't be able to tell whether the execution was successful or whether the list was empty because the execution was cancelled, failed, etc.

askids · 2021-05-04T17:37:32Z

> > There is a serialization error. The ExecuteEnrichPolicyTask isn't registered correctly. (this needs to be fixed)
>
> Actually this is already fixed via #62364 and the fix is available from version 7.10. So upgrading should fix that serialisation error in your case.

We are scheduled to upgrade to 7.10.2 (from 7.8.1) in another 3 weeks. Maybe I can verify it then on the newer version.

martijnvg · 2021-05-10T07:03:02Z

> But currently, if there are no executing policies, it won't show anything. So we wouldn't be able to tell whether the execution was successful or whether the list was empty because the execution was cancelled, failed, etc.

Yes, this is something that I think can be improved in the current enrich stats api.

> We are scheduled to upgrade to 7.10.2 (from 7.8.1) in another 3 weeks. Maybe I can verify it then on the newer version.

That would be great!

askids · 2021-05-24T13:04:00Z

hi @martijnvg

We completed the upgrade to 7.10.2. I checked for the serialization issue with the GET _tasks api and I no longer get that error. When I keep running GET _tasks, the response now moves directly from RUNNING status to resource_not_found_exception once the task has completed. So at least one part of the reported issue seems to be fixed. That leaves us with the main issue of trying to find the task status of a completed enrich task using the task id.

Thanks!

martijnvg · 2021-05-25T06:45:50Z

@askids Thanks for letting us know!

> That leaves us with the main issue of trying to find the task status of a completed enrich task using the task id.

I've opened #73353 to track this feature request.

askids · 2021-07-18T05:07:13Z

@martijnvg we upgraded to 7.10.2. Now I am starting to see the same issue with reindex activity as well. When I run reindex with wait_for_completion=false and use the returned task id to get the status using GET _tasks/, on many occasions (even when the task is still running) I get the same error as originally reported: "isn't running and hasn't stored its results". Should I submit a separate issue for it?

martijnvg · 2021-07-20T12:03:32Z

@askids I think the get task api should be used in order to retrieve the information about the reindex task. The get task api should check the tasks index in case the task has completed execution. You should use the task id returned from the reindex api as argument to the get task api.

askids · 2021-07-20T13:33:54Z

Yes. That is what we were always doing. But after the recent upgrade to 7.10.2, when we run a reindex task with wait_for_completion=false, the task id returned is not queryable using the GET _tasks api. It works for some ids and not for others. If I run multiple reindex tasks from dev tools in one shot, none of the ids returned are queryable. If I run one reindex script at a time, the id returned is queryable.

Initially, I thought that the reindex script was bad. But I could see the doc count increasing on the index, as it was a long-running process, while I was still getting the "isn't running and hasn't stored its results" message. So either the reindex API returned the wrong task id or GET _tasks is not able to pull up the status due to some other issue.
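One way to narrow this down, sketched here under assumptions rather than taken from the thread, is to compare the id returned by the reindex call with the ids that the list tasks api reports (for example GET _tasks?detailed=true&actions=*reindex). The helper assumes the list response groups tasks under nodes.<node_id>.tasks, keyed by "node:id"; the function name and the sample values are illustrative only.

```python
from typing import Any, Dict, Set


def running_task_ids(list_tasks_response: Dict[str, Any]) -> Set[str]:
    """Collect the "node:id" keys from a list tasks (GET _tasks) response."""
    ids: Set[str] = set()
    for node in list_tasks_response.get("nodes", {}).values():
        ids.update(node.get("tasks", {}).keys())
    return ids


# Hypothetical sample shaped like a list tasks response.
sample = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/reindex"},
                "nodeA:102": {"action": "indices:data/write/reindex"},
            }
        }
    }
}

returned_id = "nodeA:101"  # hypothetical id returned by the reindex call
print(returned_id in running_task_ids(sample))
```

If the returned id never shows up in the list while documents are still being written, that points at the id handed back by the reindex call rather than at the get task api.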

martijnvg · 2021-07-22T08:26:30Z

If the get task api doesn't return a task for a completed async reindex execution then I think that is a bug. As far as I see that should work (whereas for execute policy api this is currently not implemented). Opening a separate issue for this makes sense.

askids added the >bug and needs:triage labels Mar 18, 2021

matriv added the :Data Management/Ingest Node label Mar 23, 2021

elasticmachine added the Team:Data Management label Mar 23, 2021

gwbrown removed the needs:triage label Mar 26, 2021

martijnvg self-assigned this May 4, 2021

martijnvg mentioned this issue May 4, 2021

Ensure that ExecuteEnrichPolicyStatus is properly registered. #72675

Closed

martijnvg mentioned this issue May 25, 2021

Track past executions of the execute policy api. #73353

Open

martijnvg closed this as completed May 25, 2021

davehouser1 mentioned this issue Aug 2, 2024

EnrichClient times out when using execute_policy elastic/elasticsearch-py#2622

Closed
