Execute Enrich policy task with wait_for_completion=false does not retain task status after completion #70554
Pinging @elastic/es-core-features (Team:Core/Features)
There are two parts to this request:
Referencing #51628, since that will redefine how the task APIs should be used. In light of that, perhaps we should have a dedicated api to query the status of async policy executions (instead of the above second bullet point).
When executing the enrich execute policy api and not waiting for completion, then querying for the task via the task list api can result in a serialization error. Relates to elastic#70554
Actually, this is already fixed via #62364 and the fix is available from version 7.10, so upgrading should fix that.
Instead of checking the tasks API for the status when executing a policy in the background, I think it is easier to use the enrich stats API, which returns something like this:
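An abridged sketch of the kind of output GET /_enrich/_stats produces while a policy execution is in flight (the policy name, node id, and task id below are illustrative):

```
GET /_enrich/_stats

{
  "executing_policies": [
    {
      "name": "my-enrich-policy",
      "task": {
        "node": "oTUltX4IQMOUUVeiohTt8A",
        "id": 12345,
        ...
      }
    }
  ],
  "coordinator_stats": [ ... ]
}
```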
This is also more useful, since it returns the task information on a per-policy basis (by name), so it is easier to look up and there is no need to record the task id that the execute policy api returns. @askids If this api would also return the task information from past executions (the last execution for each policy), then would this allow you to consistently fetch the status of a policy execution?
Yes @martijnvg, that can work if it shows the last execution of each policy, along with the status. But currently, if there are no executing policies, it won't show anything. So we wouldn't be able to tell whether that execution was successful or whether it was empty due to it being cancelled/failed, etc.
We are scheduled to upgrade to 7.10.2 (from 7.8.1) in another 3 weeks. Maybe I can verify it then on the newer version.
Yes, this is something that I think can be improved in the current enrich stats api.
That would be great!
Hi @martijnvg, we completed the upgrade to 7.10.2. I checked for the serialization issue with the GET _tasks API and I no longer get that error. When I continue to run GET _tasks, it now moves directly from RUNNING status to resource_not_found_exception after the task is completed. So at least one part of the reported issue seems to be fixed. That leaves us with the main issue of trying to find the task status of a completed enrich task using the task id. Thanks!
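For reference, the sequence described above looks roughly like this (node id and task id are illustrative, responses abridged):

```
GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345

# while the policy execution is still running:
{ "completed": false, "task": { ... } }

# after the execution has completed, the same call returns a 404:
{
  "error": {
    "type": "resource_not_found_exception",
    "reason": "task [oTUltX4IQMOUUVeiohTt8A:12345] isn't running and hasn't stored its results"
  },
  "status": 404
}
```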
@martijnvg we upgraded to 7.10.2. Now I am starting to see the same issue on reindex activity also. When I run reindex with wait_for_completion=false and use the returned task id to get the status using GET _tasks/<task_id>, on many occasions (even when the task is still running) I get the same error as originally reported: "isn't running and hasn't stored its results". Should I submit a separate issue for it?
@askids I think the get task api should be used in order to retrieve the information about the reindex task. The get task api should check the tasks index in case the task has completed execution. You should use the task id returned from the reindex api as the argument to the get task api.
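As a sketch of that flow (index names and task id are illustrative):

```
POST _reindex?wait_for_completion=false
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index" }
}

# response:
{ "task": "oTUltX4IQMOUUVeiohTt8A:98765" }

GET _tasks/oTUltX4IQMOUUVeiohTt8A:98765
# for reindex, this is expected to keep working after completion as well,
# because the task result is stored in the .tasks index
```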
Yes. That is what we were always doing. But after the recent upgrade to 7.10.2, when we run a reindex task with wait_for_completion=false, the task id returned is not always queryable using the GET _tasks API. It works for some ids and not for others. If I run multiple reindex tasks from dev tools in one shot, none of the ids returned are queryable. If I run reindex one script at a time, the id returned is queryable. Initially I thought that the reindex script was bad, but I could see the doc count increasing on the index as it was a long-running process, while still getting the "isn't running and hasn't stored its results" message. So either the reindex API returned a wrong task id or GET _tasks is not able to pull up the status due to some other issue.
If the get task api doesn't return a task for a completed async reindex execution, then I think that is a bug. As far as I can see that should work (whereas for the execute policy api this is currently not implemented). Opening a separate issue for this makes sense.
Elasticsearch version (bin/elasticsearch --version): 7.8.1
OpenJDK 64-Bit Server VM warning: Ignoring option UseConcMarkSweepGC; support was removed in 14.0
OpenJDK 64-Bit Server VM warning: Ignoring option CMSInitiatingOccupancyFraction; support was removed in 14.0
OpenJDK 64-Bit Server VM warning: Ignoring option UseCMSInitiatingOccupancyOnly; support was removed in 14.0
Version: 7.8.1, Build: unknown/unknown/b5ca9c58fb664ca8bf9e4057fc229b3396bf3a89/2020-07-21T16:40:44.668009Z, JVM: 14.0.1

Plugins installed: [readonlyrest - 1.28.0]

JVM version (java -version):
openjdk 14.0.1 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 14.0.1+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 14.0.1+7, mixed mode, sharing)

OS version (uname -a if on a Unix-like system): Windows 2012 R2

Description of the problem including expected versus actual behavior:
When we execute an enrich policy with the parameter wait_for_completion=false, we get the task id back. But we are not able to consistently query the status of the task via the GET _tasks/<task_id> endpoint. When we try to get the status immediately, it shows completed as false and returns the status, but subsequent attempts to get the task status result in different kinds of errors depending on how long after completion the GET task status call was executed.
Expected behavior is that GET _tasks should provide the proper status even after the task is completed. Without being able to get the task status after completion, we won't be able to implement any reliable polling process to verify that the enrichment policy execution completed successfully. We have a requirement to update the enrichment index on a daily basis to get updated data from the source index, so we need to be able to get the task status reliably after executing the policy.
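A minimal sketch of the calls involved (policy name, node id, and task id are illustrative):

```
PUT /_enrich/policy/my-enrich-policy/_execute?wait_for_completion=false

# response:
{ "task": "oTUltX4IQMOUUVeiohTt8A:12345" }

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345
# immediately after execution this returns "completed": false plus the task status;
# once the policy execution finishes, the same call errors out instead of reporting completion
```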
Steps to reproduce:
Provide logs (if relevant):
Thanks!