[Fleet] cancel tasks when 3rd retry failed #147190
Merged
juliaElastic added the release_note:skip (Skip the PR/issue when compiling release notes), ci:cloud-deploy (Create or update a Cloud deployment), v8.7.0, and v8.6.1 labels on Dec 7, 2022
nchaulet approved these changes on Dec 7, 2022
botelastic (bot) added the Team:Fleet label (Team label for Observability Data Collection Fleet team) on Dec 7, 2022
Pinging @elastic/fleet (Team:Fleet)
nchaulet reviewed on Dec 7, 2022
kpollich changed the title from "cancel tasks when 3rd retry failed" to "[Fleet] cancel tasks when 3rd retry failed" on Dec 7, 2022
💚 Build Succeeded
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Dec 8, 2022
## Summary

Related to elastic#144161

Found that on a bulk update tags task failure, the task didn't stop after 3 retries (which should be over in less than a minute); the retries kept happening for 2 hours.
This change removes the retry task if 3 retries are reached.

Also testing in cloud deployment to see if the tags error can be reproduced with this fix.
I could reproduce the reported error locally, and it goes away with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in kibana repo
- Open Kibana, select the 50k agents, and open Actions / Add tags
- Try this in a few seconds: add 2 new tags, and remove one of them
- Wait about 30s; the agents should reflect the changes
- Check the logs to see that the tasks are removed once the 3rd retry is reached or a retry succeeds
- Check that there are no more running tasks. Any running task can be found in Kibana Console by running this query: `GET .kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry #3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```

(cherry picked from commit 431c32b)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
kibanamachine added a commit that referenced this pull request on Dec 8, 2022
# Backport

This will backport the following commits from `main` to `8.6`:
- [[Fleet] cancel tasks when 3rd retry failed (#147190)](#147190)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Labels
release_note:skip
Skip the PR/issue when compiling release notes
Team:Fleet
Team label for Observability Data Collection Fleet team
v8.6.0
v8.6.1
v8.7.0
Summary
Related to #144161
Found that on a bulk update tags task failure, the task didn't stop after 3 retries (which should be over in less than a minute); the retries kept happening for 2 hours.
This change removes the retry task if 3 retries are reached.
Also testing in cloud deployment to see if the tags error can be reproduced with this fix.
I could reproduce the reported error locally, and it goes away with this fix.
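To make the intended behavior concrete, here is a minimal TypeScript sketch of a single retry attempt that gives up after the third failure and removes both the retry task and its check task. It is not the Fleet plugin's code: `TaskStore`, `runRetryAttempt`, `RetryOutcome`, and `MAX_RETRIES` are hypothetical names, and only the task ID prefixes and the rough log wording are borrowed from the log excerpt in this PR.

```typescript
// Hypothetical stand-in for a task store; this is NOT Kibana's Task Manager API.
interface TaskStore {
  remove(taskId: string): Promise<void>;
}

// The retry limit named in the PR description.
const MAX_RETRIES = 3;

interface RetryOutcome {
  done: boolean; // true when no further retry should be scheduled
  removed: string[]; // task IDs removed during this attempt
}

// Runs one retry attempt. On success, or once the retry limit is reached,
// both the check task and the retry task are removed so nothing keeps running.
async function runRetryAttempt(
  store: TaskStore,
  actionId: string,
  retryCount: number,
  action: () => Promise<void>,
  log: (msg: string) => void = console.log
): Promise<RetryOutcome> {
  const retryTaskId = `fleet:update_agent_tags:retry:${actionId}`;
  const checkTaskId = `fleet:update_agent_tags:retry:check:${actionId}`;

  const removeBoth = async (): Promise<string[]> => {
    for (const id of [checkTaskId, retryTaskId]) {
      log(`Removing task ${id}`);
      await store.remove(id);
    }
    return [checkTaskId, retryTaskId];
  };

  try {
    await action();
    return { done: true, removed: await removeBoth() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    log(`Retry #${retryCount} of task ${retryTaskId} failed: ${message}`);
    if (retryCount >= MAX_RETRIES) {
      log(`Stopping after retry #${retryCount}. Error: ${message}`);
      return { done: true, removed: await removeBoth() };
    }
    // Below the limit: leave the tasks in place so another attempt can run.
    return { done: false, removed: [] };
  }
}
```

A scheduler built on this sketch would call `runRetryAttempt` once per attempt with an incremented `retryCount` and stop rescheduling as soon as `done` is true.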
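To verify:
- Add at least 50k agents with the `create_agents` script in kibana repo
- Open Kibana, select the 50k agents, and open Actions / Add tags
- Try this in a few seconds: add 2 new tags, and remove one of them
- Wait about 30s; the agents should reflect the changes
- Check the logs to see that the tasks are removed once the 3rd retry is reached or a retry succeeds
- Check that there are no more running tasks. Any running task can be found in Kibana Console by running this query: `GET .kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is removed; the resulting log lines are quoted in the commit message above.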
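The Console query used to look for leftover tasks can also be scripted. The sketch below only builds the request path and reads the hit count from a search response body; `buildTaskSearchPath` and `remainingTasks` are hypothetical helpers, and actually sending the request (with whatever HTTP client and credentials a deployment uses) is deliberately left out.

```typescript
// The part of an Elasticsearch search response this sketch reads.
// `hits.total` is an object by default and a plain number when the
// request sets `rest_total_hits_as_int=true`.
interface SearchResponse {
  hits: { total: number | { value: number; relation: string } };
}

// Builds the request path for a query-string search on the task type,
// URL-encoding the quoted value so it survives as a single `q` parameter.
function buildTaskSearchPath(taskType: string): string {
  const query = `task.taskType:"${taskType}"`;
  return `.kibana_task_manager/_search?q=${encodeURIComponent(query)}`;
}

// Returns how many matching task documents the response reports.
function remainingTasks(response: SearchResponse): number {
  const { total } = response.hits;
  return typeof total === "number" ? total : total.value;
}
```

After a successful run or a third failed retry, a response for `buildTaskSearchPath("fleet:update_agent_tags:retry")` would be expected to report zero remaining tasks.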