[ML] Stop datafeeds running when their jobs are stale #37227

droberts195 · 2019-01-08T14:58:58Z

We already had logic to stop datafeeds running against
jobs that were OPENING, but a job that relocates from
one node to another while OPENED stays OPENED, and this
could cause the datafeed to fail when it sent data to
the OPENED job on its new node before it had a
corresponding autodetect process.

This change extends the check to stop datafeeds running
when their job is OPENING or stale (i.e. has not had
its status reset since relocating to a different node).

Relates #36810
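
For readers following along, the extended check can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: the class and method names (`StaleJobCheck`, `ReportedState`, `JobTask`, `shouldHoldDatafeed`) are hypothetical, and it assumes that "stale" means the job's last reported state was recorded under a different allocation than the one the task currently has.

```java
// Sketch only: every type and method name here is hypothetical, not the Elasticsearch API.
final class StaleJobCheck {

    enum JobState { OPENING, OPENED, CLOSING, CLOSED, FAILED }

    /** The state a job last reported, stamped with the allocation it was reported under. */
    static final class ReportedState {
        final JobState state;
        final long allocationId;

        ReportedState(JobState state, long allocationId) {
            this.state = state;
            this.allocationId = allocationId;
        }
    }

    /** The job's persistent task as it is currently assigned. */
    static final class JobTask {
        final long allocationId;
        final ReportedState reported; // null if the job has not reported any state yet

        JobTask(long allocationId, ReportedState reported) {
            this.allocationId = allocationId;
            this.reported = reported;
        }

        /** Assumption: the state is stale if it was reported under an earlier allocation. */
        boolean isStateStale() {
            return reported == null || reported.allocationId != allocationId;
        }
    }

    /** Returns true when the datafeed should not run yet: the job is OPENING or its state is stale. */
    static boolean shouldHoldDatafeed(JobTask task) {
        if (task.isStateStale()) {
            return true; // relocated; OPENED may refer to the old node
        }
        return task.reported.state == JobState.OPENING;
    }
}
```

In this sketch a job that reported OPENED under allocation 1 and has since been reassigned under allocation 2 is held back, even though its recorded state still reads OPENED.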

elasticmachine · 2019-01-08T14:59:03Z

Pinging @elastic/ml-core

dimitris-athanasiou

LGTM

davidkyle

LGTM

droberts195 · 2019-01-09T07:19:33Z

run gradle build tests 1

droberts195 · 2019-01-09T09:08:11Z

run gradle build tests 1

This is a reinforcement of #37227. It turns out that persistent tasks are not made stale if the node they were running on is restarted and the master node does not notice this. The main scenario where this happens is when minimum master nodes is the same as the number of nodes in the cluster, so the cluster cannot elect a master node when any node is restarted.

When an ML node restarts we need the datafeeds for any jobs that were running on that node to not just wait until the jobs are allocated, but to wait for the autodetect process of the job to start up. In the case of reassignment of the job persistent task this was dealt with by the stale status test. But in the case where a node restarts but its persistent tasks are not reassigned we need a deeper test.

Fixes #36810
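
The "deeper test" described above can be pictured with a small sketch. This is an illustration under stated assumptions and not the implementation from the follow-up PR: `AutodetectReadiness`, `processStarted`, `processStopped` and `readyForData` are hypothetical names, and the sketch assumes each node keeps an in-memory record of which jobs have a live autodetect process, a record that is naturally empty after the node restarts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a hypothetical per-node registry, not the Elasticsearch API.
final class AutodetectReadiness {

    // Job IDs whose autodetect process is currently up on this node.
    // Being in-memory, this set is empty again after a node restart.
    private final Set<String> runningProcesses = ConcurrentHashMap.newKeySet();

    void processStarted(String jobId) {
        runningProcesses.add(jobId);
    }

    void processStopped(String jobId) {
        runningProcesses.remove(jobId);
    }

    /**
     * The datafeed may send data only if the job is OPENED, its state is not stale,
     * and an autodetect process actually exists for it on this node.
     */
    boolean readyForData(boolean jobOpened, boolean stateStale, String jobId) {
        return jobOpened && !stateStale && runningProcesses.contains(jobId);
    }
}
```

The point of the extra condition is the restart case: if the master never noticed the restart, the task still looks OPENED and not stale, so only the process check holds the datafeed back until autodetect is running again.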

droberts195 added >bug v7.0.0 :ml Machine learning v6.6.0 v6.7.0 labels Jan 8, 2019

droberts195 requested a review from davidkyle January 8, 2019 14:58

Improve comments

2abfe0a

dimitris-athanasiou approved these changes Jan 8, 2019

droberts195 mentioned this pull request Jan 8, 2019

[CI] Failures on 6.x for 60_ml_config_migration/Test old cluster jobs and datafeeds and delete them #36810

Closed

davidkyle approved these changes Jan 8, 2019

Merge branch 'master' into stop_datafeed_running_against_stale_job

97e81b8

droberts195 merged commit e0ce737 into elastic:master Jan 9, 2019

droberts195 deleted the stop_datafeed_running_against_stale_job branch January 9, 2019 10:42

droberts195 mentioned this pull request Jan 9, 2019

[TEST] Unmute 60_ml_config_migration rolling upgrade #37258

Merged

droberts195 added a commit that referenced this pull request Jan 9, 2019

[TEST] Unmute 60_ml_config_migration rolling upgrade (#37258)

d817b46

This reverts commit d7efadc. The test should now work following the change made in #37227. Closes #36810

droberts195 added a commit that referenced this pull request Jan 9, 2019

[TEST] Unmute 60_ml_config_migration rolling upgrade (#37258)

335eeeb

This reverts commit d7efadc. The test should now work following the change made in #37227. Closes #36810

droberts195 mentioned this pull request Jan 11, 2019

[ML] Wait for autodetect to be ready in the datafeed #37349

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

2lambda123 pushed a commit to 2lambda123/elastic-elasticsearch that referenced this pull request May 2, 2024

[TEST] Unmute 60_ml_config_migration rolling upgrade

1e6565f

This reverts commit d7efadc. The test should now work following the change made in elastic/elasticsearch#37227
