Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maintain running historical task in hourly cron #132

Conversation

ylwu-amzn
Copy link
Collaborator

Signed-off-by: Yaliang Wu ylwu@amazon.com

Description

Maintain running historical tasks in hour cron.

  1. Clean child tasks and AD result of deleted detector level tasks.
  2. Check if running historical analysis tasks still running. If not running, reset task as stopped; otherwise, check if there are any stale running entities and clean up.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
@ylwu-amzn ylwu-amzn force-pushed the single-flow-development-rc1-cron-pr branch from 07fea43 to 1f64c66 Compare July 19, 2021 20:28
src/main/java/org/opensearch/ad/task/ADTaskManager.java Outdated Show resolved Hide resolved
// Length of detector id is 20. Here we create a random string as request id to get hash with
// HashRing, then we can control some maintaining task to just run on one data node. Read
// ADTaskManager#maintainRunningHistoricalTasks for more details.
CronRequest modelDeleteRequest = new CronRequest(RandomStringUtils.random(20), dataNodes);
Copy link
Collaborator

@kaituo kaituo Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to do it on a data node randomly? CronRequest is a broadcast call and not free. Can we just do it on master node? Your implementation will save master node some work, but will bring one more broadcast call to the cluster. The real work is a query, a broadcast call to get profiles and forward stale running entities. Is the real work expensive? Also, you have the backward compatibility to deal with since you modified the CronRequest.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why broadcast will be worse than run the work on master node? The master node should do master management work. Put more responsibility on the master node is risky. Master node down will trigger master re-election, if master not reelected like the single master node cluster, that may bring the whole cluster down. The broadcast failure has no other side effect, just fail current hourly cron.

As last comment, will handle backward compatibility in separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to draw a line between do heavy work on master and do light work on master. As a whole, your current solution adds one more round of broadcast call for sth not heavy.

Copy link
Collaborator Author

@ylwu-amzn ylwu-amzn Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite get you, why "current solution adds one more round of broadcast call" ? This broadcast exists in the initial AD release 2 year ago https://github.com/opensearch-project/anomaly-detection/blame/main/src/main/java/org/opensearch/ad/cluster/HourlyCron.java#L58

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I only thought about your change, not the work I did 2 years ago. Sorry about that.

@ylwu-amzn ylwu-amzn force-pushed the single-flow-development-rc1-cron-pr branch from ffbdc7e to 45fd782 Compare July 21, 2021 19:47
Copy link

@zhanghg08 zhanghg08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ylwu-amzn ylwu-amzn merged commit 970ceb5 into opensearch-project:single-flow-development-rc1 Jul 21, 2021
@ylwu-amzn ylwu-amzn mentioned this pull request Jul 27, 2021
1 task
ylwu-amzn added a commit that referenced this pull request Jul 27, 2021
* tune data model for unified flow

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support unified flow for single entity detector

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add realtime task

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support preview with detector directly

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support historical analysis for HC detector (#104)

* support run historical analysis for HC detector

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more javadoc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments: rename method, remove comments etc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more comments for task action

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments from old CR: mainly renaming and add more comments

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* support HC task profiles in profile API;error handling (#123)

* support HC task profiles in profile API;error handling

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add init progress of realtime task (#130)

* add init progress of realtime task

* add realtime task cache

* remove DetectionStateHandler, will track error in realtime task

* add more comments

* add delete anomaly results API (#131)

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain running historical task in hourly cron (#132)

* maintain running historical task in hourly cron

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain realtime tasks

* address comments

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>
ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request Sep 1, 2021
* tune data model for unified flow

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support unified flow for single entity detector

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add realtime task

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support preview with detector directly

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support historical analysis for HC detector (opensearch-project#104)

* support run historical analysis for HC detector

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more javadoc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments: rename method, remove comments etc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more comments for task action

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments from old CR: mainly renaming and add more comments

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* support HC task profiles in profile API;error handling (opensearch-project#123)

* support HC task profiles in profile API;error handling

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add init progress of realtime task (opensearch-project#130)

* add init progress of realtime task

* add realtime task cache

* remove DetectionStateHandler, will track error in realtime task

* add more comments

* add delete anomaly results API (opensearch-project#131)

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain running historical task in hourly cron (opensearch-project#132)

* maintain running historical task in hourly cron

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain realtime tasks

* address comments

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>
ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request Sep 1, 2021
* tune data model for unified flow

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support unified flow for single entity detector

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add realtime task

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support preview with detector directly

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support historical analysis for HC detector (opensearch-project#104)

* support run historical analysis for HC detector

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more javadoc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments: rename method, remove comments etc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more comments for task action

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments from old CR: mainly renaming and add more comments

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* support HC task profiles in profile API;error handling (opensearch-project#123)

* support HC task profiles in profile API;error handling

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add init progress of realtime task (opensearch-project#130)

* add init progress of realtime task

* add realtime task cache

* remove DetectionStateHandler, will track error in realtime task

* add more comments

* add delete anomaly results API (opensearch-project#131)

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain running historical task in hourly cron (opensearch-project#132)

* maintain running historical task in hourly cron

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain realtime tasks

* address comments

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>
ohltyler pushed a commit that referenced this pull request Sep 1, 2021
* tune data model for unified flow

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support unified flow for single entity detector

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add realtime task

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support preview with detector directly

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* support historical analysis for HC detector (#104)

* support run historical analysis for HC detector

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more javadoc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments: rename method, remove comments etc

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* add more comments for task action

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* address comments from old CR: mainly renaming and add more comments

Signed-off-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>

* support HC task profiles in profile API;error handling (#123)

* support HC task profiles in profile API;error handling

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add init progress of realtime task (#130)

* add init progress of realtime task

* add realtime task cache

* remove DetectionStateHandler, will track error in realtime task

* add more comments

* add delete anomaly results API (#131)

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain running historical task in hourly cron (#132)

* maintain running historical task in hourly cron

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* maintain realtime tasks

* address comments

Co-authored-by: Yaliang Wu <ylwu@dev-dsk-ylwu-2c-e500d1cf.us-west-2.amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants