Skip to content
This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Watchdog code refactor & update #4168

Merged
merged 24 commits into from
Feb 7, 2020
Merged

Watchdog code refactor & update #4168

merged 24 commits into from
Feb 7, 2020

Conversation

Binyang2014
Copy link
Contributor

@Binyang2014 Binyang2014 commented Feb 5, 2020

  1. Refactor watchdog with golang
  2. Add garbage collector, will auto delete invalid priority classes and secrets created by rest-server

@Binyang2014 Binyang2014 force-pushed the binyli/watchdog branch 6 times, most recently from 98cea00 to be7a296 Compare February 5, 2020 06:16
@Binyang2014 Binyang2014 changed the title Watchdog update Watchdog code refactor & update Feb 5, 2020
@Binyang2014 Binyang2014 requested a review from xudifsd February 5, 2020 14:23
@Binyang2014 Binyang2014 requested a review from abuccts February 5, 2020 14:23
@Binyang2014 Binyang2014 marked this pull request as ready for review February 5, 2020 14:24
@abuccts abuccts requested a review from yqwang-ms February 5, 2020 14:32
@yqwang-ms
Copy link
Member

Could we split it to 2 components, one is only for exporter (i.e. metrics collector), another is only for take repair actions (i.e. the watchdog)

@yqwang-ms
Copy link
Member

And do we really need to do such big changes, to add the garbage collector feature?

@Binyang2014
Copy link
Contributor Author

Could we split it to 2 components, one is only for exporter (i.e. metrics collector), another is only for take repair actions (i.e. the watchdog)

Split it to 2 components need more changes. Such as deployment change and other components (such as alert rules) rely on watchdog metrics may need to change also.

And for metrics collector and gc, they all need to query API server and has some common logic. I think put then into same project is acceptable if neither of them is too heavy.

@Binyang2014
Copy link
Contributor Author

And do we really need to do such big changes, to add the garbage collector feature?

Actually, the change not too big. I don't change the basic logic of the previous code, just do some refactor work. (Previous watchdog put all code into one file and make it not easy to maintain... ). Most changes are go dependencies....

@yqwang-ms
Copy link
Member

Seems you have changed the whole watchdog codes. Where is previous watchdog code? Will you delete them? Will you also run them?


In reply to: 582705262 [](ancestors = 582705262)

@Binyang2014
Copy link
Contributor Author

Seems you have changed the whole watchdog codes. Where is previous watchdog code? Will you delete them? Will you also run them?

In reply to: 582705262 [](ancestors = 582705262)

Delete them in this change, and it will not run anymore. Here is the previous code: https://github.com/microsoft/pai/blob/master/src/watchdog/src/watchdog.py

flag.Set("v", "2")

flag.IntVar(&collectionInterval, "collection-interval", 180, "Interval between two collections (seconds)")
flag.IntVar(&gcInterval, "gc-interval", 60, "Interval between two garbage collections (minutes)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to collect (call APIServer) every 3mins? Is it too frequent?

Copy link
Contributor Author

@Binyang2014 Binyang2014 Feb 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default it will collect every 3mins, This changed by this PR: #3761, before that, we collect metrics every 30 seconds...
Currently, it seems works fine. We can change the collection interval if we encounter some performance issue.

Copy link
Member

@yqwang-ms yqwang-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Binyang2014 Binyang2014 merged commit 8abcaf7 into master Feb 7, 2020
@Binyang2014 Binyang2014 deleted the binyli/watchdog branch February 7, 2020 10:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants