Binyang2014
commented
Feb 5, 2020
- Refactor watchdog in Go
- Add a garbage collector that automatically deletes invalid priority classes and secrets created by rest-server
98cea00
to
be7a296
Compare
be7a296
to
de43ff3
Compare
Could we split it into two components: one only for the exporter (i.e., the metrics collector), and another only for taking repair actions (i.e., the watchdog)?
And do we really need such big changes just to add the garbage collector feature?
Splitting it into two components needs more changes, such as deployment changes, and other components that rely on watchdog metrics (such as alert rules) may also need to change. The metrics collector and the GC both need to query the API server and share some common logic. I think putting them into the same project is acceptable as long as neither of them is too heavy.
Actually, the change is not that big. I didn't change the basic logic of the previous code; I just did some refactoring. (The previous watchdog put all the code into one file, which made it hard to maintain.) Most of the changes are Go dependencies.
src/watchdog/GOPATH/src/github.com/microsoft/watchdog/Gopkg.toml
It seems you have changed the whole watchdog code. Where is the previous watchdog code? Will you delete it? Will it still run?
It is deleted in this change and will no longer run. Here is the previous code: https://github.com/microsoft/pai/blob/master/src/watchdog/src/watchdog.py
src/watchdog/GOPATH/src/github.com/microsoft/watchdog/pkg/watchdog/exporter.go
b1d3b3a
to
506a817
Compare
flag.Set("v", "2")

flag.IntVar(&collectionInterval, "collection-interval", 180, "Interval between two collections (seconds)")
flag.IntVar(&gcInterval, "gc-interval", 60, "Interval between two garbage collections (minutes)")
Do we need to collect (i.e., call the API server) every 3 minutes? Is that too frequent?
By default it collects every 3 minutes. This was changed in PR #3761; before that, we collected metrics every 30 seconds.
Currently it seems to work fine. We can change the collection interval if we run into performance issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1f77cb4
to
0e73b4c
Compare