Simple alert system; FD limit alerts #7108
@magik6k This feature would be more useful if the alert information could be extracted from the machine that the daemon/miner is running on. Most mining operations don't monitor by SSHing into a box and running lotus[-miner] commands; they scrape logs or metrics. If the alerting journal were logged to a file like the standard journal, that would be super useful.
This definitely can be done when we have more alert types. Do you mind opening an issue (maybe proposing some specific mechanism, I'm not too familiar with alerting systems)?
Alerts are logged to the journal.
Sure thing.
I tried looking for them there, but I couldn't find them. Granted, the only alert I was getting was the 'low fd limit' alert, but I saw no mention of it in the journal. I will reconfirm that today though.
Codecov Report
@@            Coverage Diff             @@
##           master    #7108      +/-   ##
==========================================
- Coverage   34.79%   34.78%   -0.02%
==========================================
  Files         685      688       +3
  Lines       80207    80386     +179
==========================================
+ Hits        27906    27959      +53
- Misses      46609    46749     +140
+ Partials     5692     5678      -14
Some nits you can ignore if you just want to merge it.
func CheckFdLimit(min uint64) func(al *alerting.Alerting) {
	return func(al *alerting.Alerting) {
		if ulimit.GetLimit == nil {
			return
No alert in this case?
Nope, we just disable the alert (this can only happen if you manage to run lotus natively on Windows, which I guess deserves its own alert).
Flags: []cli.Flag{
	&cli.BoolFlag{
		Name:  "all",
		Usage: "get all (active and inactive) alerts",
Maybe note in the usage text that it defaults to active alerts only?
This PR adds a new 'Alerting' system, which is built on top of the existing Journal. The basic idea is to have a central place for reporting things which need user action, but aren't necessarily so critical that they would warrant stopping the miner process.
It allows subsystems to define alerts which can be raised or resolved. Alerts can then be viewed by users with lotus[-miner] log alerts.
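To make the raise/resolve model concrete, here is a minimal in-memory sketch. It is not the code from this PR: the type names, fields, and methods (AlertType, Alert, Alerting, Raise, Resolve, List) are hypothetical, and the real system records alerts through the Journal rather than a plain map. The List method mirrors the idea of showing only active alerts unless everything is requested.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// AlertType identifies an alert by system and subsystem (hypothetical shape).
type AlertType struct {
	System, Subsystem string
}

// Alert holds the latest known state of one alert.
type Alert struct {
	Type    AlertType
	Active  bool
	Message string
}

// Alerting is a toy registry; it keeps state in memory only.
type Alerting struct {
	mu     sync.Mutex
	alerts map[AlertType]*Alert
}

// NewAlerting returns an empty registry.
func NewAlerting() *Alerting {
	return &Alerting{alerts: make(map[AlertType]*Alert)}
}

// Raise marks the alert as active and records the reason.
func (a *Alerting) Raise(t AlertType, msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.alerts[t] = &Alert{Type: t, Active: true, Message: msg}
}

// Resolve marks the alert as inactive but keeps it in the registry.
func (a *Alerting) Resolve(t AlertType, msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.alerts[t] = &Alert{Type: t, Active: false, Message: msg}
}

// List returns active alerts, or every alert when all is true,
// sorted by system and then subsystem for stable output.
func (a *Alerting) List(all bool) []Alert {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := make([]Alert, 0, len(a.alerts))
	for _, al := range a.alerts {
		if all || al.Active {
			out = append(out, *al)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Type.System != out[j].Type.System {
			return out[i].Type.System < out[j].Type.System
		}
		return out[i].Type.Subsystem < out[j].Type.Subsystem
	})
	return out
}

func main() {
	al := NewAlerting()
	fd := AlertType{System: "process", Subsystem: "fd-limit"}

	al.Raise(fd, "file descriptor limit is too low")
	fmt.Println("active after raise:", len(al.List(false)))

	al.Resolve(fd, "limit raised")
	fmt.Println("active after resolve:", len(al.List(false)))
	fmt.Println("all after resolve:", len(al.List(true)))
}
```

Keeping resolved alerts in the registry is one possible design choice; it lets a listing of all alerts show what was recently fixed.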
Also adds a basic 'too low file descriptor limit' alert. A low limit has caused me to fail windowpost multiple times; it happens when starting the miner directly from the shell, which I sometimes have to do when trying new things.
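As a rough illustration of what such a check involves, the sketch below reads the soft file descriptor limit and compares it with a threshold. It is an assumption-laden stand-in, not the PR's implementation: it calls syscall.Getrlimit directly (Unix-like systems only) instead of the ulimit.GetLimit hook seen in the review, and the helper names and the 100000 threshold are made up for the example.

```go
package main

import (
	"fmt"
	"syscall"
)

// checkFdLimit reports whether the soft limit cur is below min,
// together with a human-readable message (hypothetical helper).
func checkFdLimit(cur, min uint64) (bool, string) {
	if cur < min {
		return true, fmt.Sprintf("soft file descriptor limit %d is below the recommended minimum %d", cur, min)
	}
	return false, fmt.Sprintf("soft file descriptor limit %d is OK", cur)
}

// currentFdLimit reads the soft RLIMIT_NOFILE value for this process.
func currentFdLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return uint64(rl.Cur), nil
}

func main() {
	const minFds = 100000 // hypothetical threshold, not taken from the PR

	cur, err := currentFdLimit()
	if err != nil {
		fmt.Println("could not read fd limit:", err)
		return
	}

	low, msg := checkFdLimit(cur, minFds)
	if low {
		fmt.Println("would raise alert:", msg)
	} else {
		fmt.Println("would resolve alert:", msg)
	}
}
```

Separating the comparison from the system call keeps the decision logic testable without changing the process limits.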
Example log alerts output with a bad limit, and after fixing the limit and restarting the node:
Example lotus-miner info output when some alerts are active:
TODO: