Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple alert system; FD limit alerts #7108

Merged
merged 10 commits into from
Aug 26, 2021
Merged

Simple alert system; FD limit alerts #7108

merged 10 commits into from
Aug 26, 2021

Conversation

magik6k
Copy link
Contributor

@magik6k magik6k commented Aug 17, 2021

This PR adds a new 'Alerting' system, which is built on top of the existing Journal. The basic idea is to have a central place for reporting things which need user action, but aren't necessarily so critical that they would warrant stopping the miner process.

It allows subsystems to define alerts which can be raised or resolved. Alerts can then be viewed by users with lotus[-miner] log alerts.

Also adds a basic 'too low file descriptor limit' alert (this has caused me to fail windowpost multiple times (and happens when starting the miner directly from the shell, which I sometimes have to do when trying new things)).

Example log alerts output with bad limit, and after fixing the limit, and restarting the node:
2021-08-17-151249_1164x157_scrot

Example lotus-miner info when some alerts are active:
2021-08-17-175548_633x164_scrot

TODO:

  • Docstrings
  • Open issues with more alert ideas (after approved)
  • Tests

@magik6k magik6k force-pushed the feat/alerting branch 4 times, most recently from 36e62c7 to b7bad70 Compare August 18, 2021 13:26
@magik6k magik6k marked this pull request as ready for review August 18, 2021 13:38
@magik6k magik6k requested a review from a team as a code owner August 18, 2021 13:38
@lanzafame
Copy link
Contributor

@magik6k This feature would be more useful if there was a way to have the alert information extracted from the machine that the daemon/miner is running on. Most mining operations don't perform monitoring via sshing into a box and running lotus[-miner] commands but rather scrape logs or metrics. If the alerting journal was logged to a file like the standard journal that would be super useful.

@magik6k
Copy link
Contributor Author

magik6k commented Aug 20, 2021

This feature would be more useful if there was a way to have the alert information extracted from the machine that the daemon/miner is running on

This definitely can be done when we have more alert types. Do you mind opening an issue (maybe proposing some specific mechanism, I'm not too familiar with alerting systems)?

If the alerting journal was logged to a file like the standard journal that would be super useful.

Alerts are logged to the journal

@lanzafame
Copy link
Contributor

This definitely can be done when we have more alert types. Do you mind opening an issue (maybe proposing some specific mechanism, I'm not too familiar with alerting systems)?

Sure thing.

Alerts are logged to the journal

I tried looking for them there, but I couldn't find them, granted the only alert that I was getting was the 'low fd limit' alert but I saw no mention of it in the journal. I will reconfirm that today though.

@codecov
Copy link

codecov bot commented Aug 23, 2021

Codecov Report

Merging #7108 (1ba427f) into master (15667db) will decrease coverage by 0.01%.
The diff coverage is 51.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7108      +/-   ##
==========================================
- Coverage   34.79%   34.78%   -0.02%     
==========================================
  Files         685      688       +3     
  Lines       80207    80386     +179     
==========================================
+ Hits        27906    27959      +53     
- Misses      46609    46749     +140     
+ Partials     5692     5678      -14     
Impacted Files Coverage Δ
api/api_common.go 100.00% <ø> (ø)
api/mocks/mock_full.go 1.77% <0.00%> (-0.01%) ⬇️
api/proxy_gen.go 10.07% <0.00%> (-0.03%) ⬇️
api/v0api/v0mocks/mock_full.go 0.00% <0.00%> (ø)
cli/log.go 0.00% <0.00%> (ø)
cmd/lotus-miner/init.go 0.00% <0.00%> (ø)
cmd/lotus/daemon.go 0.00% <0.00%> (ø)
journal/types.go 86.66% <ø> (-13.34%) ⬇️
node/modules/alerts.go 55.55% <55.55%> (ø)
journal/alerting/alerts.go 63.76% <63.76%> (ø)
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15667db...1ba427f. Read the comment docs.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits you can ignore if you just want to merge it.

lib/ulimit/ulimit.go Outdated Show resolved Hide resolved
node/builder.go Outdated Show resolved Hide resolved
node/builder_miner.go Outdated Show resolved Hide resolved
func CheckFdLimit(min uint64) func(al *alerting.Alerting) {
return func(al *alerting.Alerting) {
if ulimit.GetLimit == nil {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No alert in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we just disable the alert (this can only happen if you manage to run lotus natively on Windows, which I guess deserves it's own alert)

Flags: []cli.Flag{
&cli.BoolFlag{
Name: "all",
Usage: "get all (active and inactive) alerts",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that it defaults to active?

journal/alerting/alerts.go Show resolved Hide resolved
journal/alerting/alerts.go Show resolved Hide resolved
@magik6k magik6k force-pushed the feat/alerting branch 2 times, most recently from 5ef86d4 to e96dd9c Compare August 26, 2021 14:03
@magik6k magik6k enabled auto-merge August 26, 2021 14:12
@magik6k magik6k merged commit 55dbbf5 into master Aug 26, 2021
@magik6k magik6k deleted the feat/alerting branch August 26, 2021 14:34
@magik6k magik6k mentioned this pull request Aug 27, 2021
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants