Report errors on full disk #484

Closed
willscott opened this issue May 13, 2022 · 2 comments

Comments

@willscott
Member

Ken labs reports:
The STI process hangs and reports no related errors when disk space is used up.

@gammazero gammazero self-assigned this Jun 2, 2022
@gammazero
Collaborator

gammazero commented Jan 12, 2023

With indexer freeze functionality, each indexer logs its disk usage at every check:

INFO    Disk usage OK   {"usage": "0.13%", "freezeAt": "90%"}

When it is within 10% of reaching the freeze threshold, a warning is logged:

WARN    Disk usage ALERT     {"usage": "81.77%", "freezeAt": "90%"}

When it is within 2% of reaching the freeze threshold, a critical warning is logged:

WARN    Disk usage CRITICAL  {"usage": "88.45%", "freezeAt": "90%"}

The usage checks and log messages become more frequent as usage approaches the freeze point. When the disk reaches the freeze threshold, the indexer enters frozen mode and stops storing new index data. With a reasonable freeze threshold, the disk should not become full.
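The thresholds described above can be sketched in Go as follows. This is only an illustration, not the indexer's actual code: the `Level` type, the `classify` function, and the `FROZEN` label are hypothetical names, and the sketch assumes that "within 10%" and "within 2%" mean percentage points below the freeze threshold, with inclusive boundaries.

```go
package main

import "fmt"

// Level is a hypothetical type; OK, ALERT, and CRITICAL mirror the
// log messages quoted in this comment. FROZEN is an illustrative label.
type Level string

const (
	LevelOK       Level = "OK"
	LevelAlert    Level = "ALERT"
	LevelCritical Level = "CRITICAL"
	LevelFrozen   Level = "FROZEN"
)

// classify maps a disk usage percentage to a level, given the freeze
// threshold. Assumption: the 10 and 2 margins are percentage points
// and the comparisons are inclusive.
func classify(usage, freezeAt float64) Level {
	switch {
	case usage >= freezeAt:
		return LevelFrozen // indexer stops storing new index data
	case usage >= freezeAt-2:
		return LevelCritical
	case usage >= freezeAt-10:
		return LevelAlert
	default:
		return LevelOK
	}
}

func main() {
	// The first three values are the usage figures from the log lines above.
	for _, u := range []float64{0.13, 81.77, 88.45, 90.5} {
		fmt.Printf("usage=%.2f%% freezeAt=90%% -> %s\n", u, classify(u, 90))
	}
}
```

With a 90% freeze threshold, the three usage values from the log excerpts land in the OK, ALERT, and CRITICAL levels respectively under these assumptions.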

If using an assigner service, the indexer's publishers will be reassigned to other indexers. Otherwise, the indexer can be unfrozen when more capacity is added and it will resume indexing from where it left off.

Please close this issue if this is sufficient.

@gammazero gammazero removed their assignment Jan 12, 2023
gammazero added a commit that referenced this issue Jan 12, 2023
Maintain a disk usage metric (percent used) for the file system that holds the value store.

This addresses #119 and #484
gammazero added a commit that referenced this issue Jan 13, 2023
Maintain a disk usage metric (percent used) for the file system that holds the value store.

This addresses #119 and #484
gammazero added a commit that referenced this issue Jan 13, 2023
* Add disk usage metric for value store

Maintain a disk usage metric (percent used) for the file system that holds the value store.

This addresses #119 and #484

* test speedup

* Remove useless metrics update signal

Ingestion metrics are updated periodically, but only if an update has been signaled. Any indexer activity signals an update, so in almost all cases the signal is set. When the indexer is idle, skipping the update saves nothing, since the indexer has no other work to do, and the metrics should still be refreshed because external factors such as file system usage or storage compaction can change them.

So, remove the update signaling since it does not help anything, and only causes unnecessary context switches. Simply update the metrics periodically.
@gammazero
Collaborator

These log entries should serve as a sufficient warning for full disk.
