introduce memory watchdog; LOTUS_MAX_HEAP #5101
Ok, discovery about badger:
There could be a sweet value for IndexCacheSize, but it's going to vary wildly from store to store, based on the amount of data the store holds, the compaction state, etc. I think there's no other option than to bite the bullet for now and keep the indices + blooms in memory. With the hot/cold store, we can revisit this conversation, because it certainly will make zero sense to hold all the cold tables in memory. That is, if we haven't moved away from badger already.
I will back out the change to set the |
Looks good, just one question
// stop isn't nice.
opts.CompactL0OnClose = false
// read-only and efficiently queried.
opts.CompactL0OnClose = true
Can this be problematic or lead to data loss if the user triggers shutdown and, after a few seconds of seeing that the process isn't stopping, kills the process during compaction?
This is how Lotus used to behave before the native badger blockstore:
Lines 34 to 35 in d4cdc6d
opts.Options = dgbadger.DefaultOptions("").WithTruncate(true).
	WithValueThreshold(1 << 10)
The default value of this flag is true. Flipping it to false was actually a regression, and it was caught by the Sentinel team because their lens component opens the store in read-only mode. Since L0 (unsorted level) hadn't been compacted to L1 (sorted level), badger refused to open it.
If we didn't notice corruption issues before, we shouldn't notice them now. However, I won't put my hand in the fire for badger...
I'm thinking that we might want to introduce a feature flag (env variable) to disable the memory watchdog entirely in case we find that some OS reports wrong system information.
^^ Done in f9cbf6d.
Documented in filecoin-project/filecoin-docs#609.
So if I don't set this variable, the watchdog still works to prevent going over 95% of memory?
@kernelogic yes, it should automatically discover your total system memory and use it as a limit. It will then trigger GC at the watermarks listed above, and it will go into emergency mode when memory utilisation goes above 95%, at which point it will trigger GC every time it is called. You should see the effective limits in log statements like this:
There is probably some tuning to do still, so it would be awesome if you can take this for a spin and report back. It is recommended that you run Lotus with this env variable. You will see much better working set / RSS reporting with this (which will become the default with go1.16):
I am setting this variable to 28GiB on my 32GiB server to see how it goes. They have either been showing "Killed" for no reason or just hanging after some time.
@kernelogic if those were OOM Kills (likely), this patch should help. Please report back!
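For readers setting the variable by hand, here is a minimal sketch of how a value such as 12345678 or 28GiB could be turned into a byte count. It is not the parser Lotus uses: `parseSize` is a hypothetical helper, only the binary suffixes KiB to TiB are handled, and overflow is not checked.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSize is a hypothetical helper (not taken from Lotus). It accepts
// plain bytes ("12345678") or an integer with a binary suffix ("32GiB").
// Overflow of very large values is deliberately not handled in this sketch.
func parseSize(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	units := []struct {
		suffix string
		mult   uint64
	}{
		{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			n, err := strconv.ParseUint(num, 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", s, err)
			}
			return n * u.mult, nil
		}
	}
	// No recognised suffix: treat the whole string as a byte count.
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n, nil
}

func main() {
	raw := os.Getenv("LOTUS_MAX_HEAP")
	if raw == "" {
		fmt.Println("LOTUS_MAX_HEAP is unset; total system memory would be used as the limit")
		return
	}
	limit, err := parseSize(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("max heap: %d bytes\n", limit)
}
```

Running this with `LOTUS_MAX_HEAP=28GiB` in the environment prints the limit in bytes; with the variable unset it only reports the fallback described above.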
OK, it's been running for 24 hours without being killed. Any chance this can be looked at?
@raulk lotus crashed again, can't tell easily what went wrong except a "Killed" as the last line of the log.
@raulk I am seeing a lot of emergency GC from the watchdog, a few times per minute, but memory usage is still high.
@kernelogic Thanks for the info! Could you please take a heap dump during emergency state, and upload it here?
My daemon crashed today on a 64GB RAM machine. You can see in the logs that the emergency GC triggered 5 times within 30 seconds and the daemon eventually crashed, so I could not get the heap dump. lotus version 1.2.2+git.a999e4167
One more suggestion: I think the first watermark of 0.5 is a bit... low. The minimum RAM for miners is set to 128GB, and when doing wdpost it will use around 80GB of RAM. The GC will just keep running every 45 seconds because utilisation always exceeds 0.5.
Subsumes #4930.
This PR introduces major improvements to memory management:
- It introduces a memory watchdog:
  - Watermarks at 0.50, 0.60, 0.70, 0.85, 0.90, 0.925, 0.95; this means that the watchdog will trigger GC every time that utilisation grows above one of those marks.
  - Emergency watermark at 0.95; this means that if utilisation is above 95%, the watchdog will always force GC even if within the silence period.
  - Silence period of 45s; this means that the watchdog will refrain from forcing GC if a GC run finished less than 45s ago.
  - The maximum heap can be set through the LOTUS_MAX_HEAP env variable.
    - Values can be given in bytes (12345678) or SI bytes (32GiB).
- It sets a size for the badger index cache. This value was previously unset, which badger interpreted as "keep all indices in memory", rather than "do not keep anything in memory" (which is what was reasonable to expect when caches are unset...). See #4930 (badger options: enable IndexCache and CompactL0OnClose) for more information.
  - The value is set at 20% of the maximum heap, or total system memory, clamped at a minimum bound of 256MiB, and a maximum bound of 1GiB.
- It sets CompactL0OnClose = true, to enable read-only open by processes like Sentinel.

Relates to #4445, #4877, #1876, #1895, #4487.
Fixes #5058.
Epic: #4753.
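As a worked illustration of the index cache sizing rule in the description (20% of the limit, clamped between 256MiB and 1GiB), here is a small sketch. `indexCacheSize` is a hypothetical function name, not one from the Lotus code base, and the sketch says nothing about whether the setting was kept in the final change.

```go
package main

import "fmt"

const (
	MiB = 1 << 20
	GiB = 1 << 30
)

// indexCacheSize is a hypothetical helper that applies the rule from
// the description: 20% of the limit (max heap or total system memory),
// clamped to the range [256MiB, 1GiB].
func indexCacheSize(limit uint64) uint64 {
	size := limit / 5 // 20% of the limit, rounded down
	if size < 256*MiB {
		return 256 * MiB
	}
	if size > 1*GiB {
		return 1 * GiB
	}
	return size
}

func main() {
	for _, limit := range []uint64{512 * MiB, 2 * GiB, 32 * GiB} {
		fmt.Printf("limit=%d bytes -> index cache=%d bytes\n", limit, indexCacheSize(limit))
	}
}
```

With these bounds, any limit below 1.25GiB hits the 256MiB floor and any limit above 5GiB hits the 1GiB ceiling, so on typical miner hardware the cache would sit at the ceiling.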