-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Config Change] Stylus LRU cache with bytes capacity instead of number of entries capacity #2595
Conversation
2961c11
to
b90c32a
Compare
@@ -75,7 +75,7 @@ var DefaultCachingConfig = CachingConfig{ | |||
SnapshotRestoreGasLimit: 300_000_000_000, | |||
MaxNumberOfBlocksToSkipStateSaving: 0, | |||
MaxAmountOfGasToSkipStateSaving: 0, | |||
StylusLRUCache: 256, | |||
StylusLRUCacheCapacity: 256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions on the default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with 256 for now. If we frequently run out, we can change it in a later PR.
e91655c
to
165d484
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realised I'm sitting on these Pending comments for a while .. sorry
let count = cache.lru.len(); | ||
let metrics = LruCacheMetrics { | ||
// add 1 to each entry to account that we subtracted 1 in the weight calculation | ||
size_bytes: (cache.lru.weight() + count).try_into().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normal entry is order of magnitude of 100,000. a few bytes per entry isn't worth fussing over (I'd go with weight = deser_size + 128 and return lru.weight() from here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the 128 bytes to account for overheads that you mentioned :).
I kept taking into account that clru defines that each entry consumes (weight + 1) of the cache capacity though.
This simplifies tests that checks the expected cache size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making tests simpler is a worthy goal.
Won't you get the same result if you don't decrease 1 from each weight and don't add count here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if not - why? if so - separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope :(.
clru defines that each entry will consume weight(entry) + 1
of the cache's capacity.
We define that weight(entry) = size_in_bytes - 1
, in that way each entry will actually consume size_in_bytes
of the cache's capacity.
This +1/-1 doesn't make much of a practical difference in production, but with it the cache's interface is more straightforward to understand, since we define the cache capacity's in number of bytes, and each entry consumes the number of bytes related to it.
A more straightforward interface also simplifies testing.
That said, considering that we are decreasing 1 from each weight, we add 1 one here so the metrics are consistent.
But it wouldn't be so harmful if this metric is off by one for each entry.
But since we can be precise here without compromising much of code simplicity, I don't see why we wouldn't do this :).
@@ -75,7 +75,7 @@ var DefaultCachingConfig = CachingConfig{ | |||
SnapshotRestoreGasLimit: 300_000_000_000, | |||
MaxNumberOfBlocksToSkipStateSaving: 0, | |||
MaxAmountOfGasToSkipStateSaving: 0, | |||
StylusLRUCache: 256, | |||
StylusLRUCacheCapacity: 256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with 256 for now. If we frequently run out, we can change it in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…us LRU cache entry
b514e73
to
f144eb1
Compare
let count = cache.lru.len(); | ||
let metrics = LruCacheMetrics { | ||
// add 1 to each entry to account that we subtracted 1 in the weight calculation | ||
size_bytes: (cache.lru.weight() + count).try_into().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making tests simpler is a worthy goal.
Won't you get the same result if you don't decrease 1 from each weight and don't add count here?
pub struct LruCounters { | ||
pub hits: u32, | ||
pub misses: u32, | ||
pub does_not_fit: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a counter for long-term cache size and add/remove form it when we add/remove entries to the hashmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure :)
@@ -963,4 +963,15 @@ func (s *ExecutionEngine) Start(ctx_in context.Context) { | |||
} | |||
} | |||
}) | |||
// periodically update stylus lru cache metrics | |||
s.LaunchThread(func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a boolean config weather or not to collect metrics. On by default, off by default for system-tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what is the rationale of not always enabling those metrics?
Is it some worry about Prometheus performance with the number of metrics increase?
Sometimes I think that too granular configs, like this one, are not so useful, increases code complexity, and unnecessarily creates confusion for users, that will see one more config that could be "optimized" .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rational is that the stylus cache is static and in tests I don't want multiple parallel tests measuring the same PR.
I don't plan to ever turn it off in binary, just tests.
let count = cache.lru.len(); | ||
let metrics = LruCacheMetrics { | ||
// add 1 to each entry to account that we subtracted 1 in the weight calculation | ||
size_bytes: (cache.lru.weight() + count).try_into().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if not - why? if so - separate PR)
Resolves NIT-2698
Stylus LRU cache with bytes capacity instead of number of entries capacity.
It infers the size of each entry by computing the number of bytes related to the serialized wasm module.
This PR also removes
-Wall
from cgo CFLAGS.go doesn't handle it well: https://stackoverflow.com/questions/56633817/warning-unused-variable-cgo-a.
With it the following warning was being generated without need: