[config change] use MessageCommitMode when executing future head block messages #2705
LGTM overall; requesting changes because a test for this fix is missing.
I created a draft PR that exposes Stylus long term cache metrics, which can be helpful when implementing tests in this PR. I didn't implement tests in my PR since that requires long term caching to work properly, which is not the case in the master branch 😬 If you want to use what I developed, you can merge the changes from my branch into yours and then implement the tests.
…stylus-lt-cache-test
// See if the item is in the long term cache
if let Some(item) = cache.long_term.get(&key) {
    return Some(item.data());
}

// See if the item is in the LRU cache, promoting if so
if let Some(item) = cache.lru.get(&key) {
    let data = item.data();
if let Some(item) = cache.lru.peek(&key).cloned() {
This codepath clones data twice: once here in the "get" and again when returning item.data().
Cloning entry_size_estimate_bytes is OK, but we don't want to clone module and engine unnecessarily.
This is where Rust gets you :)
There are probably solutions that would avoid cloning the result of the peek, but I think the simplest would be to avoid cloning in item.data(), because item itself is discarded right after.
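A minimal sketch of the suggestion above, not the actual cache.rs code. `Expensive`, `CacheItem`, `into_data`, and the `HashMap` standing in for the LRU cache are all hypothetical; `Expensive` only counts how often it is cloned so that the difference is observable.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the module/engine types; it counts its own clones.
struct Expensive {
    clones: Rc<Cell<usize>>,
}

impl Clone for Expensive {
    fn clone(&self) -> Self {
        self.clones.set(self.clones.get() + 1);
        Expensive { clones: Rc::clone(&self.clones) }
    }
}

// Hypothetical cache entry, loosely modelled on the fields named in the review.
#[derive(Clone)]
#[allow(dead_code)]
struct CacheItem {
    module: Expensive,
    engine: Expensive,
    entry_size_estimate_bytes: usize,
}

impl CacheItem {
    // Borrowing accessor: must clone both fields because `self` stays alive.
    fn data(&self) -> (Expensive, Expensive) {
        (self.module.clone(), self.engine.clone())
    }

    // Consuming accessor: moves the fields out of an owned item without cloning.
    fn into_data(self) -> (Expensive, Expensive) {
        (self.module, self.engine)
    }
}

// Clones the entry, then clones module and engine again inside `data()`.
fn get_cloning_twice(cache: &HashMap<u32, CacheItem>, key: u32) -> Option<(Expensive, Expensive)> {
    let item = cache.get(&key).cloned()?;
    Some(item.data())
}

// Clones the entry once, then moves module and engine out of the owned copy.
fn get_cloning_once(cache: &HashMap<u32, CacheItem>, key: u32) -> Option<(Expensive, Expensive)> {
    let item = cache.get(&key).cloned()?;
    Some(item.into_data())
}
```

Because the cloned item is dropped right after the lookup, consuming it costs nothing extra and halves the number of module and engine clones in this sketch.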
fixed, thanks!
Tsahi pointed out that there's one more unnecessary clone, so that's not fixed yet, working on it :)
passed item.module and item.engine without cloning to the returned Option, let me know if that checks out :)
Nice :)
Co-authored-by: Diego Ximenes Mendes <dxmendes1@gmail.com>
Seems good. Still need to review program_test.go
will create an issue to fix the remaining problem in caching
arbitrator/stylus/src/cache.rs
Outdated
}
cache.long_term_counters.misses += 1;
I think this should only be increased if long_term_tag is 1. Because that would mean "this should be in long term cache".
Good point: it will filter out noise from API calls, and we should be able to observe how many misses there are when a node starts up. Changed :)
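The agreed change can be sketched roughly as follows. This is an illustration under assumptions, not the code in cache.rs: `Cache`, `LongTermCounters`, `get_long_term`, and the `LONG_TERM_TAG` constant are hypothetical names, and only the rule "count a miss only when long_term_tag is 1" comes from the thread.

```rust
use std::collections::HashMap;

// Assumption: tag value 1 marks lookups that expect the long term cache to hold the item.
const LONG_TERM_TAG: u32 = 1;

// Hypothetical counters for the long term cache.
#[derive(Default)]
struct LongTermCounters {
    hits: u32,
    misses: u32,
}

// Hypothetical cache holding only the long term part.
struct Cache {
    long_term: HashMap<u32, Vec<u8>>,
    counters: LongTermCounters,
}

impl Cache {
    fn get_long_term(&mut self, key: u32, long_term_tag: u32) -> Option<Vec<u8>> {
        if let Some(item) = self.long_term.get(&key) {
            self.counters.hits += 1;
            return Some(item.clone());
        }
        // Count a miss only when the caller expected the item to be cached long term,
        // so lookups with other tags do not inflate the counter.
        if long_term_tag == LONG_TERM_TAG {
            self.counters.misses += 1;
        }
        None
    }
}
```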
pub extern "C" fn stylus_get_lru_cache_metrics() -> LruCacheMetrics {
    InitCache::get_lru_metrics()
pub extern "C" fn stylus_get_cache_metrics() -> CacheMetrics {
    InitCache::get_metrics()
@diegoximenes
This is a bug in the previous PR as well.
You're allocating memory here in Rust and returning the pointer to Go.
Go discards it because it's a garbage-collected language, and the memory is never released.
The solution is to allocate CacheMetrics in Go, pass a pointer, and let Rust update the data in the struct. That way Rust doesn't allocate anything new and no memory is lost.
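The caller-allocates pattern proposed above could look roughly like this on the Rust side. It is a sketch under assumptions: the `CacheMetrics` fields, the `fill_cache_metrics` name, and the static counters are hypothetical, and a real export would also need a no-mangle attribute and a matching C declaration for cgo.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical global counters standing in for the real cache state.
static HITS: AtomicU32 = AtomicU32::new(0);
static MISSES: AtomicU32 = AtomicU32::new(0);

// Hypothetical metrics struct with a C-compatible layout, so the caller can allocate it.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct CacheMetrics {
    pub hits: u32,
    pub misses: u32,
}

/// Writes the current metrics into caller-owned memory.
///
/// # Safety
/// `output` must be null or point to a valid, writable `CacheMetrics`.
pub unsafe extern "C" fn fill_cache_metrics(output: *mut CacheMetrics) {
    // `as_mut` yields `None` for a null pointer, so a null argument is ignored.
    if let Some(out) = unsafe { output.as_mut() } {
        out.hits = HITS.load(Ordering::Relaxed);
        out.misses = MISSES.load(Ordering::Relaxed);
    }
}
```

With this shape the function only writes through the pointer it is given, so the struct's lifetime stays entirely with the caller.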
I'll open a separate issue.
Fixes NIT-2812
Pulls: OffchainLabs/go-ethereum#362
Includes: #2712
This PR:
Fixes use of MessageRunMode values so that MessageCommitMode is used when, and only when, the message is part of a soon-to-be head block. Previously, newly sequenced / synced messages were executed in MessageReplayMode, so newly activated / set-cached stylus programs were not cached in the long term cache (only in LRU).
Improves repopulating of the long term cache after node restart: if a program is marked as cached onchain and its wasm is found in the LRU cache, it is also added to the long term cache. That can happen e.g. when an ephemeral call to a cached program precedes its onchain execution.
Adds tests for stylus long term cache + for repopulating long term cache from LRU cache.
Adds metrics for Stylus long term cache (merged from Diego's draft: Stylus cache improvements #2712)
Adds config to disable collection of Stylus metrics from the Go side (also from Diego's draft)
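The repopulation behaviour described in the list above might be sketched as follows. This is an illustration only: `InitCache` here is a simplified stand-in using two `HashMap`s, `LONG_TERM_TAG` is an assumed constant, and leaving the entry in the LRU cache after copying it is an assumption drawn from the wording "also added".

```rust
use std::collections::HashMap;

// Assumption: tag value 1 means the program is marked as cached onchain.
const LONG_TERM_TAG: u32 = 1;

// Simplified stand-in for the real cache; the LRU part is modelled as a plain map.
struct InitCache {
    long_term: HashMap<u32, Vec<u8>>,
    lru: HashMap<u32, Vec<u8>>,
}

impl InitCache {
    fn get(&mut self, key: u32, long_term_tag: u32) -> Option<Vec<u8>> {
        // See if the item is in the long term cache.
        if let Some(item) = self.long_term.get(&key) {
            return Some(item.clone());
        }
        // Otherwise fall back to the LRU cache.
        let item = self.lru.get(&key)?.clone();
        // If the program should be cached long term, repopulate that cache from the LRU hit.
        if long_term_tag == LONG_TERM_TAG {
            self.long_term.insert(key, item.clone());
        }
        Some(item)
    }
}
```

In this sketch an ephemeral call (tag 0) after a restart only touches the LRU entry, and the first later lookup with tag 1 copies that entry into the long term cache.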