Allow separate max-heap-pages for offchain (non-consensus) execution context. #8885
Conversation
We cache the wasm instances based on the wasm method, runtime version and the heap pages, meaning we probably need to increase the cache size, which also leads to higher memory usage. (Just a note on what we should not forget.)
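(For reference, a minimal sketch of such an instance-cache key, with hypothetical names rather than the actual executor internals; caching per heap-pages value is what makes a second, offchain-only setting double the number of entries:)

```rust
/// Hypothetical sketch of the instance-cache key described above.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum WasmMethod {
    Interpreted,
    Compiled,
}

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InstanceCacheKey {
    method: WasmMethod, // how the wasm blob is executed
    spec_version: u32,  // the runtime version
    heap_pages: u64,    // heap size in 64 KiB wasm pages
}
```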
might not be needed, in favour of #8893
I'm kinda lost and uncertain if this PR is really needed. This only adds support for setting the offchain-heap-pages on-chain, and then it can be used. Deploying this today would not help at all unless we set that value. And setting that value itself has some caveats: once it is set on-chain, we can not replace it again via a client upgrade.
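(For context, the consensus heap-pages value can already be overridden via the well-known storage key `:heappages`, i.e. `sp_core::storage::well_known_keys::HEAP_PAGES`; an on-chain offchain override would presumably work the same way. A sketch of the lookup follows; the offchain key name is a made-up placeholder, not necessarily what this PR uses:)

```rust
// Sketch: heap pages read from well-known storage keys. `:heappages`
// is the existing key; the offchain variant below is hypothetical.
const HEAP_PAGES: &[u8] = b":heappages";
const OFFCHAIN_HEAP_PAGES: &[u8] = b":offchain_heappages"; // hypothetical

fn heap_pages_from_storage(
    get: impl Fn(&[u8]) -> Option<Vec<u8>>,
    offchain: bool,
) -> Option<u64> {
    let key = if offchain { OFFCHAIN_HEAP_PAGES } else { HEAP_PAGES };
    // The value is assumed to be a SCALE-encoded u64 (8 bytes, little-endian).
    let raw = get(key)?;
    let bytes: [u8; 8] = raw.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}
```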
I am not sure what the plans are for heap-pages on-chain, but if it stays, having the offchain one seems OK to me.
@@ -626,22 +626,24 @@ pub enum Profile {
 }

 impl Profile {
-	fn into_execution_strategies(self) -> ExecutionStrategies {
+	fn into_execution_configs(self) -> ExecutionConfigs {
Suggested change:
-	fn into_execution_configs(self) -> ExecutionConfigs {
+	fn into_offchain_execution_configs(self) -> ExecutionConfigs {
why?
The code is calling new_offchain only, so I'm just being precise.
Actually shouldn't it be new_offchain only for 'offchain_worker'?
primitives/state-machine/src/lib.rs
/// The context of the execution.
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum ExecutionContext {
There is an existing ExecutionContext enum in sp_core, so I would suggest renaming this one (but I have no naming idea).
yeah, I noticed that as well, but can't find another alternative for now.
Co-authored-by: cheme <emericchevalier.pro@gmail.com>
Yeah, I'll check on the burn-in. Note that now we create ONLY one additional instance with this higher number of heap pages.
Okay, I've done the burn-in. Two observations:
Changes look good. I am just not sure if the additional memory usage is fine, but otherwise I approve.

Generally I would find it good for the offchain worker to be able to configure the max number of instances, the number of running instances, and to override the page size from the cli, but that is another question. Still, it will make it easier to handle memory issues in the future.

Also, the Offchain context currently only changes the heap size (and the pool). Omitting the pool, it could also just have been an 'Override' or 'Custom' heap size option that offchain would specifically use with the executor, making the changes smaller. But I guess it can be good to distinguish the offchain case (using a different wasm for offchain could be nice, but that is also another question).
 let exec = &self.execution_strategies;
-let exec_all_or = |strat: Option<ExecutionStrategy>, default: ExecutionStrategy| {
+let exec_all_or = |start: Option<ExecutionStrategy>, default: ExecutionStrategy| {
Why the renaming? ('strat' was for strategy I guess)
@@ -262,12 +269,13 @@ impl sp_core::traits::ReadRuntimeVersion for WasmExecutor {
 	true,
 	"Core_version",
 	&[],
+	DEFAULT_HEAP_PAGES_CONSENSUS,
Could it be hardcoded to some lower value (it is just reading a version)?
Or that could be the worst idea, as the instance could then not be reused 🤔
 	max_consensus_runtime_instances: usize,
 	/// Same as `consensus_runtimes`, but used in offchain code context. Their size is always bound
 	/// by 1 (can easily be configurable, but we don't need this right now).
 	offchain_runtimes: Mutex<[Option<Arc<VersionedRuntime>>; MAX_RUNTIMES]>,
I guess offchain is still using the same wasm as the runtime. So we can now use twice as much memory for instantiated wasm; I guess that is fine.
Could it make sense to have a MAX_RUNTIMES_OFFCHAIN? Did we consider having only one runtime with two instances?
Afterthought: one instance would be really bad, with im-online potentially being stuck behind phragmen.
 // this must be released prior to calling f.
 let mut runtimes = match runtime_code.context {
 	sp_core::traits::CodeContext::Consensus => self.consensus_runtimes.lock(),
 	sp_core::traits::CodeContext::Offchain => self.offchain_runtimes.lock(),
So offchain could not use a consensus instance; since the heap pages differ, I guess it would not be a good idea to try anyway.
I do like that there is a clean separation.
-let version = WasmOverride::runtime_version(&executor, &wasm, Some(128))
-	.expect("should get the `RuntimeVersion` of the test-runtime wasm blob");
+let version =
+	WasmOverride::runtime_version(&executor, &wasm, Some(128), CodeContext::Consensus)
Does the wasm override apply to offchain?
If so, will it run with the consensus context?
/// Default num of pages for the heap of the runtime, when used for offchain operations.
///
/// 256mb per instance.
const DEFAULT_HEAP_PAGES_OFFCHAIN: u64 = DEFAULT_HEAP_PAGES_CONSENSUS * 2;
Having a bigger requirement for offchain is a bit awkward. I mean, most offchain processes should be light (I am only aware of im-online, but its requirements are certainly small). So we use a big memory profile mostly for offchain phragmen.

This makes me think that the offchain worker default heap pages could depend on the offchain method being called. Same if considering overriding it from the cli.

(I am not sure the pool allows switching instances depending on heap pages, but it could also use a single-use instance for this special case.)
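(For the arithmetic behind that doc comment: wasm linear memory comes in 64 KiB pages, so assuming the consensus default of 2048 pages, i.e. 128 MiB, as in current Substrate, doubling it gives the 256 MiB per offchain instance mentioned in the diff. A quick check:)

```rust
// Wasm linear memory is allocated in 64 KiB pages.
const WASM_PAGE_SIZE: u64 = 64 * 1024;
// Assumed from Substrate's current default of 2048 heap pages (128 MiB).
const DEFAULT_HEAP_PAGES_CONSENSUS: u64 = 2048;
const DEFAULT_HEAP_PAGES_OFFCHAIN: u64 = DEFAULT_HEAP_PAGES_CONSENSUS * 2;

fn main() {
    // 2048 pages * 64 KiB = 128 MiB for the consensus context...
    assert_eq!(DEFAULT_HEAP_PAGES_CONSENSUS * WASM_PAGE_SIZE, 128 * 1024 * 1024);
    // ...and twice that, 256 MiB, for the offchain context.
    assert_eq!(DEFAULT_HEAP_PAGES_OFFCHAIN * WASM_PAGE_SIZE, 256 * 1024 * 1024);
}
```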
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
we will probably kill this in favour of whatever @bkchr is cooking.
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.
What @kianenigma said :P
Wen dynamic heap pages @bkchr?
++++++++++
honestly, I still come back to this and think it is a good interim feature to have. WDYT @paritytech/sdk-node?
🙈
Just for your reference, here's the memory map of a current-ish Kusama runtime I have laying around: [memory map omitted]

So basically any dynamic allocation made inside of the runtime is served by that heap region. Now, our runtime doesn't do this, but the memory can technically be grown from within the runtime with memory.grow, and the allocator would then have more space to hand out.

Of course this is not very convenient, since it's not automatic. The allocator host function will panic outside of the runtime when running out of memory instead of automatically growing the heap, and you need to grow the heap from within the runtime, so essentially you'd have to either blindly grow the memory or somehow know that the next allocation will fail. We could easily modify the allocator to automatically try to grow the memory if we wanted to though.

TLDR: Technically it's already here due to how the system works, although it's not very convenient, and it's debatable whether it's intended to be actually used or not... 🤷
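(To make "grow the heap from within the runtime" concrete, here is a minimal wasm32 sketch using the plain Rust intrinsic; this is not the Substrate allocator itself, just the mechanism it would sit on top of:)

```rust
/// Sketch: growing the wasm linear memory from inside the runtime.
/// `memory_grow` returns the previous size in pages, or `usize::MAX`
/// if the embedder refuses to grow the memory -- the "runs out and
/// panics" situation described above, seen from the wasm side.
#[cfg(target_arch = "wasm32")]
fn grow_heap_by(pages: usize) -> Result<usize, ()> {
    let previous = core::arch::wasm32::memory_grow(0, pages);
    if previous == usize::MAX {
        Err(()) // out of memory as far as the embedder is concerned
    } else {
        Ok(previous) // the new pages are now usable by the allocator
    }
}
```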
Point here being that I have a branch that has all this already prepared. I just need to clean it up and fix two tests 🙈
@koute thanks for the explainer, and I think you basically unwrapped what basti has implemented in his head and WIP branch. Please keep me up to date with further changes regarding this, and do bear in mind that allowing some OCW to allocate more memory (both […] If you happen to know another workaround to achieve this in the short term, please do let me know.
FYI, we're still running into this on a semi-random basis on the offchain workers of Westend, in a rather unpredictable fashion: https://grafana.parity-mgmt.parity.io/goto/DF1cyNX7z?orgId=1
The previous `ExecutionStrategy` is now wrapped in `ExecutionConfig`, which is the strategy plus an `ExecutionContext`. This propagated to a lot of `strategy -> config` replacements. I like the outcome.

polkadot companion: Companion for /substrate/pull/8885 (polkadot#3378)