refactor: streamline contract caching code #7851

matklad · 2022-10-17T19:00:20Z

This does two rather big code motions at the same time (sorry!):

* precompilation code is moved from "let's match on VMKind" style to "let's call Runtime's virtual method". To do so, we re-purpose existing `precompile` functions in the trait. Remember, those fns are essentially dead code from "pipelined" compilation
* code to deal with our two-layered caching is moved from `cache.rs` into impls of VM
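
For readers unfamiliar with the two styles, here is a minimal sketch of the first code motion. Every name in it (`VMKind` variants, the `VM` trait shape, `precompile_by_match`, the string "artifacts") is a hypothetical stand-in chosen for illustration, not the actual nearcore definitions.

```rust
// All names below are hypothetical stand-ins, not the nearcore definitions.
#[derive(Clone, Copy)]
enum VMKind {
    Wasmer0,
    Wasmer2,
}

// "Match on VMKind" style: each helper repeats the dispatch itself.
fn precompile_by_match(kind: VMKind, code: &[u8]) -> String {
    match kind {
        VMKind::Wasmer0 => format!("wasmer0 artifact ({} bytes)", code.len()),
        VMKind::Wasmer2 => format!("wasmer2 artifact ({} bytes)", code.len()),
    }
}

// "Virtual method" style: dispatch happens once, when the runtime is picked.
trait VM {
    fn precompile(&self, code: &[u8]) -> String;
}

struct Wasmer0VM;
struct Wasmer2VM;

impl VM for Wasmer0VM {
    fn precompile(&self, code: &[u8]) -> String {
        format!("wasmer0 artifact ({} bytes)", code.len())
    }
}

impl VM for Wasmer2VM {
    fn precompile(&self, code: &[u8]) -> String {
        format!("wasmer2 artifact ({} bytes)", code.len())
    }
}

impl VMKind {
    // The only remaining match on the kind.
    fn runtime(self) -> Box<dyn VM> {
        match self {
            VMKind::Wasmer0 => Box::new(Wasmer0VM),
            VMKind::Wasmer2 => Box::new(Wasmer2VM),
        }
    }
}
```

With the second style, VM-specific caching logic can live next to the VM implementation instead of in a shared module that has to know about every kind.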

matklad · 2022-10-17T19:04:13Z

runtime/near-vm-runner/src/cache.rs

-                vm.engine
-                    .load_universal_executable(&executable)
-                    .map(|v| Arc::new(v) as _)
-                    .map_err(|err| panic!("could not load the executable: {}", err.to_string()))


This is very misleading: this is not a map_err, it's effectively an .unwrap_or_else. I think this is correct code, but it is on the non-prod path where the Cache is None.
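
A minimal standalone sketch of why the removed `map_err` is misleading. The `load` function and its `String` error are hypothetical stand-ins for the wasmer call, not the real API; the point is only that a closure which panics never produces an `Err`.

```rust
// Hypothetical stand-in for a fallible loader; not the real wasmer API.
fn load(ok: bool) -> Result<u32, String> {
    if ok {
        Ok(42)
    } else {
        Err("out of memory".to_string())
    }
}

// Reads like an error conversion, but the closure diverges, so an `Err`
// is never actually returned to the caller.
fn load_misleading(ok: bool) -> Result<u32, String> {
    load(ok).map_err(|err| panic!("could not load the executable: {}", err))
}

// Same behavior, with the panic visible in the method name and return type.
fn load_explicit(ok: bool) -> u32 {
    load(ok).unwrap_or_else(|err| panic!("could not load the executable: {}", err))
}
```

Both functions panic on failure; only the second one says so in its signature.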

matklad · 2022-10-17T19:09:13Z

runtime/near-vm-runner/src/cache.rs

-            Ok(artifact) => Ok(Ok(Arc::new(artifact) as _)),
-            Err(err) => {
-                let err = CompilationError::WasmerCompileError { msg: err.to_string() };
-                cache_error(&err, key, cache)?;


And here's the "cached" path counterpart of

https://github.com/near/nearcore/pull/7851/files#r997413310

I change the behavior here to return a CacheError, rather than caching the error. I think this is what we should do (i.e., the current code on master is buggy). load_universal_executable fails only if we fail to allocate memory, and that's definitely not an error we should cache (@nagisa could you double-check this?)

nagisa · 2022-10-18T10:18:32Z

runtime/near-vm-runner/src/cache.rs

-            Ok(artifact) => Ok(Ok(Arc::new(artifact) as _)),
-            Err(err) => {
-                let err = CompilationError::WasmerCompileError { msg: err.to_string() };
-                cache_error(&err, key, cache)?;


Yeah, I believe load_universal_executable can only fail for obscure resource reasons. The ref variant can additionally fail if the backing serialized storage is corrupted in some way, but that’s also something we shouldn’t be caching I feel.
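
The distinction discussed in this thread can be sketched roughly as follows. All types and functions here (`Record`, `compile`, `load`, `compile_and_load`, the `memory_available` flag) are simplified, hypothetical stand-ins rather than the nearcore ones: a compilation failure is stored in the cache, while a loading failure is reported to the caller without being recorded.

```rust
use std::collections::HashMap;

// Hypothetical simplifications, not the nearcore types.
#[derive(Debug, Clone, PartialEq)]
enum CompilationError {
    Invalid(String),
}

#[derive(Debug, PartialEq)]
enum CacheError {
    LoadFailed(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Record {
    Code(Vec<u8>),
    CompileError(CompilationError),
}

type Cache = HashMap<u64, Record>;

// Toy compiler: empty input is always invalid.
fn compile(code: &[u8]) -> Result<Vec<u8>, CompilationError> {
    if code.is_empty() {
        Err(CompilationError::Invalid("empty module".into()))
    } else {
        Ok(code.to_vec())
    }
}

// Toy loader: fails only for environmental reasons, simulated with a flag.
fn load(machine_code: &[u8], memory_available: bool) -> Result<usize, String> {
    if memory_available {
        Ok(machine_code.len())
    } else {
        Err("mmap failed".into())
    }
}

fn compile_and_load(
    key: u64,
    code: &[u8],
    memory_available: bool,
    cache: &mut Cache,
) -> Result<Result<usize, CompilationError>, CacheError> {
    let machine_code = match compile(code) {
        Ok(machine_code) => machine_code,
        Err(err) => {
            // A property of the contract itself, so it is recorded in the cache.
            cache.insert(key, Record::CompileError(err.clone()));
            return Ok(Err(err));
        }
    };
    cache.insert(key, Record::Code(machine_code.clone()));
    // A property of the environment: surfaced to the caller, never cached.
    let artifact = load(&machine_code, memory_available).map_err(CacheError::LoadFailed)?;
    Ok(Ok(artifact))
}
```

In this sketch a retry after a failed load succeeds once memory is available again, which would not be the case had the failure been written into the cache.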

nagisa · 2022-10-18T10:20:13Z

runtime/near-vm-runner/src/wasmer2_runner.rs

+                let stored_artifact: Option<VMArtifact> = match cache_record {
+                    None => None,
+                    Some(CompiledContract::CompileModuleError(err)) => return Ok(Err(err)),
+                    Some(CompiledContract::Code(serialized_module)) => {
+                        unsafe {
+                            // (UN-)SAFETY: the `serialized_module` must have been produced by a prior call to
+                            // `serialize`.
+                            //
+                            // In practice this is not necessarily true. One could have forgotten to change the
+                            // cache key when upgrading the version of the wasmer library or the database could
+                            // have had its data corrupted while at rest.
+                            //
+                            // There should definitely be some validation in wasmer to ensure we load what we think
+                            // we load.
+                            let executable =
+                                UniversalExecutableRef::deserialize(&serialized_module)
+                                    .map_err(|_| CacheError::DeserializationError)?;
+                            let artifact = self
+                                .engine
+                                .load_universal_executable_ref(&executable)
+                                .map(Arc::new)
+                                .map_err(|_| CacheError::DeserializationError)?;
+                            Some(artifact)
+                        }
+                    }
+                };


This pattern seems to be shared between VMs and feels somewhat error-prone. Can this be abstracted into a function?

Maybe, but I decided not to, at least not in this PR:

* smaller conceptual diff
* we probably won't be touching wasmer1 code much (e.g., we might want to change wasmer2 behavior here under a protocol feature, but not wasmer1)
* with wasmer2, we have separate code paths for Executable vs ExecutableRef, which I think raises the cost of abstraction here

Yeah, it seems fine to not roll up that into this PR at least...

That said, I was thinking more about the match { None => None, Some(...) => Ok(Err(err)), _ => whatever } part. I was thinking that the matching on the structure could go away (much like e.g. map_err/transpose/etc convenience methods do for Result) and the VM-specific portion could be just a closure or some such thing.

Ah, I see, yeah, I guess we can add some helper to CompiledContractCache; will note in the relevant issue.
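
One possible shape for such a helper, as a sketch only. The function name `load_cached` and the simplified `CompiledContract`, `CompilationError`, and `CacheError` definitions are assumptions made for this example; the real helper, if added, may look quite different. The shared matching lives in one place and the VM-specific loading is passed in as a closure.

```rust
// Simplified, hypothetical mirrors of the types in the diff above.
#[derive(Debug, PartialEq)]
enum CompilationError {
    Invalid(String),
}

#[derive(Debug, PartialEq)]
enum CacheError {
    DeserializationError,
}

enum CompiledContract {
    CompileModuleError(CompilationError),
    Code(Vec<u8>),
}

// Shared structure: cache miss, cached compile error, or cached code that is
// handed to a VM-specific loader closure.
fn load_cached<A>(
    record: Option<CompiledContract>,
    load: impl FnOnce(Vec<u8>) -> Result<A, CacheError>,
) -> Result<Result<Option<A>, CompilationError>, CacheError> {
    match record {
        None => Ok(Ok(None)),
        Some(CompiledContract::CompileModuleError(err)) => Ok(Err(err)),
        Some(CompiledContract::Code(bytes)) => load(bytes).map(|artifact| Ok(Some(artifact))),
    }
}
```

Each VM would then supply only the deserialize-and-load step, including its own `unsafe` block where needed.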

matklad · 2022-10-18T13:06:17Z

@jakmeier could you take a look here? This is super tricky (though hopefully less than it used to be), I'd love some extra hand-holding! (and feel free to slap auto-merge if it looks ok)

matklad · 2022-10-18T13:45:39Z

Heh, the test failure here is a prime example of why global state is bad ^^

The failing test creates two runtimes, and checks that contracts are properly compiled when one runtime syncs to another. The two runtimes have separate db-level caches (b/c they are explicitly passed in), but share the same in-memory cache (b/c it's a static, duh).
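
Roughly the shape of the problem, under heavy simplification. `MEM_CACHE`, `Runtime`, and `ensure_compiled` are hypothetical names invented for this sketch, and the real interaction in nearcore is more involved; the sketch only shows how a process-wide static can mask work that a second instance was expected to do against its own cache.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Hypothetical process-wide in-memory cache, standing in for the static.
static MEM_CACHE: Mutex<Vec<u64>> = Mutex::new(Vec::new());

struct Runtime {
    // Per-instance cache, standing in for the explicitly passed db-level cache.
    db_cache: HashSet<u64>,
}

impl Runtime {
    fn new() -> Self {
        Runtime { db_cache: HashSet::new() }
    }

    // Returns true if this call compiled the contract and stored it in
    // this runtime's own db cache.
    fn ensure_compiled(&mut self, key: u64) -> bool {
        let mut mem = MEM_CACHE.lock().unwrap();
        if mem.contains(&key) {
            // Hit in the shared static: this runtime's db cache stays empty.
            return false;
        }
        mem.push(key);
        self.db_cache.insert(key);
        true
    }
}
```

Two `Runtime` values created in the same process share `MEM_CACHE`, so the second one never populates its own `db_cache` for a key the first one has already seen.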

This fixes `test_two_deployments`. The fix is to ensure that `precompile` *only* compiles and caches the contract, without loading it. This reshuffles errors yet again, as `LoadingError` moves from `CacheError` to `VMRunnerError`, where it actually belongs.
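
A minimal sketch of the separation described here, assuming a toy `Vm` with a persisted code cache and an in-memory map of loaded artifacts. The struct, its fields, and the `memory_available` flag are hypothetical; only the error name `LoadingError` and its new home `VMRunnerError` come from the commit message.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types; not the nearcore API.
#[derive(Debug, PartialEq)]
enum VMRunnerError {
    LoadingError(String),
}

#[derive(Default)]
struct Vm {
    db_cache: HashMap<u64, Vec<u8>>, // compiled code, persisted
    loaded: HashMap<u64, usize>,     // loaded artifacts, in memory only
}

impl Vm {
    // Compile and cache only. Nothing is loaded, so no loading error can
    // originate here and the in-memory state is left untouched.
    fn precompile(&mut self, key: u64, code: &[u8]) {
        self.db_cache.entry(key).or_insert_with(|| code.to_vec());
    }

    // Loading happens on the run path, so its failure is a runner error.
    fn run(&mut self, key: u64, code: &[u8], memory_available: bool) -> Result<usize, VMRunnerError> {
        self.precompile(key, code);
        if !memory_available {
            return Err(VMRunnerError::LoadingError("allocation failed".into()));
        }
        let len = self.db_cache[&key].len();
        Ok(*self.loaded.entry(key).or_insert(len))
    }
}
```

Because `precompile` no longer touches the loaded-artifact map, calling it has no side effects beyond the cache it was handed.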

jakmeier

Yeah, this all looks good to me. Definitely like the changes here for code readability!

matklad requested review from nagisa and jakmeier October 17, 2022 19:00

matklad mentioned this pull request Oct 17, 2022

refactor: move UnsupportedCompiler to errors #7849

Merged

matklad commented Oct 17, 2022

nagisa reviewed Oct 18, 2022

matklad force-pushed the m/cache branch from 741b939 to e12d9fa Compare October 18, 2022 11:53

matklad marked this pull request as ready for review October 18, 2022 11:56

matklad requested a review from a team as a code owner October 18, 2022 11:56

nagisa approved these changes Oct 18, 2022

matklad mentioned this pull request Oct 18, 2022

VM Errors follow ups #7818

Open

13 tasks

matklad added 7 commits October 18, 2022 15:19

emit dedicated error when loading a contract fails

672f245

This happens either due to resource exhaustion, or when zero-copy deserialized data is invalid

features

29a7937

comments and spans

ae3f81f

fix estimator

b3e39ed

fix: don't load the code during precompilation

a11ff16

matklad force-pushed the m/cache branch from 3801e5f to e143a8b Compare October 18, 2022 14:19

jakmeier approved these changes Oct 18, 2022

Merge branch 'master' into m/cache

1b02e39

jakmeier added the S-automerge label Oct 18, 2022

near-bulldozer bot merged commit 650e210 into near:master Oct 18, 2022

matklad deleted the m/cache branch October 18, 2022 16:10
