Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: streamline contract caching code #7851

Merged
merged 8 commits into from
Oct 18, 2022
Merged

Conversation

matklad
Copy link
Contributor

@matklad matklad commented Oct 17, 2022

This does two rather big code motions at the same time (sorry!):

  • precompilation code is moved from "let's match on VMKind" style to "let's call Runtime's virtual method". To do so, we re-purpose existing precompile functions in the trait. Remember, those fns are essentially dead code from "pipelined" compilation
  • code to deal with our two-layered caching is moved from cache.rs into impls of VM

vm.engine
.load_universal_executable(&executable)
.map(|v| Arc::new(v) as _)
.map_err(|err| panic!("could not load the executable: {}", err.to_string()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very misleading: this is not a map_err, it's effectively an .unwrap_or_else. I think this is correct code, but it is no the non-prod path where the Cache is None.

Ok(artifact) => Ok(Ok(Arc::new(artifact) as _)),
Err(err) => {
let err = CompilationError::WasmerCompileError { msg: err.to_string() };
cache_error(&err, key, cache)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here's the "cached" path counterpart of

https://github.com/near/nearcore/pull/7851/files#r997413310

I change the behavor here to return a CacheError, rather than cache error. I think this is what we should do (ie, the current code on master is buggy). load_universal_executable fails only if we fail to allocate memory, and that's definitely not an error we should cache (@nagisa coudl you double-check this?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I believe load_universal_executable can only fail for obscure resource reasons. The ref variant can additionally fail if the backing serialized storage is corrupted in some way, but that’s also something we shouldn’t be caching I feel.

runtime/near-vm-runner/src/wasmer_runner.rs Outdated Show resolved Hide resolved
runtime/near-vm-runner/src/wasmer2_runner.rs Outdated Show resolved Hide resolved
Ok(artifact) => Ok(Ok(Arc::new(artifact) as _)),
Err(err) => {
let err = CompilationError::WasmerCompileError { msg: err.to_string() };
cache_error(&err, key, cache)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I believe load_universal_executable can only fail for obscure resource reasons. The ref variant can additionally fail if the backing serialized storage is corrupted in some way, but that’s also something we shouldn’t be caching I feel.

Comment on lines 306 to 329
let stored_artifact: Option<VMArtifact> = match cache_record {
None => None,
Some(CompiledContract::CompileModuleError(err)) => return Ok(Err(err)),
Some(CompiledContract::Code(serialized_module)) => {
unsafe {
// (UN-)SAFETY: the `serialized_module` must have been produced by a prior call to
// `serialize`.
//
// In practice this is not necessarily true. One could have forgotten to change the
// cache key when upgrading the version of the wasmer library or the database could
// have had its data corrupted while at rest.
//
// There should definitely be some validation in wasmer to ensure we load what we think
// we load.
let executable =
UniversalExecutableRef::deserialize(&serialized_module)
.map_err(|_| CacheError::DeserializationError)?;
let artifact = self
.engine
.load_universal_executable_ref(&executable)
.map(Arc::new)
.map_err(|_| CacheError::DeserializationError)?;
Some(artifact)
}
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern seems to be shared between VMs and feels somewhat error-prone. Can this be abstracted into a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I decided not to, at least not in this PR:

  • smaller conceptual diff
  • we probably won't be touching wasmer1 code much (eg, we might want change wasmer2 behavior here under protocl feature, but not wasmer1)
  • with wasmer two, we have separate code paths fro Executable vs ExecutableRef, which I think raises the cost of abstraction here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems fine to not roll up that into this PR at least...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I was thinking more about the match { None => None, Some(...) => Ok(Err(err)), _ => whatever } part. I was thinking that the matching on the structure could go away (much like e.g. map_err/transpose/etc convenience methods do for Result) and the VM-specific portion could be just a closure or some such thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, yeah, we can I guess add some helper to CompiledContractCache, will note in the relevant issue

runtime/near-vm-runner/src/wasmer2_runner.rs Outdated Show resolved Hide resolved
@matklad matklad marked this pull request as ready for review October 18, 2022 11:56
@matklad matklad requested a review from a team as a code owner October 18, 2022 11:56
@matklad
Copy link
Contributor Author

matklad commented Oct 18, 2022

@jakmeier could you take a look here? This is super tricky (though hopefully less than it used to be), I'd love some extra hand-holding! (and feel free to slap auto-merge if it looks ok)

@matklad matklad mentioned this pull request Oct 18, 2022
13 tasks
@matklad
Copy link
Contributor Author

matklad commented Oct 18, 2022

Heh, the test failure here is an prime example why global state is bad ^^

The failing tests creates two runtimes, and checks that contacts are properly compiled when one runtime syncs two another. The two runtimes have separate db-level caches (b/c they are explicitly passed in), but share the same in-memory cache (b/c its a static, duh).

This does two rather big code motions at the same time (sorry!):

* precompilation code is moved from "let's match on VMKind" style to
  "let's call Runtime's virtual method". To do so, we re-purpose
  existing `precompile` functions in the trait. Remember, those fns are
  essentially dead code from "pipelined" compilation
* code to deal with our two-layered caching is moved from `cache.rs`
  into impls of VM
This happens either due to resource exhaustion, or when zero-copy
deserialized data is invalid
This fixes `test_two_deployments`.

The fix is to ensure that `precompile` *only* compiles and caches the
contract, without loading it. This reshuffles error yet again, as
`LoadingError` moves from `CacheError` to `VMRunnerError`, where it
actually belongs.
Copy link
Contributor

@jakmeier jakmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this all looks good to me. Definitely like the changes here for code readibilty!

@near-bulldozer near-bulldozer bot merged commit 650e210 into near:master Oct 18, 2022
@matklad matklad deleted the m/cache branch October 18, 2022 16:10
nikurt pushed a commit that referenced this pull request Oct 19, 2022
This does two rather big code motions at the same time (sorry!):

* precompilation code is moved from "let's match on VMKind" style to "let's call Runtime's virtual method". To do so, we re-purpose existing `precompile` functions in the trait. Remember, those fns are essentially dead code from "pipelined" compilation
* code to deal with our two-layered caching is moved from `cache.rs` into impls of VM
nikurt pushed a commit that referenced this pull request Nov 9, 2022
This does two rather big code motions at the same time (sorry!):

* precompilation code is moved from "let's match on VMKind" style to "let's call Runtime's virtual method". To do so, we re-purpose existing `precompile` functions in the trait. Remember, those fns are essentially dead code from "pipelined" compilation
* code to deal with our two-layered caching is moved from `cache.rs` into impls of VM
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants