Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading NativeArtifact can be faster #2180

Closed
ailisp opened this issue Mar 11, 2021 · 5 comments
Closed

Loading NativeArtifact can be faster #2180

ailisp opened this issue Mar 11, 2021 · 5 comments
Labels
🎉 enhancement New feature!

Comments

@ailisp
Copy link
Contributor

ailisp commented Mar 11, 2021

Thanks for proposing a new feature!

Motivation

Currently loading a NativeArtifact is two step:

  1. load ELF, internally it's a mmap, very fast
  2. deserialize ModuleMetadata from part of ELF data, slow

A simple print instant::now/elapsed() show the total loading takes 90% (3.5ms of 4ms) of the entire NativeArtifact::deserialization_from_file_unchecked, and longer when wasm file is bigger. So this is suboptimal

Proposed solution

ModuleMetadata can also be replaced with a mem mapped data structure, as it's consist of struct or struct of vectors (primarymap). For example, I tried this lib: https://github.com/heremaps/flatdata/tree/master/flatdata-rs and loads a artificially constructed, similar structured and similar sized to wasmer Artifact, it takes only 0.2ms to load, and doesn't take longer when I enlarge size of vector in struct to 100x.

Alternatives

Other mmap based crate maybe considerered

Additional context

There is some HashMap and IndexMap in ModuleInfo, however they are all less than 1000 elements in a 2M contract (lib/tests/assets/qjs.wasm), which should be ok by replace with a sorted vector and lookup via binary search is faster than hashmap in this number of elements

@ailisp ailisp added the 🎉 enhancement New feature! label Mar 11, 2021
@Hywan
Copy link
Contributor

Hywan commented Mar 12, 2021

Thank you for the proposal. That's an interesting thing to try.

For the moment, we use libloading to load the library. Behind the scene, it calls dlopen to open the library, with the flags RTLD_LAZY | RTLD_LOCAL. Maybe we can do better. I know RTLD_NOLOAD and RTLD_NODELETE can improve things quite a lot but not sure it will impact our generated shared library objects as they don't have dependencies. Need to test!

Are you willing to try something with a PR :-)?

@syrusakbary
Copy link
Member

@Hywan I believe @ailisp is mentioning not dlopen loading time issues, but bincode deserialization of ModuleMetadata when the artifact is partially loaded (already dlopen-ed).

@ailisp, I think your suggestion is really great. Once the PR #2183 is merged it should be really trivial to add a much faster deserialization mechanism. I'm happy with using flatdata-rs that you commented. But I'm also curious how it will perform also with borsh-rs.

bors bot added a commit that referenced this issue Mar 12, 2021
2183: Passive elements and data are now PrimaryMaps r=syrusakbary a=syrusakbary

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description

This PR moves some fields in the `ModuleInfo` to stop using `HashMap` and start using `PrimaryMap` as the contents are not sparse but fully dense.
The PR is incentivized by #2180 as it's partially needed to stop using HashMap when possible. It doesn't completely solve it, as the `function_names` field is still a `HashMap`, but it should be trivial to solve that in a PR that improves the deserialization time.

<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Syrus Akbary <me@syrusakbary.com>
@ailisp
Copy link
Contributor Author

ailisp commented Mar 12, 2021

@Hywan I mean @syrusakbary said. Current dllopen is fast enough but thanks for giving a lot more insight in dllopen and that's quite advanced :)

@syrusakbary flatdata (or other mmap based, raw pointer based) are usually a single operation to reconstruct the entire struct, which is significantly faster than borsh. Borsh is 30%-40% faster than bincode in deserialize Artifact of qjs.wasm, but it still have to iterate all elements to reconstruct PrimaryMap/Vectors.

@ailisp
Copy link
Contributor Author

ailisp commented Mar 16, 2021

Some update on this: flatdata actually doesn't support Vec<Vec<>> kind of struct, my above benchmark is on CompiledFunctionFrameInfo which is rather flat, but for the entire struct there is PrimaryMap<String,>> which is considered Vec<Vec<>> is ram.
So we found and benched another, zero cost loading crate https://github.com/djkoloski/rkyv, which also gives 0.2ms load time in a 100x enlarged struct (borsh takes 140ms, bincode is even longer), and this crate support HashMap. It's going to need implement rkyv traits for IndexMap and PrimaryMap, are you interested and okay with it?

bors bot added a commit that referenced this issue Apr 21, 2021
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description
<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run
```
cargo bench -p wasmer-cache --features singlepass
```
from root dir, will see a lot of timeing in the end:
```
...
3.447994ms // bincode::deserialize
rkyv 22ns
rkyv deser 386.539µs
```

This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. 
TODO:
- [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap
- ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~
```
enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}
```
~and field access to `ModuleMetadataWrap` method calls~
- [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize
- [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test.
- [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Bo Yao <bo@nearprotocol.com>
bors bot added a commit that referenced this issue Apr 21, 2021
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description
<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run
```
cargo bench -p wasmer-cache --features singlepass
```
from root dir, will see a lot of timeing in the end:
```
...
3.447994ms // bincode::deserialize
rkyv 22ns
rkyv deser 386.539µs
```

This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. 
TODO:
- [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap
- ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~
```
enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}
```
~and field access to `ModuleMetadataWrap` method calls~
- [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize
- [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test.
- [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Bo Yao <bo@nearprotocol.com>
bors bot added a commit that referenced this issue Apr 21, 2021
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description
<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run
```
cargo bench -p wasmer-cache --features singlepass
```
from root dir, will see a lot of timeing in the end:
```
...
3.447994ms // bincode::deserialize
rkyv 22ns
rkyv deser 386.539µs
```

This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. 
TODO:
- [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap
- ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~
```
enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}
```
~and field access to `ModuleMetadataWrap` method calls~
- [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize
- [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test.
- [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Bo Yao <bo@nearprotocol.com>
@Hywan
Copy link
Contributor

Hywan commented Jul 16, 2021

Closed by #2190.

@Hywan Hywan closed this as completed Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🎉 enhancement New feature!
Projects
None yet
Development

No branches or pull requests

3 participants