-
Notifications
You must be signed in to change notification settings - Fork 798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loading NativeArtifact can be faster #2180
Comments
Thank you for the proposal. That's an interesting thing to try. For the moment, we use Are you willing to try something with a PR :-)? |
@Hywan I believe @ailisp is mentioning not dlopen loading time issues, but bincode deserialization of @ailisp, I think your suggestion is really great. Once the PR #2183 is merged it should be really trivial to add a much faster deserialization mechanism. I'm happy with using |
2183: Passive elements and data are now PrimaryMaps r=syrusakbary a=syrusakbary <!-- Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test: https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests --> # Description This PR moves some fields in the `ModuleInfo` to stop using `HashMap` and start using `PrimaryMap` as the contents are not sparse but fully dense. The PR is incentivized by #2180 as it's partially needed to stop using HashMap when possible. It doesn't completely solve it, as the `function_names` field is still a `HashMap`, but it should be trivial to solve that in a PR that improves the deserialization time. <!-- Provide details regarding the change including motivation, links to related issues, and the context of the PR. --> # Review - [ ] Add a short description of the the change to the CHANGELOG.md file Co-authored-by: Syrus Akbary <me@syrusakbary.com>
@Hywan I mean @syrusakbary said. Current dllopen is fast enough but thanks for giving a lot more insight in dllopen and that's quite advanced :) @syrusakbary flatdata (or other mmap based, raw pointer based) are usually a single operation to reconstruct the entire struct, which is significantly faster than borsh. Borsh is 30%-40% faster than bincode in deserialize Artifact of |
Some update on this: flatdata actually doesn't support Vec<Vec<>> kind of struct, my above benchmark is on |
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp <!-- Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test: https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests --> # Description <!-- Provide details regarding the change including motivation, links to related issues, and the context of the PR. --> As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run ``` cargo bench -p wasmer-cache --features singlepass ``` from root dir, will see a lot of timeing in the end: ``` ... 3.447994ms // bincode::deserialize rkyv 22ns rkyv deser 386.539µs ``` This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. TODO: - [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap - ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~ ``` enum ModuleMetadataWrap { Original(ModuleMetadata), Archived(ArchivedModuleMetadata) } ``` ~and field access to `ModuleMetadataWrap` method calls~ - [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize - [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test. - [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique # Review - [ ] Add a short description of the the change to the CHANGELOG.md file Co-authored-by: Bo Yao <bo@nearprotocol.com>
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp <!-- Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test: https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests --> # Description <!-- Provide details regarding the change including motivation, links to related issues, and the context of the PR. --> As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run ``` cargo bench -p wasmer-cache --features singlepass ``` from root dir, will see a lot of timeing in the end: ``` ... 3.447994ms // bincode::deserialize rkyv 22ns rkyv deser 386.539µs ``` This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. TODO: - [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap - ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~ ``` enum ModuleMetadataWrap { Original(ModuleMetadata), Archived(ArchivedModuleMetadata) } ``` ~and field access to `ModuleMetadataWrap` method calls~ - [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize - [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test. - [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique # Review - [ ] Add a short description of the the change to the CHANGELOG.md file Co-authored-by: Bo Yao <bo@nearprotocol.com>
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp <!-- Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test: https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests --> # Description <!-- Provide details regarding the change including motivation, links to related issues, and the context of the PR. --> As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run ``` cargo bench -p wasmer-cache --features singlepass ``` from root dir, will see a lot of timeing in the end: ``` ... 3.447994ms // bincode::deserialize rkyv 22ns rkyv deser 386.539µs ``` This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. TODO: - [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap - ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~ ``` enum ModuleMetadataWrap { Original(ModuleMetadata), Archived(ArchivedModuleMetadata) } ``` ~and field access to `ModuleMetadataWrap` method calls~ - [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize - [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test. - [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique # Review - [ ] Add a short description of the the change to the CHANGELOG.md file Co-authored-by: Bo Yao <bo@nearprotocol.com>
Closed by #2190. |
Thanks for proposing a new feature!
Motivation
Currently loading a NativeArtifact is two step:
A simple print instant::now/elapsed() show the total loading takes 90% (3.5ms of 4ms) of the entire NativeArtifact::deserialization_from_file_unchecked, and longer when wasm file is bigger. So this is suboptimal
Proposed solution
ModuleMetadata can also be replaced with a mem mapped data structure, as it's consist of struct or struct of vectors (primarymap). For example, I tried this lib: https://github.com/heremaps/flatdata/tree/master/flatdata-rs and loads a artificially constructed, similar structured and similar sized to wasmer Artifact, it takes only 0.2ms to load, and doesn't take longer when I enlarge size of vector in struct to 100x.
Alternatives
Other mmap based crate maybe considerered
Additional context
There is some HashMap and IndexMap in ModuleInfo, however they are all less than 1000 elements in a 2M contract (lib/tests/assets/qjs.wasm), which should be ok by replace with a sorted vector and lookup via binary search is faster than hashmap in this number of elements
The text was updated successfully, but these errors were encountered: