forked from bytecodealliance/wasm-tools
wasmparser: Perform type canonicalization for Wasm GC
The unit of canonicalization is a recursion group. Having "unnecessary" types in a recursion group can "break" canonicalization of other types within that same recursion group, as can reordering types within a recursion group.

It is an invariant that all types defined before the recursion group we are currently canonicalizing have already been canonicalized themselves.

Canonicalizing a recursion group then proceeds as follows:

* First we walk each of its `SubType` elements and put their type references (i.e. their `PackedIndex`es) into canonical form. Canonicalizing a `PackedIndex` means switching it from indexing into the Wasm module's types space into either

  1. Referencing an already-canonicalized type, for types outside of this recursion group. Because inter-group type references can only go towards types defined before this recursion group, we know the type is already canonicalized and we have a `CoreTypeId` for each of those types. This updates the `PackedIndex` into a `CoreTypeId`.

  2. Indexing into the current recursion group, for intra-group type references.

  Note that (2) has the effect of making the "same" structure of mutual type recursion look identical across recursion groups:

      ;; Before
      (rec (struct (field (module-type 1))) (struct (field (module-type 0))))
      (rec (struct (field (module-type 3))) (struct (field (module-type 2))))

      ;; After
      (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))
      (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))

* Now that the recursion group's elements are in canonical form, we can "simply" hash cons whole rec groups at a time. The `TypeList` morally maintains a hash map from `Vec<SubType>` to `RecGroupId` and we can do get-or-create operations on it. I say "morally" because we don't actually duplicate the `Vec<SubType>` key in that hash map since those elements are already stored in the `TypeList`'s internal `SnapshotList<CoreType>`.
  This means we need to do some low-level hash table fiddling with the `hashbrown` crate.

And that's it! That is the whole canonicalization algorithm.

Some more random things to note:

* Because we essentially already have to do the check to canonicalize, and to avoid additional passes over the types, the canonicalization pass also checks that type references are in bounds. These are the only errors that can be returned from canonicalization.

* Canonicalizing requires the `Module` to translate type indices to actual `CoreTypeId`s.

* It is important that *after* we have canonicalized all types, we don't need the `Module` anymore. This makes sure that we can, for example, intern all types from the same store into the same `TypeList`. That in turn lets us type check function imports of a same-store instance's exported functions without translating from one module's canonical representation to another module's canonical representation and without performing additional expensive checks to see whether the types match (since the whole point of canonicalization is to avoid that!).

-------------------------------------

I initially tried to have two different Rust types for each Wasm core type (`SubType`, `FuncType`, etc...): one for the version that contains raw type space indices that are produced directly from the reader and another that contains `CoreTypeId`s after canonicalization. This approach is essentially what we do for component model types.

However, this was getting really painful, because even `ValType` would have to have two different versions. The number of places I was touching, including in downstream crates, was getting out of hand.
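For readers who want to see the two canonicalization steps in one place, here is a minimal sketch of them. Everything in it is a simplified stand-in assumed for illustration: `Index`, `SubType`, `Module`, `TypeList`, and `define_rec_group` are hypothetical toy definitions rather than wasmparser's actual types, a "sub type" is reduced to the list of type references it contains, and the hash consing uses a plain `std::collections::HashMap` that owns its keys instead of the key-free `hashbrown` approach described above.

```rust
use std::collections::HashMap;

/// Toy type reference (hypothetical; not wasmparser's real index type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Index {
    Module(u32),   // index into the module's types space (non-canonical)
    RecGroup(u32), // index local to the current rec group (canonical)
    Id(u32),       // id of an already-canonicalized type (canonical)
}

/// Toy sub type: only the type references it contains.
type SubType = Vec<Index>;

/// Per-module map from types-space index to canonical type id.
#[derive(Default)]
struct Module {
    types: Vec<u32>,
}

#[derive(Default)]
struct TypeList {
    /// Hash-cons table: canonical rec group -> rec group id.
    rec_groups: HashMap<Vec<SubType>, u32>,
    /// For each rec group id, the type id of its first element.
    first_type: Vec<u32>,
    /// Next unused type id.
    next_type: u32,
}

impl TypeList {
    /// Canonicalize `group`, intern it, and return its rec group id.
    /// The only possible error is an out-of-bounds type reference.
    fn define_rec_group(
        &mut self,
        module: &mut Module,
        mut group: Vec<SubType>,
    ) -> Result<u32, String> {
        let start = module.types.len() as u32;
        let len = group.len() as u32;
        let end = start + len;

        // Step 1: put every type reference into canonical form.
        for ty in &mut group {
            for idx in ty.iter_mut() {
                if let Index::Module(i) = *idx {
                    *idx = if i < start {
                        // Earlier group: already canonicalized.
                        Index::Id(module.types[i as usize])
                    } else if i < end {
                        // Same group: make the index group-relative.
                        Index::RecGroup(i - start)
                    } else {
                        return Err(format!("type index out of bounds: {}", i));
                    };
                }
            }
        }

        // Step 2: hash cons the whole rec group (get-or-create).
        let existing = self.rec_groups.get(&group).copied();
        let id = match existing {
            Some(id) => id,
            None => {
                let id = self.first_type.len() as u32;
                self.first_type.push(self.next_type);
                self.next_type += len;
                self.rec_groups.insert(group, id);
                id
            }
        };

        // Record the canonical ids for this module's new type indices.
        let first = self.first_type[id as usize];
        module.types.extend(first..first + len);
        Ok(id)
    }
}
```

Feeding the two "Before" rec groups from the example above through `define_rec_group` yields the same rec group id for both, because after step 1 they are structurally identical.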
So instead I opted to make a new index type that is morally the following enum:

```rust
enum Index {
    // Raw index into the module's types space, as produced by the reader.
    ModuleTypesSpaceIndex(u32),
    // Index into the current recursion group.
    RecGroupLocalIndex(u32),
    // Reference to an already-canonicalized type.
    CoreTypeId(CoreTypeId),
}
```

Of course, we have to be very frugal with bits to keep `RefType` fitting in 24 bits and `ValType` in 32 bits, so it is actually a bit-packed version of that. We can still represent the maximum number of Wasm types in a module. However, a `TypeList` can only have `2 * MAX_WASM_TYPES` types stored in it now (or at least that is how many are addressable; you could add more and then never stuff their `CoreTypeId`s into these bit-packed indices). We could free up some more bits here if we started bit-packing `ValType`, but the loss in ergonomics of matching on `ValType` would be pretty bad.

Anyways, I also added an unpacked version of these bit-packed indices for ergonomics. The bit-packed version can infallibly be converted to the unpacked version, and the unpacked version can fallibly be converted to the bit-packed version (it checks that the indices are representable in the number of bits we actually have available).

Finally, because we are back to only having a single Rust type for each core Wasm type, I removed the `define_core_wasm_types!` macro and inlined the definitions. Sorry for the churn! But it is definitely nicer not having them inside a macro at the end of the day.

-----------------------------------

This also fixes bytecodealliance#923, since canonicalization avoids the exponential behavior observed there.
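To make the packed/unpacked relationship concrete, here is a minimal sketch of a bit-packed index with a fallible `pack` and an infallible `unpack`. The layout (a 2-bit tag above a 20-bit index) and the names `INDEX_BITS`, `INDEX_MASK`, `pack`, and `unpack` are assumptions made up for this illustration; the real encoding and API in wasmparser may well differ.

```rust
/// Assumed layout for illustration only: 2 tag bits above 20 index bits.
const INDEX_BITS: u32 = 20;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

/// Ergonomic form that is easy to `match` on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UnpackedIndex {
    Module(u32),   // index into the module's types space
    RecGroup(u32), // index local to the current rec group
    Id(u32),       // id of an already-canonicalized type
}

/// Compact form: tag and index share a single integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct PackedIndex(u32);

impl PackedIndex {
    /// Fallible: returns `None` if the index does not fit in `INDEX_BITS`.
    fn pack(index: UnpackedIndex) -> Option<PackedIndex> {
        let (tag, value) = match index {
            UnpackedIndex::Module(i) => (0u32, i),
            UnpackedIndex::RecGroup(i) => (1, i),
            UnpackedIndex::Id(i) => (2, i),
        };
        if value > INDEX_MASK {
            return None;
        }
        Some(PackedIndex((tag << INDEX_BITS) | value))
    }

    /// Infallible: every value built by `pack` decodes to a valid variant.
    fn unpack(self) -> UnpackedIndex {
        let value = self.0 & INDEX_MASK;
        match self.0 >> INDEX_BITS {
            0 => UnpackedIndex::Module(value),
            1 => UnpackedIndex::RecGroup(value),
            2 => UnpackedIndex::Id(value),
            _ => unreachable!("`pack` never produces any other tag"),
        }
    }
}
```

Keeping the tuple field private to the defining module is what makes `unpack` safe to treat as infallible in a sketch like this: the only way to obtain a `PackedIndex` is through `pack`, which has already validated the range.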
Showing 27 changed files with 2,457 additions and 1,276 deletions.