wasmparser: Perform type canonicalization for Wasm GC
The unit of canonicalization is a recursion group. Having "unnecessary" types in
a recursion group can "break" canonicalization of other types within that same
recursion group, as can reordering types within a recursion group.

It is an invariant that all types defined before the recursion group we are
currently canonicalizing have already been canonicalized themselves.

Canonicalizing a recursion group then proceeds as follows:

* First we walk each of its `SubType` elements and put their type references
  (i.e. their `PackedIndex`es) into canonical form. Canonicalizing a
  `PackedIndex` means switching it from indexing into the Wasm module's types
  space into either

  1. Referencing an already-canonicalized type, for types outside of this
     recursion group. Because inter-group type references can only go towards
     types defined before this recursion group, we know the type is already
     canonicalized and we have a `CoreTypeId` for each of those types. This
     updates the `PackedIndex` into a `CoreTypeId`.

  2. Indexing into the current recursion group, for intra-group type references.

  Note that (2) has the effect of making the "same" structure of mutual type
  recursion look identical across recursion groups:

      ;; Before
      (rec (struct (field (module-type 1))) (struct (field (module-type 0))))
      (rec (struct (field (module-type 3))) (struct (field (module-type 2))))

      ;; After
      (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))
      (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))

* Now that the recursion group's elements are in canonical form, we can "simply"
  hash cons whole rec groups at a time. The `TypeList` morally maintains a hash
  map from `Vec<SubType>` to `RecGroupId` and we can do get-or-create operations
  on it. I say "morally" because we don't actually duplicate the `Vec<SubType>`
  key in that hash map since those elements are already stored in the
  `TypeList`'s internal `SnapshotList<CoreType>`. This means we need to do some
  low-level hash table fiddling with the `hashbrown` crate.

And that's it! That is the whole canonicalization algorithm.
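
As a rough illustration of the two steps above, here is a minimal, self-contained sketch. It is not wasmparser's implementation: `Index`, `StructType`, and `TypeList` below are simplified stand-ins, `intern_rec_group` is a hypothetical name, and the hash-consing table is a plain `std::collections::HashMap` that clones its `Vec` key instead of the `hashbrown`-based table described above.

```rust
use std::collections::HashMap;

/// A type reference in one of three forms (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Index {
    /// Index into the module's types space, as read from the binary.
    Module(u32),
    /// Index relative to the start of the current recursion group.
    RecGroup(u32),
    /// Id of an already-canonicalized type.
    Id(u32),
}

/// A struct type reduced to the type references held by its fields.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct StructType {
    fields: Vec<Index>,
}

#[derive(Default)]
struct TypeList {
    /// All canonical types; a canonical id is an index into this list.
    types: Vec<StructType>,
    /// Hash-consing table: canonical rec group -> id of its first type.
    rec_groups: HashMap<Vec<StructType>, u32>,
}

impl TypeList {
    /// Canonicalize and intern the next rec group of a module.
    /// `module_to_id[i]` is the canonical id of module type `i`; it covers
    /// every type defined before this group and is extended on success.
    fn intern_rec_group(
        &mut self,
        module_to_id: &mut Vec<u32>,
        mut group: Vec<StructType>,
    ) -> Result<Vec<u32>, String> {
        let start = module_to_id.len() as u32;
        let end = start + group.len() as u32;

        // Step 1: put every type reference into canonical form, checking
        // bounds in the same pass.
        for ty in &mut group {
            for field in &mut ty.fields {
                if let Index::Module(i) = *field {
                    *field = if i < start {
                        Index::Id(module_to_id[i as usize])
                    } else if i < end {
                        Index::RecGroup(i - start)
                    } else {
                        return Err(format!("type index {} out of bounds", i));
                    };
                }
            }
        }

        // Step 2: hash cons the whole group at once (get-or-create).
        let existing = self.rec_groups.get(&group).copied();
        let first = match existing {
            Some(id) => id,
            None => {
                let id = self.types.len() as u32;
                self.types.extend(group.iter().cloned());
                self.rec_groups.insert(group.clone(), id);
                id
            }
        };

        let ids: Vec<u32> = (first..first + group.len() as u32).collect();
        module_to_id.extend(ids.iter().copied());
        Ok(ids)
    }
}
```

Feeding the two rec groups from the Before/After example through this sketch yields the same ids for both groups, and an out-of-range reference is rejected during the same pass.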

Some more random things to note:

* Because we essentially already have to do the check to canonicalize, and to
  avoid additional passes over the types, the canonicalization pass also checks
  that type references are in bounds. These are the only errors that can be
  returned from canonicalization.

* Canonicalizing requires the `Module` to translate type indices to actual
  `CoreTypeId`s.

* It is important that *after* we have canonicalized all types, we don't need
  the `Module` anymore. This makes sure that we can, for example, intern all
  types from the same store into the same `TypeList`, which in turn lets us type
  check function imports of a same-store instance's exported functions without
  translating from one module's canonical representation to another module's
  canonical representation or performing additional expensive checks to see if
  the types match (since the whole point of canonicalization is to avoid that!).

-------------------------------------

I initially tried to have two different Rust types for each Wasm core type
(`SubType`, `FuncType`, etc...): one for the version that contains raw type
space indices that are produced directly from the reader and another that
contains `CoreTypeId`s after canonicalization. This approach is essentially what
we do for component model types. However, this was getting really painful,
because even `ValType` would have to have two different versions. The amount of
places I was touching, including in downstream crates, was getting out of hand.

So instead I opted to make a new index type that is morally the following enum:

```rust
enum Index {
    ModuleTypesSpaceIndex(u32),
    RecGroupLocalIndex(u32),
    CoreTypeId(CoreTypeId),
}
```

Of course, we have to be very frugal with bits to keep `RefType` fitting in 24
bits and `ValType` in 32 bits, so it is actually a bit-packed version of
that. We can still represent the maximum number of Wasm types in a
module. However, a `TypeList` can only have `2 * MAX_WASM_TYPES` types stored in it now
(or at least that is how many are addressable; you could add more and then never
stuff their `CoreTypeId`s into these bit-packed indices). We could free up some
more bits here if we started bit-packing `ValType`, but the loss in ergonomics
of matching on `ValType` would be pretty bad.

Anyways, I also added an unpacked version of these bit-packed indices for
ergonomics. The bit-packed version can infallibly be converted to the unpacked
version, and the unpacked version can fallibly be converted to the bit-packed
version (it checks that the indices are representable in the number of bits we
actually have available).
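
To make the packed/unpacked split concrete, here is a small sketch of the idea. The layout is an assumption chosen for illustration (a 2-bit tag above a 20-bit index, which fits within 24 bits); the real bit widths, tag values, and type names in wasmparser may differ.

```rust
use std::convert::TryFrom;

/// Unpacked, ergonomic form (mirrors the enum sketched above).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UnpackedIndex {
    Module(u32),
    RecGroup(u32),
    Id(u32),
}

/// Bit-packed form. Assumed layout: 2 tag bits above 20 index bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedIndex(u32);

const INDEX_BITS: u32 = 20; // assumption for this sketch only
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;
const TAG_MODULE: u32 = 0;
const TAG_REC_GROUP: u32 = 1;
const TAG_ID: u32 = 2;

/// Unpacked -> packed is fallible: the index must fit in the available bits.
impl TryFrom<UnpackedIndex> for PackedIndex {
    type Error = ();

    fn try_from(index: UnpackedIndex) -> Result<Self, Self::Error> {
        let (tag, raw) = match index {
            UnpackedIndex::Module(i) => (TAG_MODULE, i),
            UnpackedIndex::RecGroup(i) => (TAG_REC_GROUP, i),
            UnpackedIndex::Id(i) => (TAG_ID, i),
        };
        if raw > INDEX_MASK {
            return Err(());
        }
        Ok(PackedIndex((tag << INDEX_BITS) | raw))
    }
}

/// Packed -> unpacked is infallible: every packed value was range-checked.
impl From<PackedIndex> for UnpackedIndex {
    fn from(packed: PackedIndex) -> Self {
        let raw = packed.0 & INDEX_MASK;
        match packed.0 >> INDEX_BITS {
            TAG_MODULE => UnpackedIndex::Module(raw),
            TAG_REC_GROUP => UnpackedIndex::RecGroup(raw),
            TAG_ID => UnpackedIndex::Id(raw),
            _ => unreachable!("PackedIndex is only built through TryFrom"),
        }
    }
}
```

Code that wants to `match` works with `UnpackedIndex`, and only the storage inside the core types pays the cost of the packed encoding.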

Finally, because we are back to only having a single Rust type for each core
Wasm type, I removed the `define_core_wasm_types!` macro and inlined the
definitions. Sorry for the churn! But it is definitely nicer not having them
inside a macro at the end of the day.

-----------------------------------

This also fixes bytecodealliance#923, since canonicalization avoids the exponential behavior
observed there.
fitzgen committed Nov 1, 2023
1 parent 706f755 commit 23cd7ff
Showing 27 changed files with 2,457 additions and 1,276 deletions.
10 changes: 7 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion crates/wasm-compose/src/encoding.rs
@@ -434,7 +434,9 @@ impl<'a> TypeEncoder<'a> {
                 wasmparser::HeapType::Struct => HeapType::Struct,
                 wasmparser::HeapType::Array => HeapType::Array,
                 wasmparser::HeapType::I31 => HeapType::I31,
-                wasmparser::HeapType::Concrete(i) => HeapType::Concrete(i),
+                wasmparser::HeapType::Concrete(i) => {
+                    HeapType::Concrete(i.as_module_index().unwrap())
+                }
             },
         }
     }
16 changes: 14 additions & 2 deletions crates/wasm-encoder/src/core/code.rs
@@ -2893,6 +2893,11 @@ pub enum ConstExprConversionError {
     /// The const expression is invalid: not actually constant or something like
     /// that.
     Invalid,
+
+    /// There was a type reference that was canonicalized and no longer
+    /// references an index into a module's types space, so we cannot encode it
+    /// into a Wasm binary again.
+    CanonicalizedTypeReference,
 }

 #[cfg(feature = "wasmparser")]
@@ -2903,6 +2908,10 @@ impl std::fmt::Display for ConstExprConversionError {
                 write!(f, "There was an error when parsing the const expression")
             }
             Self::Invalid => write!(f, "The const expression was invalid"),
+            Self::CanonicalizedTypeReference => write!(
+                f,
+                "There was a canonicalized type reference without type index information"
+            ),
         }
     }
 }
@@ -2912,7 +2921,7 @@ impl std::error::Error for ConstExprConversionError {
     fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
         match self {
             Self::ParseError(e) => Some(e),
-            Self::Invalid => None,
+            Self::Invalid | Self::CanonicalizedTypeReference => None,
         }
     }
 }
@@ -2936,7 +2945,10 @@ impl<'a> TryFrom<wasmparser::ConstExpr<'a>> for ConstExpr {
             Some(Ok(wasmparser::Operator::V128Const { value })) => {
                 ConstExpr::v128_const(i128::from_le_bytes(*value.bytes()))
             }
-            Some(Ok(wasmparser::Operator::RefNull { hty })) => ConstExpr::ref_null(hty.into()),
+            Some(Ok(wasmparser::Operator::RefNull { hty })) => ConstExpr::ref_null(
+                HeapType::try_from(hty)
+                    .map_err(|_| ConstExprConversionError::CanonicalizedTypeReference)?,
+            ),
             Some(Ok(wasmparser::Operator::RefFunc { function_index })) => {
                 ConstExpr::ref_func(function_index)
             }
11 changes: 6 additions & 5 deletions crates/wasm-encoder/src/core/globals.rs
@@ -90,11 +90,12 @@ impl Encode for GlobalType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::GlobalType> for GlobalType {
-    fn from(global_ty: wasmparser::GlobalType) -> Self {
-        GlobalType {
-            val_type: global_ty.content_type.into(),
+impl TryFrom<wasmparser::GlobalType> for GlobalType {
+    type Error = ();
+    fn try_from(global_ty: wasmparser::GlobalType) -> Result<Self, Self::Error> {
+        Ok(GlobalType {
+            val_type: global_ty.content_type.try_into()?,
             mutable: global_ty.mutable,
-        }
+        })
     }
 }
13 changes: 7 additions & 6 deletions crates/wasm-encoder/src/core/imports.rs
@@ -74,15 +74,16 @@ impl From<TagType> for EntityType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::TypeRef> for EntityType {
-    fn from(type_ref: wasmparser::TypeRef) -> Self {
-        match type_ref {
+impl TryFrom<wasmparser::TypeRef> for EntityType {
+    type Error = ();
+    fn try_from(type_ref: wasmparser::TypeRef) -> Result<Self, Self::Error> {
+        Ok(match type_ref {
             wasmparser::TypeRef::Func(i) => EntityType::Function(i),
-            wasmparser::TypeRef::Table(t) => EntityType::Table(t.into()),
+            wasmparser::TypeRef::Table(t) => EntityType::Table(t.try_into()?),
             wasmparser::TypeRef::Memory(m) => EntityType::Memory(m.into()),
-            wasmparser::TypeRef::Global(g) => EntityType::Global(g.into()),
+            wasmparser::TypeRef::Global(g) => EntityType::Global(g.try_into()?),
             wasmparser::TypeRef::Tag(t) => EntityType::Tag(t.into()),
-        }
+        })
     }
 }

11 changes: 6 additions & 5 deletions crates/wasm-encoder/src/core/tables.rs
@@ -104,12 +104,13 @@ impl Encode for TableType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::TableType> for TableType {
-    fn from(table_ty: wasmparser::TableType) -> Self {
-        TableType {
-            element_type: table_ty.element_type.into(),
+impl TryFrom<wasmparser::TableType> for TableType {
+    type Error = ();
+    fn try_from(table_ty: wasmparser::TableType) -> Result<Self, Self::Error> {
+        Ok(TableType {
+            element_type: table_ty.element_type.try_into()?,
             minimum: table_ty.initial,
             maximum: table_ty.maximum,
-        }
+        })
     }
 }
133 changes: 80 additions & 53 deletions crates/wasm-encoder/src/core/types.rs
@@ -12,13 +12,18 @@ pub struct SubType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::SubType> for SubType {
-    fn from(sub_ty: wasmparser::SubType) -> Self {
-        SubType {
+impl TryFrom<wasmparser::SubType> for SubType {
+    type Error = ();
+
+    fn try_from(sub_ty: wasmparser::SubType) -> Result<Self, Self::Error> {
+        Ok(SubType {
             is_final: sub_ty.is_final,
-            supertype_idx: sub_ty.supertype_idx,
-            composite_type: sub_ty.composite_type.into(),
-        }
+            supertype_idx: sub_ty
+                .supertype_idx
+                .map(|i| i.as_module_index().ok_or(()))
+                .transpose()?,
+            composite_type: sub_ty.composite_type.try_into()?,
+        })
     }
 }
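
The `supertype_idx` conversion in this hunk maps an `Option` through a fallible step and then flips the resulting `Option<Result<..>>` into a `Result<Option<..>>` with `Option::transpose`. A minimal sketch of that pattern follows; the `Index` enum is a hypothetical stand-in for wasmparser's packed index type, and only the `as_module_index` name is taken from the diff.

```rust
/// Hypothetical stand-in for the index type used after this commit.
#[derive(Clone, Copy)]
enum Index {
    /// Still an index into the module's types space.
    Module(u32),
    /// Already canonicalized; no module index is recoverable.
    Canonical(u32),
}

impl Index {
    /// Returns the module-level index, if this reference still has one.
    fn as_module_index(self) -> Option<u32> {
        match self {
            Index::Module(i) => Some(i),
            Index::Canonical(_) => None,
        }
    }
}

/// `None` stays `Ok(None)`, a module index becomes `Ok(Some(i))`, and a
/// canonicalized reference turns into `Err(())`.
fn convert_supertype(idx: Option<Index>) -> Result<Option<u32>, ()> {
    idx.map(|i| i.as_module_index().ok_or(())).transpose()
}
```

An absent supertype is therefore not an error; only a present but canonicalized one fails the conversion.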

@@ -52,13 +57,14 @@ impl Encode for CompositeType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::CompositeType> for CompositeType {
-    fn from(composite_ty: wasmparser::CompositeType) -> Self {
-        match composite_ty {
-            wasmparser::CompositeType::Func(f) => CompositeType::Func(f.into()),
-            wasmparser::CompositeType::Array(a) => CompositeType::Array(a.into()),
-            wasmparser::CompositeType::Struct(s) => CompositeType::Struct(s.into()),
-        }
+impl TryFrom<wasmparser::CompositeType> for CompositeType {
+    type Error = ();
+    fn try_from(composite_ty: wasmparser::CompositeType) -> Result<Self, Self::Error> {
+        Ok(match composite_ty {
+            wasmparser::CompositeType::Func(f) => CompositeType::Func(f.try_into()?),
+            wasmparser::CompositeType::Array(a) => CompositeType::Array(a.try_into()?),
+            wasmparser::CompositeType::Struct(s) => CompositeType::Struct(s.try_into()?),
+        })
     }
 }

@@ -72,12 +78,14 @@ pub struct FuncType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::FuncType> for FuncType {
-    fn from(func_ty: wasmparser::FuncType) -> Self {
-        FuncType::new(
-            func_ty.params().iter().cloned().map(Into::into),
-            func_ty.results().iter().cloned().map(Into::into),
-        )
+impl TryFrom<wasmparser::FuncType> for FuncType {
+    type Error = ();
+    fn try_from(func_ty: wasmparser::FuncType) -> Result<Self, Self::Error> {
+        let mut buf = Vec::with_capacity(func_ty.params().len() + func_ty.results().len());
+        for ty in func_ty.params().iter().chain(func_ty.results()).copied() {
+            buf.push(ty.try_into()?);
+        }
+        Ok(FuncType::from_parts(buf.into(), func_ty.params().len()))
     }
 }
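
The new `FuncType` conversion fills one buffer with params followed by results and records where the params end, bailing out on the first element that fails to convert. Here is a self-contained sketch of that shape. It is an illustration only: `u8` and `u32` stand in for the encoder's and parser's `ValType`, and `try_convert` is a hypothetical helper name.

```rust
use std::convert::TryFrom;

/// Params and results share one allocation; `len_params` marks the split.
#[derive(Debug, PartialEq)]
struct FuncType {
    params_results: Box<[u8]>,
    len_params: usize,
}

impl FuncType {
    fn from_parts(params_results: Box<[u8]>, len_params: usize) -> Self {
        Self { params_results, len_params }
    }

    fn params(&self) -> &[u8] {
        &self.params_results[..self.len_params]
    }

    fn results(&self) -> &[u8] {
        &self.params_results[self.len_params..]
    }
}

/// Fallibly convert each element (here: `u32` -> `u8`) into a single buffer.
fn try_convert(params: &[u32], results: &[u32]) -> Result<FuncType, ()> {
    let mut buf = Vec::with_capacity(params.len() + results.len());
    for ty in params.iter().chain(results).copied() {
        // `?` returns early on the first element that cannot be converted.
        buf.push(u8::try_from(ty).map_err(|_| ())?);
    }
    Ok(FuncType::from_parts(buf.into(), params.len()))
}
```

Building the buffer directly avoids collecting params and results separately, which is presumably why the diff adds `from_parts` next to `new`.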

@@ -86,9 +94,10 @@ impl From<wasmparser::FuncType> for FuncType {
 pub struct ArrayType(pub FieldType);

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::ArrayType> for ArrayType {
-    fn from(array_ty: wasmparser::ArrayType) -> Self {
-        ArrayType(array_ty.0.into())
+impl TryFrom<wasmparser::ArrayType> for ArrayType {
+    type Error = ();
+    fn try_from(array_ty: wasmparser::ArrayType) -> Result<Self, Self::Error> {
+        Ok(ArrayType(array_ty.0.try_into()?))
     }
 }

@@ -100,11 +109,17 @@ pub struct StructType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::StructType> for StructType {
-    fn from(struct_ty: wasmparser::StructType) -> Self {
-        StructType {
-            fields: struct_ty.fields.iter().cloned().map(Into::into).collect(),
-        }
+impl TryFrom<wasmparser::StructType> for StructType {
+    type Error = ();
+    fn try_from(struct_ty: wasmparser::StructType) -> Result<Self, Self::Error> {
+        Ok(StructType {
+            fields: struct_ty
+                .fields
+                .iter()
+                .cloned()
+                .map(TryInto::try_into)
+                .collect::<Result<_, _>>()?,
+        })
     }
 }
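
The `StructType` conversion relies on collecting an iterator of `Result`s into a single `Result` of a collection, which stops at the first error. A minimal sketch of that idiom, with `Option<u32>` as a hypothetical stand-in for a field that may or may not be convertible:

```rust
/// Convert every field, failing as a whole if any single field fails.
fn convert_fields(fields: &[Option<u32>]) -> Result<Vec<u32>, ()> {
    fields
        .iter()
        .cloned()
        // Each element becomes `Ok(value)` or `Err(())`.
        .map(|f| f.ok_or(()))
        // Collecting into `Result<Vec<_>, _>` short-circuits on the first `Err`.
        .collect::<Result<_, _>>()
}
```

In the diff the per-element step is `TryInto::try_into`, and the target collection is inferred from the `fields` member rather than spelled out.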

@@ -118,12 +133,13 @@ pub struct FieldType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::FieldType> for FieldType {
-    fn from(field_ty: wasmparser::FieldType) -> Self {
-        FieldType {
-            element_type: field_ty.element_type.into(),
+impl TryFrom<wasmparser::FieldType> for FieldType {
+    type Error = ();
+    fn try_from(field_ty: wasmparser::FieldType) -> Result<Self, Self::Error> {
+        Ok(FieldType {
+            element_type: field_ty.element_type.try_into()?,
             mutable: field_ty.mutable,
-        }
+        })
     }
 }

@@ -139,13 +155,14 @@ pub enum StorageType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::StorageType> for StorageType {
-    fn from(storage_ty: wasmparser::StorageType) -> Self {
-        match storage_ty {
+impl TryFrom<wasmparser::StorageType> for StorageType {
+    type Error = ();
+    fn try_from(storage_ty: wasmparser::StorageType) -> Result<Self, Self::Error> {
+        Ok(match storage_ty {
             wasmparser::StorageType::I8 => StorageType::I8,
             wasmparser::StorageType::I16 => StorageType::I16,
-            wasmparser::StorageType::Val(v) => StorageType::Val(v.into()),
-        }
+            wasmparser::StorageType::Val(v) => StorageType::Val(v.try_into()?),
+        })
     }
 }

@@ -173,16 +190,17 @@ pub enum ValType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::ValType> for ValType {
-    fn from(val_ty: wasmparser::ValType) -> Self {
-        match val_ty {
+impl TryFrom<wasmparser::ValType> for ValType {
+    type Error = ();
+    fn try_from(val_ty: wasmparser::ValType) -> Result<Self, Self::Error> {
+        Ok(match val_ty {
             wasmparser::ValType::I32 => ValType::I32,
             wasmparser::ValType::I64 => ValType::I64,
             wasmparser::ValType::F32 => ValType::F32,
             wasmparser::ValType::F64 => ValType::F64,
             wasmparser::ValType::V128 => ValType::V128,
-            wasmparser::ValType::Ref(r) => ValType::Ref(r.into()),
-        }
+            wasmparser::ValType::Ref(r) => ValType::Ref(r.try_into()?),
+        })
     }
 }

@@ -196,8 +214,13 @@ impl FuncType {
         let mut buffer = params.into_iter().collect::<Vec<_>>();
         let len_params = buffer.len();
         buffer.extend(results);
+        Self::from_parts(buffer.into(), len_params)
+    }
+
+    #[inline]
+    pub(crate) fn from_parts(params_results: Box<[ValType]>, len_params: usize) -> Self {
         Self {
-            params_results: buffer.into(),
+            params_results,
             len_params,
         }
     }
@@ -293,12 +316,14 @@ impl Encode for RefType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::RefType> for RefType {
-    fn from(ref_type: wasmparser::RefType) -> Self {
-        RefType {
+impl TryFrom<wasmparser::RefType> for RefType {
+    type Error = ();
+
+    fn try_from(ref_type: wasmparser::RefType) -> Result<Self, Self::Error> {
+        Ok(RefType {
             nullable: ref_type.is_nullable(),
-            heap_type: ref_type.heap_type().into(),
-        }
+            heap_type: ref_type.heap_type().try_into()?,
+        })
     }
 }

@@ -381,10 +406,12 @@ impl Encode for HeapType {
 }

 #[cfg(feature = "wasmparser")]
-impl From<wasmparser::HeapType> for HeapType {
-    fn from(heap_type: wasmparser::HeapType) -> Self {
-        match heap_type {
-            wasmparser::HeapType::Concrete(i) => HeapType::Concrete(i),
+impl TryFrom<wasmparser::HeapType> for HeapType {
+    type Error = ();
+
+    fn try_from(heap_type: wasmparser::HeapType) -> Result<Self, Self::Error> {
+        Ok(match heap_type {
+            wasmparser::HeapType::Concrete(i) => HeapType::Concrete(i.as_module_index().ok_or(())?),
             wasmparser::HeapType::Func => HeapType::Func,
             wasmparser::HeapType::Extern => HeapType::Extern,
             wasmparser::HeapType::Any => HeapType::Any,
@@ -395,7 +422,7 @@ impl From<wasmparser::HeapType> for HeapType {
             wasmparser::HeapType::Struct => HeapType::Struct,
             wasmparser::HeapType::Array => HeapType::Array,
             wasmparser::HeapType::I31 => HeapType::I31,
-        }
+        })
     }
 }

2 changes: 1 addition & 1 deletion crates/wasm-mutate/src/module.rs
@@ -90,7 +90,7 @@ pub fn map_ref_type(ref_ty: wasmparser::RefType) -> Result<RefType> {
             wasmparser::HeapType::Struct => HeapType::Struct,
             wasmparser::HeapType::Array => HeapType::Array,
             wasmparser::HeapType::I31 => HeapType::I31,
-            wasmparser::HeapType::Concrete(i) => HeapType::Concrete(i.into()),
+            wasmparser::HeapType::Concrete(i) => HeapType::Concrete(i.as_module_index().unwrap()),
         },
     })
 }
6 changes: 3 additions & 3 deletions crates/wasm-mutate/src/mutators/translate.rs
@@ -210,9 +210,9 @@ pub fn heapty(t: &mut dyn Translator, ty: &wasmparser::HeapType) -> Result<HeapT
         wasmparser::HeapType::Struct => Ok(HeapType::Struct),
         wasmparser::HeapType::Array => Ok(HeapType::Array),
         wasmparser::HeapType::I31 => Ok(HeapType::I31),
-        wasmparser::HeapType::Concrete(i) => {
-            Ok(HeapType::Concrete(t.remap(Item::Type, (*i).into())?))
-        }
+        wasmparser::HeapType::Concrete(i) => Ok(HeapType::Concrete(
+            t.remap(Item::Type, i.as_module_index().unwrap())?,
+        )),
     }
 }
