[Parser] Pop past unreachables where possible #6489

tlively · 2024-04-11T02:42:14Z

We previously would eagerly drop all concretely typed expressions on the stack
when pushing an unreachable instruction. This was semantically correct and
closely modeled the semantics of unreachable instructions, which implicitly drop
the entire stack and start a new polymorphic stack. However, it also meant that
the structure of the parsed IR did not match the structure of the folded input,
which meant that tests involving unreachable children would not parse as
intended, preventing the test from testing the intended behavior.

For example, this wat:

  (i32.add
   (i32.const 0)
   (unreachable)
  )

Would previously parse into this IR:

  (drop
   (i32.const 0)
  )
  (i32.add
   (unreachable)
   (unreachable)
  )

To fix this problem, we need to stop eagerly dropping stack values when
encountering an unreachable instruction so we can still pop expressions pushed
before the unreachable as direct children of later instructions. In the example
above, we need to keep the i32.const 0 on the stack so it is available to be
popped and become a child of the i32.add.

However, the naive solution of simply popping past unreachables would produce
invalid IR in some cases. For example, consider this wat:

  f32.const 0
  unreachable
  i32.add

The naive solution would parse this wat into this IR:

  (i32.add
   (f32.const 0)
   (unreachable)
  )

But we do not want to parse an i32.add with an f32-typed child. Neither do
we want to reject this input, since it is a perfectly valid Wasm fragment. In this
case, we actually want the old behavior of dropping the f32.const and
replacing it with another unreachable as the first child of the i32.add.

To both match the input structure where possible and also gracefully fall back
to the old behavior of dropping expressions prior to the unreachable, collect
constraints on the types of each child for each kind of expression and compare
them to the types of available expressions on the stack when an unreachable
instruction will be popped. When the constraints are satisfied, pop expressions
normally, even after popping the unreachable instruction. Otherwise, drop the
instructions that precede the unreachable instruction to ensure we parse valid
IR.

To collect the constraints, add a new ChildTyper utility that calls a
different callback for each kind of possible type constraint for each child. In
the future, this utility can be used to simplify the validator as well.

tlively · 2024-04-11T02:42:34Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @tlively and the rest of your teammates on Graphite

kripken · 2024-04-11T18:56:26Z

src/ir/child-typer.h

+
+// For each child, call `noteSubtype` with a pointer to the child and the most
+// specific type that the child must have. Subtypings are based only on
+// information that would have been parsed out of a binary.


What does the last sentence here mean? Maybe an example of a subtyping that is somehow not from a binary could help.

I don't remember what a counterexample would be, so I've just gone ahead and removed that sentence.

kripken · 2024-04-11T23:17:19Z

src/ir/child-typer.h

+
+  void noteAnyReference(Expression** childp) {
+    self().noteAnyReferenceType(childp);
+  }


Are these foo() { self.foo() } stubs needed? In CRTP the right thing should just get called anyhow I think - we don't do this elsewhere.

No, they're not needed, but they are convenient for avoiding a bunch of calls to self() everywhere and to provide a shorter internal name.

src/ir/child-typer.h

kripken · 2024-04-11T23:21:17Z

src/ir/child-typer.h

+
+  void visitBreak(Break* curr, std::optional<Type> labelType = std::nullopt) {
+    if (!labelType) {
+      labelType = getLabelType(curr->name);


Why not always do this to get the label's type?

Sometimes it is more convenient to pass a type explicitly to avoid having to maintain a persistent mapping from labels to types that the callback can use.

For the IRBuilder case, getLabelType is not readily implementable, arguably due to a layering issue. IRBuilder uses indices (like in the binary format) rather than names as the source of truth for labels. It can convert label names into label indices, but only for label names that are explicitly provided at the start of a scope, not for label names that are generated on demand because a branch targets an unnamed label. That means that an implementation of this getLabelType callback would fail to find the type of labels with generated names.

Fixing this would not be as simple as letting IRBuilder translated generated label names into indices because the parser currently uses IRBuilder's name-to-index translation to check whether a name is valid. If the translation worked for generated names, the parser would start admitting modules that happen to use the generated names without them being bound. This is arguably a layering violation if you assume the parser should be responsible for tracking its own mapping of label names to indices, but on the other hand the entire point of IRBuilder is to track state so the parser doesn't have to.

The current solution is to have IRBuilder always pass an explicit type rather than depending on the getLabelType callback. Another solution would be to differentiate generated names from explicit names so that we could have different behavior in the public and private mapping of label names to indices.

I see, thanks. Yeah, this seems tricky to fix.

I'd suggest adding a comment summarizing some of what you said here, for the benefit of future readers of the code here (though hopefully we'll find a nice refactoring before long).

kripken · 2024-04-11T23:27:00Z

src/ir/child-typer.h

+      note(&curr->ifFalse, *type);
+    } else {
+      noteAny(&curr->ifTrue);
+      noteAny(&curr->ifFalse);


This doesn't capture that they must be the same type. Is that not important when popping?

They don't need to be the same type in general. In the validator, invalid cases like having children typed i32 and f32 should be caught because the result type would not be the LUB of the children types (unless it is Type::none, in which case it would fail another check that the result should be concrete).

kripken · 2024-04-11T23:29:32Z

src/ir/child-typer.h

+    note(&curr->size, Type::i32);
+  }
+
+  void visitTry(Try* curr) {


Suggested change

void visitTry(Try* curr) {

void visitTry(Try* curr) {

note(&curr->body, curr->type);

kripken · 2024-04-11T23:30:34Z

src/ir/child-typer.h

+
+  void visitCallRef(CallRef* curr, std::optional<HeapType> ht = std::nullopt) {
+    if (!ht) {
+      ht = curr->target->type.getHeapType().getSignature();


This may be unreachable, in which case getHeapType will fail.

I would consider it user error to call this function without providing ht in that case.

I see. Makes sense but please add a comment explaining that expectation so it is obvious to potential future users of the API.

kripken · 2024-04-11T23:33:53Z

src/ir/child-typer.h

+    for (size_t i = 0; i < params.size(); ++i) {
+      note(&curr->operands[i], params[i]);
+    }
+    note(&curr->target, Type(*ht, Nullable));


Technically this can be any subtype of funcref? This enforces the current type IIANM which means it can't help the validator.

Ah, right, this is what is meant by "Subtypings are based only on information that would have been parsed out of a binary." The binary would have included a type annotation, so we report the subtype constraints based on that same type annotation, inferring it from the children if necessary. So the validator will have to validate that it is possible to recover that type before calling into this utility. Same for every instruction that has a type annotation in the binary format but not in the IR.

Ah ok, now that "parsed out of a binary" comment makes sense. I suggest we clarify this with this example at the top of the file.

src/ir/child-typer.h

tlively · 2024-04-12T15:53:14Z

I've fuzzed this for over 200k iterations of TextRoundtrip with the new parser enabled and a custom modification of this PR that always checks that constraints are met rather than checking constraints only after seeing an unreachable.

We previously would eagerly drop all concretely typed expressions on the stack when pushing an unreachable instruction. This was semantically correct and closely modeled the semantics of unreachable instructions, which implicitly drop the entire stack and start a new polymorphic stack. However, it also meant that the structure of the parsed IR did not match the structure of the folded input, which meant that tests involving unreachable children would not parse as intended, preventing the test from testing the intended behavior. For example, this wat: ```wasm (i32.add (i32.const 0) (unreachable) ) ``` Would previously parse into this IR: ```wasm (drop (i32.const 0) ) (i32.add (unreachable) (unreachable) ) ``` To fix this problem, we need to stop eagerly dropping stack values when encountering an unreachable instruction so we can still pop expressions pushed before the unreachable as direct children of later instructions. In the example above, we need to keep the `i32.const 0` on the stack so it is available to be popped and become a child of the `i32.add`. However, the naive solution of simply popping past unreachables would produce invalid IR in some cases. For example, consider this wat: ```wasm f32.const 0 unreachable i32.add ``` The naive solution would parse this wat into this IR: ```wasm (i32.add (f32.const 0) (unreachable) ) ``` But we do not want to parse an `i32.add` with an `f32`-typed child. Neither do we want to reject this input, since it is a perfectly valid Wasm fragment. In this case, we actually want the old behavior of dropping the `f32.const` and replacing it with another `unreachable` as the first child of the `i32.add`. To both match the input structure where possible and also gracefully fall back to the old behavior of dropping expressions prior to the unreachable, collect constraints on the types of each child for each kind of expression and compare them to the types of available expressions on the stack when an unreachable instruction will be popped. When the constraints are satisfied, pop expressions normally, even after popping the unreachable instruction. Otherwise, drop the instructions that precede the unreachable instruction to ensure we parse valid IR. To collect the constraints, add a new `ChildTyper` utility that calls a different callback for each kind of possible type constraint for each child. In the future, this utility can be used to simplify the validator as well.

tlively · 2024-04-16T16:47:18Z

@kripken, PTAL at the last commit: documentation comment

tlively · 2024-04-16T16:48:09Z

This also just passed 1 million RoundtripText iterations 🎉

This was referenced Apr 11, 2024

[Parser] Enable the new text parser by default #6371

Merged

Ensure printed tuple.extract arity is valid #6487

Merged

[Parser] Use unreachables to fulfill tuple requirements #6488

Merged

tlively requested a review from kripken April 11, 2024 02:42

tlively marked this pull request as ready for review April 11, 2024 02:42

tlively force-pushed the parser-pop-unreachable-tuple branch from f64f57d to 59c021e Compare April 11, 2024 18:34

tlively force-pushed the child-typer branch from 781d8d7 to c7acccf Compare April 11, 2024 18:34

kripken reviewed Apr 11, 2024

View reviewed changes

Base automatically changed from parser-pop-unreachable-tuple to main April 11, 2024 19:10

tlively force-pushed the child-typer branch from c7acccf to 3359266 Compare April 11, 2024 23:13

kripken reviewed Apr 11, 2024

View reviewed changes

tlively force-pushed the child-typer branch from 3359266 to 1587839 Compare April 12, 2024 03:42

tlively added 7 commits April 16, 2024 08:40

fixes

f6e64dc

fix tuple glb

6706622

remove confusing comment

36529dc

fix block bug

216cbe1

try bodies

ee3c7b2

documentation comment

37032a2

tlively force-pushed the child-typer branch from 1587839 to 37032a2 Compare April 16, 2024 16:46

kripken approved these changes Apr 16, 2024

View reviewed changes

tlively merged commit 359d5aa into main Apr 16, 2024
13 checks passed

tlively deleted the child-typer branch April 16, 2024 17:43

tlively mentioned this pull request May 20, 2024

[Parser] Issues to resolve before switching to the new wat parser #6208

Closed

18 tasks

gkdn mentioned this pull request Aug 31, 2024

stringconsts gkdn/binaryen#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Parser] Pop past unreachables where possible #6489

[Parser] Pop past unreachables where possible #6489

tlively commented Apr 11, 2024

tlively commented Apr 11, 2024 •

edited

Loading

kripken Apr 11, 2024

tlively Apr 11, 2024

kripken Apr 11, 2024

tlively Apr 12, 2024

kripken Apr 11, 2024

tlively Apr 12, 2024

kripken Apr 15, 2024

kripken Apr 11, 2024

tlively Apr 12, 2024

kripken Apr 11, 2024

kripken Apr 11, 2024

tlively Apr 12, 2024

kripken Apr 15, 2024

kripken Apr 11, 2024

tlively Apr 12, 2024

kripken Apr 15, 2024

tlively commented Apr 12, 2024

tlively commented Apr 16, 2024

tlively commented Apr 16, 2024

	void visitTry(Try* curr) {
	void visitTry(Try* curr) {
	note(&curr->body, curr->type);

[Parser] Pop past unreachables where possible #6489

[Parser] Pop past unreachables where possible #6489

Conversation

tlively commented Apr 11, 2024

tlively commented Apr 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlively commented Apr 12, 2024

tlively commented Apr 16, 2024

tlively commented Apr 16, 2024

tlively commented Apr 11, 2024 •

edited

Loading