-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parser] Pop past unreachables where possible #6489
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
f64f57d
to
59c021e
Compare
src/ir/child-typer.h
Outdated
|
||
// For each child, call `noteSubtype` with a pointer to the child and the most | ||
// specific type that the child must have. Subtypings are based only on | ||
// information that would have been parsed out of a binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the last sentence here mean? Maybe an example of a subtyping that is somehow not from a binary could help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember what a counterexample would be, so I've just gone ahead and removed that sentence.
|
||
void noteAnyReference(Expression** childp) { | ||
self().noteAnyReferenceType(childp); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these foo() { self.foo() }
stubs needed? In CRTP the right thing should just get called anyhow I think - we don't do this elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they're not needed, but they are convenient for avoiding a bunch of calls to self()
everywhere and to provide a shorter internal name.
|
||
void visitBreak(Break* curr, std::optional<Type> labelType = std::nullopt) { | ||
if (!labelType) { | ||
labelType = getLabelType(curr->name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not always do this to get the label's type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes it is more convenient to pass a type explicitly to avoid having to maintain a persistent mapping from labels to types that the callback can use.
For the IRBuilder case, getLabelType
is not readily implementable, arguably due to a layering issue. IRBuilder uses indices (like in the binary format) rather than names as the source of truth for labels. It can convert label names into label indices, but only for label names that are explicitly provided at the start of a scope, not for label names that are generated on demand because a branch targets an unnamed label. That means that an implementation of this getLabelType
callback would fail to find the type of labels with generated names.
Fixing this would not be as simple as letting IRBuilder translated generated label names into indices because the parser currently uses IRBuilder's name-to-index translation to check whether a name is valid. If the translation worked for generated names, the parser would start admitting modules that happen to use the generated names without them being bound. This is arguably a layering violation if you assume the parser should be responsible for tracking its own mapping of label names to indices, but on the other hand the entire point of IRBuilder is to track state so the parser doesn't have to.
The current solution is to have IRBuilder always pass an explicit type rather than depending on the getLabelType
callback. Another solution would be to differentiate generated names from explicit names so that we could have different behavior in the public and private mapping of label names to indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. Yeah, this seems tricky to fix.
I'd suggest adding a comment summarizing some of what you said here, for the benefit of future readers of the code here (though hopefully we'll find a nice refactoring before long).
note(&curr->ifFalse, *type); | ||
} else { | ||
noteAny(&curr->ifTrue); | ||
noteAny(&curr->ifFalse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't capture that they must be the same type. Is that not important when popping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't need to be the same type in general. In the validator, invalid cases like having children typed i32 and f32 should be caught because the result type would not be the LUB of the children types (unless it is Type::none
, in which case it would fail another check that the result should be concrete).
note(&curr->size, Type::i32); | ||
} | ||
|
||
void visitTry(Try* curr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void visitTry(Try* curr) { | |
void visitTry(Try* curr) { | |
note(&curr->body, curr->type); |
|
||
void visitCallRef(CallRef* curr, std::optional<HeapType> ht = std::nullopt) { | ||
if (!ht) { | ||
ht = curr->target->type.getHeapType().getSignature(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be unreachable, in which case getHeapType
will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider it user error to call this function without providing ht
in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Makes sense but please add a comment explaining that expectation so it is obvious to potential future users of the API.
for (size_t i = 0; i < params.size(); ++i) { | ||
note(&curr->operands[i], params[i]); | ||
} | ||
note(&curr->target, Type(*ht, Nullable)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this can be any subtype of funcref? This enforces the current type IIANM which means it can't help the validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, this is what is meant by "Subtypings are based only on information that would have been parsed out of a binary." The binary would have included a type annotation, so we report the subtype constraints based on that same type annotation, inferring it from the children if necessary. So the validator will have to validate that it is possible to recover that type before calling into this utility. Same for every instruction that has a type annotation in the binary format but not in the IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, now that "parsed out of a binary" comment makes sense. I suggest we clarify this with this example at the top of the file.
I've fuzzed this for over 200k iterations of TextRoundtrip with the new parser enabled and a custom modification of this PR that always checks that constraints are met rather than checking constraints only after seeing an unreachable. |
We previously would eagerly drop all concretely typed expressions on the stack when pushing an unreachable instruction. This was semantically correct and closely modeled the semantics of unreachable instructions, which implicitly drop the entire stack and start a new polymorphic stack. However, it also meant that the structure of the parsed IR did not match the structure of the folded input, which meant that tests involving unreachable children would not parse as intended, preventing the test from testing the intended behavior. For example, this wat: ```wasm (i32.add (i32.const 0) (unreachable) ) ``` Would previously parse into this IR: ```wasm (drop (i32.const 0) ) (i32.add (unreachable) (unreachable) ) ``` To fix this problem, we need to stop eagerly dropping stack values when encountering an unreachable instruction so we can still pop expressions pushed before the unreachable as direct children of later instructions. In the example above, we need to keep the `i32.const 0` on the stack so it is available to be popped and become a child of the `i32.add`. However, the naive solution of simply popping past unreachables would produce invalid IR in some cases. For example, consider this wat: ```wasm f32.const 0 unreachable i32.add ``` The naive solution would parse this wat into this IR: ```wasm (i32.add (f32.const 0) (unreachable) ) ``` But we do not want to parse an `i32.add` with an `f32`-typed child. Neither do we want to reject this input, since it is a perfectly valid Wasm fragment. In this case, we actually want the old behavior of dropping the `f32.const` and replacing it with another `unreachable` as the first child of the `i32.add`. To both match the input structure where possible and also gracefully fall back to the old behavior of dropping expressions prior to the unreachable, collect constraints on the types of each child for each kind of expression and compare them to the types of available expressions on the stack when an unreachable instruction will be popped. When the constraints are satisfied, pop expressions normally, even after popping the unreachable instruction. Otherwise, drop the instructions that precede the unreachable instruction to ensure we parse valid IR. To collect the constraints, add a new `ChildTyper` utility that calls a different callback for each kind of possible type constraint for each child. In the future, this utility can be used to simplify the validator as well.
@kripken, PTAL at the last commit: documentation comment |
This also just passed 1 million RoundtripText iterations 🎉 |
We previously would eagerly drop all concretely typed expressions on the stack
when pushing an unreachable instruction. This was semantically correct and
closely modeled the semantics of unreachable instructions, which implicitly drop
the entire stack and start a new polymorphic stack. However, it also meant that
the structure of the parsed IR did not match the structure of the folded input,
which meant that tests involving unreachable children would not parse as
intended, preventing the test from testing the intended behavior.
For example, this wat:
Would previously parse into this IR:
To fix this problem, we need to stop eagerly dropping stack values when
encountering an unreachable instruction so we can still pop expressions pushed
before the unreachable as direct children of later instructions. In the example
above, we need to keep the
i32.const 0
on the stack so it is available to bepopped and become a child of the
i32.add
.However, the naive solution of simply popping past unreachables would produce
invalid IR in some cases. For example, consider this wat:
The naive solution would parse this wat into this IR:
But we do not want to parse an
i32.add
with anf32
-typed child. Neither dowe want to reject this input, since it is a perfectly valid Wasm fragment. In this
case, we actually want the old behavior of dropping the
f32.const
andreplacing it with another
unreachable
as the first child of thei32.add
.To both match the input structure where possible and also gracefully fall back
to the old behavior of dropping expressions prior to the unreachable, collect
constraints on the types of each child for each kind of expression and compare
them to the types of available expressions on the stack when an unreachable
instruction will be popped. When the constraints are satisfied, pop expressions
normally, even after popping the unreachable instruction. Otherwise, drop the
instructions that precede the unreachable instruction to ensure we parse valid
IR.
To collect the constraints, add a new
ChildTyper
utility that calls adifferent callback for each kind of possible type constraint for each child. In
the future, this utility can be used to simplify the validator as well.