[analysis] Add an experimental TypeGeneralizing optimization #6080
Conversation
Nice to see a concrete optimization using the framework!
```cpp
Index numLocals = func->getNumLocals();
for (; i < numParams; ++i) {
  updateLocal(i, func->getLocalType(i));
}
```
I'm a little confused to see the setup of parameters in code that handles the function exit. The function might not have an exit at all, I think, for example if it infinitely loops. Then how would its parameters get set?
Nice catch! The parameters should indeed be handled at the entry block instead.
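A minimal sketch of what that fix could look like, assuming the entry basic block can be identified (the `isEntry` check and the exact wiring are assumptions here, not the PR's actual code):

```cpp
// Sketch: seed parameter requirements at the entry block rather than the
// exit, so they are set even for functions that never exit (e.g. ones
// that loop forever).
if (block.isEntry()) {
  Index numParams = func->getNumParams();
  for (Index i = 0; i < numParams; ++i) {
    // We cannot change the function's signature here, so each parameter
    // must keep its declared type.
    updateLocal(i, func->getLocalType(i));
  }
}
```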
After fixing this, the next PR needs to be combined into this one to keep the tests passing.
src/passes/TypeGeneralizing.cpp
Outdated
```cpp
    updateLocal(i, type);
  }
}
// We similarly cannot change the types of results. Push requirements that
```
```diff
- // We similarly cannot change the types of results. Push requirements that
+ // We similarly cannot change the types of results. Push requirements so that
```
The pushed requirement is that the stack end up with the correct type. I'll make "requirements" singular here since there is only one. (At one point during development this was a loop, so there were multiple requirements.)
src/passes/TypeGeneralizing.cpp
Outdated
```cpp
  }
}
// We similarly cannot change the types of results. Push requirements that
// the stack end up with the correct type.
```
```diff
- // the stack end up with the correct type.
+ // the stack ends up with the correct type.
```
src/passes/TypeGeneralizing.cpp
Outdated
```cpp
void visitTupleExtract(TupleExtract* curr) { WASM_UNREACHABLE("TODO"); }
void visitRefI31(RefI31* curr) { pop(); }
void visitI31Get(I31Get* curr) {
  // Do not allow relaxing to nullable if the input is already non-nullable.
```
Why?
Allowing this relaxation might result in the engine doing an extra null check, although I guess disallowing this relaxation might prevent us from removing a previous explicit null check in principle. Still, I think even when we could have removed a previous explicit null check, that's not worth making the input here nullable.
Makes sense this might help or hurt, but isn't that the same as casts in general? Generalizing can allow removing casts - to non-null or to a heap type - but it can also hurt at runtime in theory. Is there a reason to consider nullability differently than heap types here?
The difference is that there eventually needs to be a null check, whether explicit or implicit, before doing any kind of access operation. For heap type casts, on the other hand, if we can weaken or remove a cast, that doesn't mean the cast is implicitly done later. So removing heap type casts is more useful than removing null checks.
Fair enough. Please add a comment with that, then.
Oh, I just remembered that engines can do virtual memory tricks for null checks as well, so the implicit null checks can actually be free. That means that removing explicit null checks is more useful than I thought, so I'll simplify this code to allow the full relaxation.
Oh, but on the other hand virtual memory tricks for i31ref do not work because they are never dereferenced.
I decided to go ahead and simplify this 😅
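For the record, a hedged sketch of what the simplified handler might look like, using the same `pop`/`push` stack helpers seen above (whether this matches the final code exactly is an assumption):

```cpp
void visitI31Get(I31Get* curr) {
  // The result is an i32, which this analysis does not track on the
  // stack, so there is nothing to pop. Allow the input to generalize all
  // the way to a nullable i31ref; removing an explicit null check can be
  // worthwhile even if the engine then performs an implicit one.
  push(Type(HeapType::i31, Nullable));
}
```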
```cpp
    break;
  }
  WASM_UNREACHABLE("unexpected type");
}
```
This could be a separate PR perhaps?
How strongly do you feel about this? It seems like more overhead than it's worth, and we've definitely added type APIs like this as necessary in the past.
I don't feel strongly.
```wat
;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited.
;; RUN: foreach %s %t wasm-opt --dce --experimental-type-generalizing -all -S -o - | filecheck %s
```
Maybe add a comment explaining why DCE, or at least pointing readers to see the comments lower down?
Would it be hard to scan unreachable code btw?
I would be happy to scan unreachable code, but CFGWalker currently does not produce unreachable basic blocks, so there is currently nothing in the CFG to scan. Do you think it's worth adding support to CFGWalker to produce unreachable basic blocks?
Ah, yeah, we avoid creating unreachable basic blocks (at least obvious ones). It's not worth changing that I think. However, if this pass only runs properly after DCE, we'd need to run DCE internally inside it to avoid problems.
Hmm, that means that any pass that uses the framework will have to internally run DCE first, which seems unfortunate. I'll run DCE inside this pass for now, but maybe it will be worth having opt-in unreachable blocks in the future.
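A rough sketch of that stopgap, assuming the usual pattern of running a nested PassRunner from inside a pass (the exact placement within this pass is an assumption):

```cpp
void run(Module* module) override {
  // CFGWalker does not emit unreachable basic blocks, so any unreachable
  // code would simply never be analyzed. Run DCE first so that no such
  // code remains by the time we build the CFG.
  PassRunner runner(module);
  runner.add("dce");
  runner.run();

  // ... then run the TypeGeneralizing analysis itself ...
}
```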
```wat
(func $unconstrained
  ;; This non-ref local should be unmodified
  (local $x i32)
  ;; There is no constraint on the type of this local, so make it top.
```
```diff
- ;; There is no constraint on the type of this local, so make it top.
+ ;; There is no constraint on the type of this local, so leave it as top.
```
Oh, I think the `anyref` below is supposed to be something else.
```wat
;; CHECK-NEXT: (local.get $var)
;; CHECK-NEXT: )
(func $implicit-return (result eqref)
  ;; This will be optimized, but only to eqref because of the constaint from the
```
```diff
- ;; This will be optimized, but only to eqref because of the constaint from the
+ ;; This will be optimized, but only to eqref because of the constraint from the
```
```wat
;; CHECK-NEXT: (unreachable)
;; CHECK-NEXT: )
(func $implicit-return-unreachable (result eqref)
  ;; Now will optimize this all the way to anyref because we don't analyze
```
```diff
- ;; Now will optimize this all the way to anyref because we don't analyze
+ ;; We will optimize this all the way to anyref because we don't analyze
```
This new optimization will eventually weaken casts by generalizing (i.e. un-refining) their output types. If a cast is weakened enough that its output type is a supertype of its input type, the cast will be able to be removed by OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module validation. For example, if the result of a cast is stored to a local and the cast is weakened enough that its output type is no longer a subtype of that local's type, then the local.set after the cast will no longer validate. To avoid this validation failure, this optimization would have to generalize the type of the local as well. In general, the more we can generalize the types of program locations, the more we can weaken casts of values that flow into those locations.

This initial implementation only generalizes the types of locals and does not actually weaken casts yet. It serves as a proof of concept for the analysis required to perform the full optimization, though. The analysis uses the new analysis framework to perform a reverse analysis tracking type requirements for each local and reference-typed stack value in a function.

Planned and potential future work includes:

- Taking updated local constraints into account when determining what blocks may need to be re-analyzed after the current block.
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize allocations as well.
- Making the analysis interprocedural and generalizing the types of more program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast (although this would make the analysis bidirectional, so it is probably better left to separate passes).
Whenever the constraint on a local is updated, any block that does a local.set on that local may need to be re-analyzed. Update the TypeGeneralizing transfer function to include these blocks in the set of dependent blocks it returns. Add a test that depends on this logic to validate.
Sorry for the force pushes. All the comments should now be addressed. PTAL at all the commits after the first.
Overall lgtm % one question in a review comment and one larger question I'll ask in a PR comment in a second.
```cpp
for (size_t i = 0, n = dependentSets.size(); i < n; ++i) {
  localDependents[i] = std::vector<const BasicBlock*>(
    dependentSets[i].begin(), dependentSets[i].end());
}
```
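For context on how these precomputed vectors fit in, here is a hedged sketch of the consuming side as described in the commit message above; the variable names (`localsWithUpdatedRequirements`, `dependents`) are illustrative, not the PR's actual identifiers:

```cpp
// When the transfer function updates the requirement on a local, every
// block that does a local.set of that local may need to be re-analyzed,
// so those blocks are added to the returned set of dependents.
std::vector<const BasicBlock*> dependents;
for (Index i : localsWithUpdatedRequirements) {
  dependents.insert(dependents.end(),
                    localDependents[i].begin(),
                    localDependents[i].end());
}
return dependents;
```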
Hmm, I was expecting this kind of locals dependency to be done in the framework. Was that wrong of me to hope for?
That is, this is likely a common pattern so I'm surprised to see it in this pass.
Right now passes have to explicitly provide all the dependent blocks, including not only dependencies due to updated locals (and globals in the future), but also the basic predecessor / successor dependencies. We have the `VisitorTransferFunction` that knows to add the predecessors or successors as dependencies for forward or backward analyses, but it doesn't know about locals or any other sources of dependencies.

Another place where you would hope the framework could be more helpful is with the transfer function for individual instructions. For example, `visitDrop` will simply do a `pop()` in the majority of backward analyses. However, so far in this analysis, we only push reference types onto the stack, so a drop of an i32 should not do the `pop()`. I could imagine having a utility that applied a custom predicate and was able to do most of the pops automatically, but it wouldn't know what to push without user intervention.

I think the correct play is to get experience implementing a bunch of analyses, then figure out how to factor out layered utilities for the common patterns.
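To make the `visitDrop` example concrete, a conditional version along the lines described above might look like this (a sketch of the idea, not the framework's actual code):

```cpp
void visitDrop(Drop* curr) {
  // Only reference types are pushed onto the abstract stack in this
  // analysis, so only a dropped reference has an entry to pop; a dropped
  // i32 has no corresponding stack requirement.
  if (curr->value->type.isRef()) {
    pop();
  }
}
```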
This definitely lgtm to experiment with (and land if that's useful), but the more I think about this, the less clear it is to me how much benefit we can get from it. We just need to measure, I guess. But my concern is that I can't figure out any case where this optimization will definitely and unambiguously help. For example, here it can hurt:

```
sub = cast<Sub>(super);
work1(sub);
work2(sub);

function work1(x : Super) {
  if (x instanceof Sub) { .. } else { .. }
}
// work2 is the same
```

If we remove that cast, then after runtime inlining the VM may end up testing that type twice. If we keep that cast, then after inlining the VM can simply propagate the type and remove the later tests.

That is, a cast can serve two purposes: it can be necessary for validation, or, as in the example here, it can serve as a kind of "assertion" or "proof": "you can rely on this being of this type from here on". We don't necessarily want to remove such casts. But the more I think on it, it seems like pretty much any cast that isn't needed for validation may be of that kind. Here is another example, without inlining this time:

```
object.property = cast<Sub>(super);
if (object.property instanceof Sub) { .. } else { .. }
```

I was hoping we could think of cases that this definitely only helps, and eventually make the pass focus only on those. Can we think of any such cases?
Yes, everything you wrote is correct, and it is often helpful to keep a single early cast to save multiple later casts. This pass will help in the case where we do a cast to prove a value has a particular refined type, but then we never actually depend on the value having that refined type.

Once this analysis becomes interprocedural and once it tracks requirements due to casts separately from requirements due to validation, we will be able to see that the output of the cast in your example flows into locations that actually do depend on the more refined type, so with some extra work we should be able to avoid removing that cast. Alternatively, if we're expecting the engine to inline and then optimize, it doesn't seem unreasonable to expect the engine to deduplicate the inlined casts. We'll just have to measure and see :)
In that case yes, but more commonly runtime inlining will be of an indirect call. In such cases we can't expect to statically identify the risk. I fear that is the common case.
Yes, that's fair, other opts might interact here.