Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analysis] Add an experimental TypeGeneralizing optimization #6080

Merged
merged 9 commits into from
Nov 8, 2023

Conversation

tlively
Copy link
Member

@tlively tlively commented Nov 4, 2023

This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.

This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.

Planned and potential future work includes:

  • Implementing the transfer function for all kinds of expressions.
  • Tracking requirements on the dynamic types of each location to generalize
    allocations as well.
  • Making the analysis interprocedural and generalizing the types of more
    program locations.
  • Optimizing tuple-typed locations.
  • Generalizing only those locations necessary to eliminate at least one cast
    (although this would make the anlysis bidirectional, so it is probably better
    left to separate passes).

@tlively tlively requested review from ashleynh and kripken November 4, 2023 00:27
@tlively
Copy link
Member Author

tlively commented Nov 4, 2023

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see a concrete optimization using the framework!

src/passes/TypeGeneralizing.cpp Show resolved Hide resolved
Index numLocals = func->getNumLocals();
for (; i < numParams; ++i) {
updateLocal(i, func->getLocalType(i));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused to see the setup of parameters in code that handles the function exit. The function might not have an exit, in particular, I think? If it infinitely loops for example. Then how do its parameters get set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! The parameters should indeed be handled at the entry block instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After fixing this, the next PR needs to be combined into this one to keep the tests passing.

updateLocal(i, type);
}
}
// We similarly cannot change the types of results. Push requirements that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We similarly cannot change the types of results. Push requirements that
// We similarly cannot change the types of results. Push requirements so that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pushed requirement is that the stack end up with the correct type. I'll make "requirements" singular here since there is only one. (At one point during development this was a loop, so there were multiple requirements.)

}
}
// We similarly cannot change the types of results. Push requirements that
// the stack end up with the correct type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the stack end up with the correct type.
// the stack ends up with the correct type.

void visitTupleExtract(TupleExtract* curr) { WASM_UNREACHABLE("TODO"); }
void visitRefI31(RefI31* curr) { pop(); }
void visitI31Get(I31Get* curr) {
// Do not allow relaxing to nullable if the input is already non-nullable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing this relaxation might result in the engine doing an extra null check, although I guess disallowing this relaxation might prevent us from removing a previous explicit null check in principle. Still, I think even when we could have removed a previous explicit null check, that's not worth making the input here nullable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense this might help or hurt, but isn't that the same as casts in general? Generalizing can allow removing casts - to non-null or to a heap type - but it can also hurt at runtime in theory. Is there a reason to consider nullability differently than heap types here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that eventually there needs to be a null check eventually, whether explicit or implicit, before doing any kind of access operation. For heap type casts, on the other hand, if we can weaken or remove a cast, that doesn't mean the cast is implicitly done later. So removing heap type casts is more useful than removing null checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Please add a comment with that, then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I just remembered that engines can do virtual memory tricks for null checks as well, so the implicit null checks can actually be free. That means that removing explicit null checks is more useful than I thought, so I'll simplify this code to allow the full relaxation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but on the other hand virtual memory tricks for i31ref do not work because they are never dereferenced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go ahead and simplify this 😅

break;
}
WASM_UNREACHABLE("unexpected type");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a separate PR perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How strongly do you feel about this? It seems like more overhead than it's worth, and we've definitely added type APIs like this as necessary in the past.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly.

@@ -0,0 +1,219 @@
;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited.
;; RUN: foreach %s %t wasm-opt --dce --experimental-type-generalizing -all -S -o - | filecheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment explaining why DCE, or at least pointing readers to see the comments lower down?

Would it be hard to scan unreachable code btw?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to scan unreachable code, but CFGWalker currently does not produce unreachable basic blocks, so there is currently nothing in the CFG to scan. Do you think it's worth adding support to CFGWalker to produce unreachable basic blocks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, we avoid creating unreachable basic blocks (at least obvious ones). It's not worth changing that I think. However, if this pass only runs properly after DCE, we'd need to run DCE internally inside it to avoid problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that means that any pass that uses the framework will have to internally run DCE first, which seems unfortunate. I'll run DCE inside this pass for now, but maybe it will be worth having opt-in unreachable blocks in the future.

(func $unconstrained
;; This non-ref local should be unmodified
(local $x i32)
;; There is no constraint on the type of this local, so make it top.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
;; There is no constraint on the type of this local, so make it top.
;; There is no constraint on the type of this local, so leave it as top.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think the anyref below is supposed to be something else.

;; CHECK-NEXT: (local.get $var)
;; CHECK-NEXT: )
(func $implicit-return (result eqref)
;; This will be optimized, but only to eqref because of the constaint from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
;; This will be optimized, but only to eqref because of the constaint from the
;; This will be optimized, but only to eqref because of the constraint from the

;; CHECK-NEXT: (unreachable)
;; CHECK-NEXT: )
(func $implicit-return-unreachable (result eqref)
;; Now will optimize this all the way to anyref because we don't analyze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
;; Now will optimize this all the way to anyref because we don't analyze
;; We will optimize this all the way to anyref because we don't analyze

Base automatically changed from cfg-exit-blocks to main November 6, 2023 23:43
This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.

This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.

Planned and potential future work includes:

 - Taking updated local constraints into account when determining what blocks
   may need to be re-analyzed after the current block.
 - Implementing the transfer function for all kinds of expressions.
 - Tracking requirements on the dynamic types of each location to generalize
   allocations as well.
 - Making the analysis interprocedural and generalizing the types of more
   program locations.
 - Optimizing tuple-typed locations.
 - Generalizing only those locations necessary to eliminate at least one cast
   (although this would make the anlysis bidirectional, so it is probably better
   left to separate passes).
Whenever the constraint on a local is updated, any block that does a local.set
on that global may need to be re-analyzed. Update the TypeGeneralizing transfer
function to include these blocks in the set of dependent blocks it returns. Add
a test that depends on this logic to validate.
@tlively
Copy link
Member Author

tlively commented Nov 7, 2023

Sorry for the force pushes. All the comments should now be addressed. PTAL at all the commits after the first.

@tlively tlively requested a review from kripken November 8, 2023 17:34
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm % one question in a review comment and one larger question I'll ask in a PR comment in a second.

for (size_t i = 0, n = dependentSets.size(); i < n; ++i) {
localDependents[i] = std::vector<const BasicBlock*>(
dependentSets[i].begin(), dependentSets[i].end());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was expecting this kind of locals dependency to be done in the framework. Was that wrong of me to hope for?

That is, this is likely a common pattern so I'm surprised to see it in this pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now passes have to explicitly provide all the dependent blocks, including not only dependencies due to updated locals (and globals in the future), but also the basic predecessor / successor dependencies.

We have the VisitorTransferFunction that knows to add the predecessors or successors as dependencies for forward or backward analyses, but it doesn't know about locals or any other sources of dependencies.

Another place where you would hope the framework could be more helpful is with the transfer function for individual instructions. For example, visitDrop will simply do a pop() in the majority of backwards analysis. However, so far in this analysis, we only push reference types onto the stack, so a drop of an i32 should not do the pop(). I could imagine having a utility that applied a custom predicate and was able to do most of the pops automatically, but it wouldn't know what to push without user intervention.

I think the correct play is to get experience implementing a bunch of analyses, then figure out how to factor out layered utilities for the common patterns.

@kripken
Copy link
Member

kripken commented Nov 8, 2023

This definitely lgtm to experiment with (and land if that's useful), but the more I think about this the less clear to me how much benefit we can get from it. We just need to measure, I guess. But my concern is that I can't figure out any case where this optimization will definitely and unambiguously help. For example, here it can hurt:

sub = cast<Sub>(super);
work1(sub);
work2(sub);

function work1(x : Super) {
  if (x instanceof Sub) {.. } else { ..}
}
// work2 is the same

If we remove that cast, then after runtime inlining the VM may end up testing that type twice. If we keep that cast, then after inlining the VM can simply propagate the type and remove the later tests.

That is, a cast can serve two purposes: it can be necessary for validation, or, as in the example here, it serves as a kind of "assertion" or "proof", "you can rely on this being of this type from here on". We don't necessarily want to remove such casts. But the more I think on it, it seems like pretty much any cast that isn't needed for validation may be of that kind.

Here is another example, without inlining this time:

object.property = cast<Sub>(super);
if (object.property instance of Sub) { .. } else { .. }
..

Say that object.property has type Super. Then if we keep the cast the VM can do load-store forwarding and see a more refined type than object.property has, and use that.

I was hoping we could think of cases that this definitely only helps, and eventually make the pass focus only on those. Can we think of any such cases?

@tlively
Copy link
Member Author

tlively commented Nov 8, 2023

Yes, everything you wrote is correct, and it is often helpful to keep a single early cast to save multiple later casts. This pass will help in the case where we do a cast to prove a value has a particular refined type, but then we never actually depend on the value having that refined type.

Once this analysis becomes interprocedural and once it tracks requirements due to casts separately from requirements due to validation, we will be able to see that the output of the cast in your example flows into locations that actually do depend on the more refined type, so with some extra work we should be able to avoid removing that cast.

Alternatively, if we're expecting the engine to inline and then optimize, it doesn't seem unreasonable to expect the engine to deduplicate the inlined casts. We'll just have to measure and see :)

@kripken
Copy link
Member

kripken commented Nov 8, 2023

Once this analysis becomes interprocedural and once it tracks requirements due to casts separately from requirements due to validation, we will be able to see that the output of the cast in your example flows into locations that actually do depend on the more refined type, so with some extra work we should be able to avoid removing that cast.

In that case yes, but more commonly runtime inlining will be of an indirect call. In such cases we can't expect to statically identify the risk. I fear that is the common case.

Alternatively, if we're expecting the engine to inline and then optimize, it doesn't seem unreasonable to expect the engine to deduplicate the inlined casts.

Yes, that's fair, other opts might interact here.

@tlively tlively merged commit d6df91b into main Nov 8, 2023
14 checks passed
@tlively tlively deleted the type-generalizing branch November 8, 2023 21:20
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
…mbly#6080)

This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.

This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.

Planned and potential future work includes:

 - Implementing the transfer function for all kinds of expressions.
 - Tracking requirements on the dynamic types of each location to generalize
   allocations as well.
 - Making the analysis interprocedural and generalizing the types of more
   program locations.
 - Optimizing tuple-typed locations.
 - Generalizing only those locations necessary to eliminate at least one cast
   (although this would make the anlysis bidirectional, so it is probably better
   left to separate passes).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants