[analysis] Add an experimental TypeGeneralizing optimization #6080

tlively · 2023-11-04T00:27:34Z

This new optimization will eventually weaken casts by generalizing (i.e.
un-refining) their output types. If a cast is weakened enough that its output
type is a supertype of its input type, the cast will be able to be removed by
OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module
validation. For example, if the result of a cast is stored to a local and the
cast is weakened enough that its output type is no longer a subtype of that
local's type, then the local.set after the cast will no longer validate. To
avoid this validation failure, this optimization would have to generalize the
type of the local as well. In general, the more we can generalize the types of
program locations, the more we can weaken casts of values that flow into those
locations.
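As a rough illustration of the two conditions described above, the sketch below models a toy linear subtype chain. It is not Binaryen's Type API: `ToyType`, `isSubType`, `castIsRemovable`, and `localSetValidates` are hypothetical names, and the chain assumes `Sub` is a defined struct type.

```cpp
#include <cassert>

// Toy linear chain of reference types, most general first. This is a
// hypothetical stand-in for illustration, not Binaryen's Type API.
enum class ToyType { Any = 0, Eq = 1, Struct = 2, Sub = 3 };

// In this linear chain, a is a subtype of b when a is at least as refined.
inline bool isSubType(ToyType a, ToyType b) {
  return static_cast<int>(a) >= static_cast<int>(b);
}

// A cast does nothing (and so can be removed) once its output type is a
// supertype of its input type.
inline bool castIsRemovable(ToyType input, ToyType output) {
  return isSubType(input, output);
}

// A local.set validates only if the value's type is a subtype of the
// local's declared type.
inline bool localSetValidates(ToyType value, ToyType local) {
  return isSubType(value, local);
}

// Weakening a cast's output from Sub to Struct keeps a local.set to a
// Struct-typed local valid; weakening it further to Eq would not, unless
// the local's type is generalized as well.
inline void checkExample() {
  assert(localSetValidates(ToyType::Struct, ToyType::Struct));
  assert(!localSetValidates(ToyType::Eq, ToyType::Struct));
}
```

In this model, generalizing the local from `Struct` to `Eq` is what would make the weaker cast validate again.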

This initial implementation only generalizes the types of locals and does not
actually weaken casts yet. It serves as a proof of concept for the analysis
required to perform the full optimization, though. The analysis uses the new
analysis framework to perform a reverse analysis tracking type requirements for
each local and reference-typed stack value in a function.
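The reverse analysis can be sketched on a toy instruction set. This is a minimal illustration under stated assumptions, not the pass's actual code or the analysis framework's API: `ToyType`, `ToyInstr`, `meet`, and `generalizeLocals` are hypothetical, the code is straight-line only, and parameters and non-reference types are ignored. Consumers push a requirement onto a stack while walking backward, and `local.get` pops the requirement and applies it to its local.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy linear chain of reference types, most general first. Hypothetical
// stand-in for illustration, not Binaryen's Type API.
enum class ToyType { Any = 0, Eq = 1, Struct = 2 };

// The meet of two requirements in a linear chain is the more refined one.
inline ToyType meet(ToyType a, ToyType b) {
  return static_cast<int>(a) >= static_cast<int>(b) ? a : b;
}

struct ToyInstr {
  enum Kind { LocalGet, LocalSet, Drop, Return } kind;
  std::size_t local; // only meaningful for LocalGet and LocalSet
};

// Returns, for each local, the most general type that still validates.
// Walks the straight-line code backward and repeats until nothing changes,
// because a requirement found on a local can affect an earlier-visited set.
inline std::vector<ToyType> generalizeLocals(const std::vector<ToyInstr>& code,
                                             std::size_t numLocals,
                                             ToyType resultType) {
  std::vector<ToyType> locals(numLocals, ToyType::Any);
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<ToyType> stack; // requirements on values not yet produced
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
      switch (it->kind) {
        case ToyInstr::Return:
          stack.push_back(resultType); // result types cannot change
          break;
        case ToyInstr::Drop:
          stack.push_back(ToyType::Any); // a dropped value is unconstrained
          break;
        case ToyInstr::LocalSet:
          stack.push_back(locals[it->local]); // value must fit the local
          break;
        case ToyInstr::LocalGet: {
          assert(!stack.empty());
          ToyType refined = meet(locals[it->local], stack.back());
          stack.pop_back();
          if (refined != locals[it->local]) {
            locals[it->local] = refined;
            changed = true;
          }
          break;
        }
      }
    }
  }
  return locals;
}
```

A local whose value is only ever dropped stays at `Any`, while a local whose value reaches the function result is constrained to the result type.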

Planned and potential future work includes:

- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize
  allocations as well.
- Making the analysis interprocedural and generalizing the types of more
  program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast
  (although this would make the analysis bidirectional, so it is probably better
  left to separate passes).

tlively · 2023-11-04T00:27:47Z

Current dependencies on/for this PR:

main
- PR Update CFGWalker to generate consolidated exit blocks #6079
  - PR [analysis] Add an experimental TypeGeneralizing optimization #6080 👈
    - PR [TypeGeneralizing] Properly re-analyze blocks when locals are updated #6081

This stack of pull requests is managed by Graphite.

kripken

Nice to see a concrete optimization using the framework!

kripken · 2023-11-06T17:08:49Z

src/passes/TypeGeneralizing.cpp

+    Index numLocals = func->getNumLocals();
+    for (; i < numParams; ++i) {
+      updateLocal(i, func->getLocalType(i));
+    }


I'm a little confused to see the setup of parameters in code that handles the function exit. The function might not have an exit, in particular, I think? If it infinitely loops for example. Then how do its parameters get set?

Nice catch! The parameters should indeed be handled at the entry block instead.

After fixing this, the next PR needs to be combined into this one to keep the tests passing.

kripken · 2023-11-06T17:10:05Z

src/passes/TypeGeneralizing.cpp

+        updateLocal(i, type);
+      }
+    }
+    // We similarly cannot change the types of results. Push requirements that


Suggested change

// We similarly cannot change the types of results. Push requirements that

// We similarly cannot change the types of results. Push requirements so that

The pushed requirement is that the stack end up with the correct type. I'll make "requirements" singular here since there is only one. (At one point during development this was a loop, so there were multiple requirements.)

kripken · 2023-11-06T17:10:09Z

src/passes/TypeGeneralizing.cpp

+      }
+    }
+    // We similarly cannot change the types of results. Push requirements that
+    // the stack end up with the correct type.


Suggested change

// the stack end up with the correct type.

// the stack ends up with the correct type.

kripken · 2023-11-06T17:11:46Z

src/passes/TypeGeneralizing.cpp

+  void visitTupleExtract(TupleExtract* curr) { WASM_UNREACHABLE("TODO"); }
+  void visitRefI31(RefI31* curr) { pop(); }
+  void visitI31Get(I31Get* curr) {
+    // Do not allow relaxing to nullable if the input is already non-nullable.


Allowing this relaxation might result in the engine doing an extra null check, although I guess disallowing this relaxation might prevent us from removing a previous explicit null check in principle. Still, I think even when we could have removed a previous explicit null check, that's not worth making the input here nullable.

Makes sense this might help or hurt, but isn't that the same as casts in general? Generalizing can allow removing casts - to non-null or to a heap type - but it can also hurt at runtime in theory. Is there a reason to consider nullability differently than heap types here?

The difference is that there eventually needs to be a null check, whether explicit or implicit, before doing any kind of access operation. For heap type casts, on the other hand, if we can weaken or remove a cast, that doesn't mean the cast is implicitly done later. So removing heap type casts is more useful than removing null checks.

Fair enough. Please add a comment with that, then.

Oh, I just remembered that engines can do virtual memory tricks for null checks as well, so the implicit null checks can actually be free. That means that removing explicit null checks is more useful than I thought, so I'll simplify this code to allow the full relaxation.

Oh, but on the other hand virtual memory tricks for i31ref do not work because they are never dereferenced.

I decided to go ahead and simplify this 😅

kripken · 2023-11-06T17:18:31Z

src/wasm/wasm-type.cpp

+      break;
+  }
+  WASM_UNREACHABLE("unexpected type");
+}


This could be a separate PR perhaps?

How strongly do you feel about this? It seems like more overhead than it's worth, and we've definitely added type APIs like this as necessary in the past.

I don't feel strongly.

kripken · 2023-11-06T18:28:54Z

test/lit/passes/type-generalizing.wast

@@ -0,0 +1,219 @@
+;; NOTE: Assertions have been generated by update_lit_checks.py --all-items and should not be edited.
+;; RUN: foreach %s %t wasm-opt --dce --experimental-type-generalizing -all -S -o - | filecheck %s


Maybe add a comment explaining why DCE, or at least pointing readers to see the comments lower down?

Would it be hard to scan unreachable code btw?

I would be happy to scan unreachable code, but CFGWalker currently does not produce unreachable basic blocks, so there is currently nothing in the CFG to scan. Do you think it's worth adding support to CFGWalker to produce unreachable basic blocks?

Ah, yeah, we avoid creating unreachable basic blocks (at least obvious ones). It's not worth changing that I think. However, if this pass only runs properly after DCE, we'd need to run DCE internally inside it to avoid problems.

Hmm, that means that any pass that uses the framework will have to internally run DCE first, which seems unfortunate. I'll run DCE inside this pass for now, but maybe it will be worth having opt-in unreachable blocks in the future.

kripken · 2023-11-06T19:29:23Z

test/lit/passes/type-generalizing.wast

+ (func $unconstrained
+  ;; This non-ref local should be unmodified
+  (local $x i32)
+  ;; There is no constraint on the type of this local, so make it top.


Suggested change

;; There is no constraint on the type of this local, so make it top.

;; There is no constraint on the type of this local, so leave it as top.

Oh, I think the anyref below is supposed to be something else.

kripken · 2023-11-06T19:29:48Z

test/lit/passes/type-generalizing.wast

+ ;; CHECK-NEXT:  (local.get $var)
+ ;; CHECK-NEXT: )
+ (func $implicit-return (result eqref)
+  ;; This will be optimized, but only to eqref because of the constaint from the


Suggested change

;; This will be optimized, but only to eqref because of the constaint from the

;; This will be optimized, but only to eqref because of the constraint from the

kripken · 2023-11-06T19:30:34Z

test/lit/passes/type-generalizing.wast

+ ;; CHECK-NEXT:  (unreachable)
+ ;; CHECK-NEXT: )
+ (func $implicit-return-unreachable (result eqref)
+  ;; Now will optimize this all the way to anyref because we don't analyze


Suggested change

;; Now will optimize this all the way to anyref because we don't analyze

;; We will optimize this all the way to anyref because we don't analyze

This new optimization will eventually weaken casts by generalizing (i.e. un-refining) their output types. If a cast is weakened enough that its output type is a supertype of its input type, the cast will be able to be removed by OptimizeInstructions.

Unlike refining cast inputs, generalizing cast outputs can break module validation. For example, if the result of a cast is stored to a local and the cast is weakened enough that its output type is no longer a subtype of that local's type, then the local.set after the cast will no longer validate. To avoid this validation failure, this optimization would have to generalize the type of the local as well. In general, the more we can generalize the types of program locations, the more we can weaken casts of values that flow into those locations.

This initial implementation only generalizes the types of locals and does not actually weaken casts yet. It serves as a proof of concept for the analysis required to perform the full optimization, though. The analysis uses the new analysis framework to perform a reverse analysis tracking type requirements for each local and reference-typed stack value in a function.

Planned and potential future work includes:

- Taking updated local constraints into account when determining what blocks may need to be re-analyzed after the current block.
- Implementing the transfer function for all kinds of expressions.
- Tracking requirements on the dynamic types of each location to generalize allocations as well.
- Making the analysis interprocedural and generalizing the types of more program locations.
- Optimizing tuple-typed locations.
- Generalizing only those locations necessary to eliminate at least one cast (although this would make the analysis bidirectional, so it is probably better left to separate passes).

Whenever the constraint on a local is updated, any block that does a local.set on that local may need to be re-analyzed. Update the TypeGeneralizing transfer function to include these blocks in the set of dependent blocks it returns. Add a test that depends on this logic to validate.
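The dependency rule in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in TypeGeneralizing.cpp: `ToyBlock`, `computeLocalDependents`, and `dependentBlocks` are hypothetical, and blocks are identified by index rather than by `BasicBlock` pointers. For a backward analysis, the blocks to revisit are the predecessors of the current block plus every block that sets a local whose constraint just changed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified basic block: only the facts this sketch needs.
struct ToyBlock {
  std::vector<std::size_t> predecessors; // indices of predecessor blocks
  std::vector<std::size_t> setsLocals;   // locals written by a local.set here
};

// For each local, collect the blocks that contain a local.set of that local.
inline std::vector<std::set<std::size_t>>
computeLocalDependents(const std::vector<ToyBlock>& cfg, std::size_t numLocals) {
  std::vector<std::set<std::size_t>> dependents(numLocals);
  for (std::size_t block = 0; block < cfg.size(); ++block) {
    for (std::size_t local : cfg[block].setsLocals) {
      assert(local < numLocals);
      dependents[local].insert(block);
    }
  }
  return dependents;
}

// Blocks that may need re-analysis after analyzing `block` backward, given
// the locals whose constraints were updated while analyzing it.
inline std::set<std::size_t>
dependentBlocks(const std::vector<ToyBlock>& cfg,
                const std::vector<std::set<std::size_t>>& localDependents,
                std::size_t block,
                const std::vector<std::size_t>& updatedLocals) {
  // A backward analysis always propagates to predecessors.
  std::set<std::size_t> result(cfg[block].predecessors.begin(),
                               cfg[block].predecessors.end());
  // Blocks that set an updated local push that local's requirement, so
  // their results may be stale even if they are not predecessors.
  for (std::size_t local : updatedLocals) {
    result.insert(localDependents[local].begin(), localDependents[local].end());
  }
  return result;
}
```

In a diamond-shaped CFG, a block on one arm that tightens a local's constraint would cause the sibling arm to be revisited if that arm sets the local, even though neither arm is a predecessor of the other.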

tlively · 2023-11-07T01:51:58Z

Sorry for the force pushes. All the comments should now be addressed. PTAL at all the commits after the first.

kripken

Overall lgtm % one question in a review comment and one larger question I'll ask in a PR comment in a second.

kripken · 2023-11-08T17:37:17Z

src/passes/TypeGeneralizing.cpp

+    for (size_t i = 0, n = dependentSets.size(); i < n; ++i) {
+      localDependents[i] = std::vector<const BasicBlock*>(
+        dependentSets[i].begin(), dependentSets[i].end());
+    }


Hmm, I was expecting this kind of locals dependency to be done in the framework. Was that wrong of me to hope for?

That is, this is likely a common pattern so I'm surprised to see it in this pass.

Right now passes have to explicitly provide all the dependent blocks, including not only dependencies due to updated locals (and globals in the future), but also the basic predecessor / successor dependencies.

We have the VisitorTransferFunction that knows to add the predecessors or successors as dependencies for forward or backward analyses, but it doesn't know about locals or any other sources of dependencies.

Another place where you would hope the framework could be more helpful is with the transfer function for individual instructions. For example, visitDrop will simply do a pop() in the majority of backward analyses. However, so far in this analysis, we only push reference types onto the stack, so a drop of an i32 should not do the pop(). I could imagine having a utility that applied a custom predicate and was able to do most of the pops automatically, but it wouldn't know what to push without user intervention.

I think the correct play is to get experience implementing a bunch of analyses, then figure out how to factor out layered utilities for the common patterns.

kripken · 2023-11-08T17:53:42Z

This definitely lgtm to experiment with (and land if that's useful), but the more I think about this, the less clear it is to me how much benefit we can get from it. We just need to measure, I guess. But my concern is that I can't figure out any case where this optimization will definitely and unambiguously help. For example, here it can hurt:

sub = cast<Sub>(super);
work1(sub);
work2(sub);

function work1(x : Super) {
  if (x instanceof Sub) { .. } else { .. }
}
// work2 is the same

If we remove that cast, then after runtime inlining the VM may end up testing that type twice. If we keep that cast, then after inlining the VM can simply propagate the type and remove the later tests.

That is, a cast can serve two purposes: it can be necessary for validation, or, as in the example here, it serves as a kind of "assertion" or "proof", "you can rely on this being of this type from here on". We don't necessarily want to remove such casts. But the more I think on it, it seems like pretty much any cast that isn't needed for validation may be of that kind.

Here is another example, without inlining this time:

object.property = cast<Sub>(super);
if (object.property instanceof Sub) { .. } else { .. }
..

Say that object.property has type Super. Then if we keep the cast the VM can do load-store forwarding and see a more refined type than object.property has, and use that.

I was hoping we could think of cases that this definitely only helps, and eventually make the pass focus only on those. Can we think of any such cases?

tlively · 2023-11-08T19:27:55Z

Yes, everything you wrote is correct, and it is often helpful to keep a single early cast to save multiple later casts. This pass will help in the case where we do a cast to prove a value has a particular refined type, but then we never actually depend on the value having that refined type.

Once this analysis becomes interprocedural and once it tracks requirements due to casts separately from requirements due to validation, we will be able to see that the output of the cast in your example flows into locations that actually do depend on the more refined type, so with some extra work we should be able to avoid removing that cast.

Alternatively, if we're expecting the engine to inline and then optimize, it doesn't seem unreasonable to expect the engine to deduplicate the inlined casts. We'll just have to measure and see :)

kripken · 2023-11-08T20:18:31Z

> Once this analysis becomes interprocedural and once it tracks requirements due to casts separately from requirements due to validation, we will be able to see that the output of the cast in your example flows into locations that actually do depend on the more refined type, so with some extra work we should be able to avoid removing that cast.

In that case yes, but more commonly runtime inlining will be of an indirect call. In such cases we can't expect to statically identify the risk. I fear that is the common case.

> Alternatively, if we're expecting the engine to inline and then optimize, it doesn't seem unreasonable to expect the engine to deduplicate the inlined casts.

Yes, that's fair, other opts might interact here.

tlively requested review from ashleynh and kripken November 4, 2023 00:27

This was referenced Nov 4, 2023

Update CFGWalker to generate consolidated exit blocks #6079

Merged

[TypeGeneralizing] Properly re-analyze blocks when locals are updated #6081

Closed

kripken reviewed Nov 6, 2023

tlively force-pushed the type-generalizing branch from 943688d to ff6e482 Compare November 6, 2023 22:07

Base automatically changed from cfg-exit-blocks to main November 6, 2023 23:43

tlively added 7 commits November 6, 2023 16:17

add test with nontrivial control flow

b92aa0a

handle locals at function entry instead of exit

19f73aa

fix unreachable tests

c7e5e88

address comments

bf0dc7d

fix lint

c6e8184

tlively force-pushed the type-generalizing branch from ff6e482 to c6e8184 Compare November 7, 2023 01:50

tlively added 2 commits November 7, 2023 11:00

run DCE internally

714e353

simplify i31.get

3e286a0

tlively requested a review from kripken November 8, 2023 17:34

kripken reviewed Nov 8, 2023

kripken approved these changes Nov 8, 2023

tlively merged commit d6df91b into main Nov 8, 2023
14 checks passed

tlively deleted the type-generalizing branch November 8, 2023 21:20
