
[NFC] Move InstrumentedPass logic out and use it in another place #6132

Merged: 15 commits into main from wrap.pass, Nov 28, 2023

Conversation

kripken (Member) commented Nov 23, 2023

Asyncify gained a way to wrap a pass so that it only runs on a given set of
functions, rather than on all functions, so the wrapper "filters" what the pass
operates on. That was useful in Asyncify as we wanted to only do work on
functions that Asyncify actually instrumented.

There is another place in the code that needs such functionality,
optimizeAfterInlining, which runs optimizations after we inline; again, we
only want to optimize on the functions we know are relevant because they
changed. To do that, move that logic out to a general place so it can be
reused. This makes the code there a lot less hackish.

While doing so, make the logic work only on function-parallel passes. It
never worked on anything else anyhow, but now it asserts on that. (It can't run on a general
pass because a general one does not provide an interface to affect which
functions it operates on; a general pass is entirely opaque in that way.)
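
To make the idea concrete, here is a minimal, self-contained sketch of such a filtering wrapper (stand-in types only; the actual implementation lives in the new PassUtils code and differs in its details):

#include <memory>
#include <unordered_set>

// Stand-in types; the real Binaryen classes are richer. This only sketches
// the "filter which functions a pass runs on" idea described above.
struct Module {};
struct Function {};
using FuncSet = std::unordered_set<Function*>;

struct Pass {
  virtual ~Pass() = default;
  // Only function-parallel passes expose a per-function entry point, which is
  // why the wrapper below can only filter such passes.
  virtual void runOnFunction(Module* module, Function* func) = 0;
};

// Wraps another pass and skips every function outside relevantFuncs.
struct FilteredPass : Pass {
  FilteredPass(std::unique_ptr<Pass>&& pass, const FuncSet& relevantFuncs)
    : pass(std::move(pass)), relevantFuncs(relevantFuncs) {}

  void runOnFunction(Module* module, Function* func) override {
    if (!relevantFuncs.count(func)) {
      return; // not a function we were asked to operate on
    }
    pass->runOnFunction(module, func);
  }

  std::unique_ptr<Pass> pass;
  const FuncSet& relevantFuncs;
};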

  }

  void runOnFunction(Module* module, Function* func) override {
    if (!relevantFuncs.count(func)) {
kripken (author):

To verify this PR is correct (beyond the test suite showing no changes), that is, that it actually runs only on the filtered functions, I checked what happens when this condition is flipped so that we operate on the inverse of the set of functions we should. As expected, the test suite then gets some huge diffs in the asyncify and inlining tests.

kripken requested a review from tlively on November 27, 2023 22:27
  }
-  PassRunner runner(module, parentRunner->options);
+  PassUtils::FilteredPassRunner runner(module, funcs);
+  runner.options = parentRunner->options;
Reviewer:

It seems nicer to pass the options in to the pass runner constructor rather than imperatively setting them afterward. Can we add the options to the FilteredPassRunner constructor?
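
For illustration, a constructor along these lines would do it (hypothetical shape, with stand-in types rather than the real FilteredPassRunner):

#include <unordered_set>

// Stand-ins, not the real Binaryen types.
struct Module {};
struct Function {};
struct PassOptions {};
using FuncSet = std::unordered_set<Function*>;

struct FilteredPassRunner {
  // Hypothetical: accept the options up front instead of assigning them
  // after construction.
  FilteredPassRunner(Module* module, const FuncSet& funcs,
                     const PassOptions& options)
    : module(module), funcs(funcs), options(options) {}

  Module* module;
  const FuncSet& funcs;
  PassOptions options;
};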

    return std::make_unique<FilteredPass>(pass->create(), relevantFuncs);
  }

  FilteredPass(std::unique_ptr<Pass> pass, const FuncSet& relevantFuncs)
Reviewer:
This should be an rvalue reference so you don't have to move-construct the parameter just to move it again.

Suggested change:
-  FilteredPass(std::unique_ptr<Pass> pass, const FuncSet& relevantFuncs)
+  FilteredPass(std::unique_ptr<Pass>&& pass, const FuncSet& relevantFuncs)

kripken (author):

How does that work? I can make this change, but I cannot remove either of the std::moves in this file (use of deleted function errors etc.), so I'm not sure how it helps.

Reviewer:

Before this change, the unique_ptr move constructor gets called twice. First, this pass parameter is move-constructed from whatever rvalue you pass into this FilteredPass constructor. Second, you move-construct the pass member from the pass parameter below in the initializer list with pass(std::move(pass)).

After this change, pass(std::move(pass)) would move-construct the pass member directly from the rvalue you pass into the FilteredPass constructor, without first move-constructing the pass parameter as a separate intermediate value.

I'm not sure if it makes any difference after optimizations, but in principle this change results in one less call to the unique_ptr move constructor.
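
A standalone sketch of the two shapes being compared (simplified types, not the Binaryen code):

#include <memory>
#include <utility>

struct Pass {};

// By-value parameter: the caller's rvalue move-constructs the parameter, and
// the member is then move-constructed from that parameter (two moves).
struct ByValue {
  explicit ByValue(std::unique_ptr<Pass> pass) : pass(std::move(pass)) {}
  std::unique_ptr<Pass> pass;
};

// Rvalue-reference parameter: the member is move-constructed directly from
// the caller's rvalue (one move, no intermediate parameter object).
struct ByRvalueRef {
  explicit ByRvalueRef(std::unique_ptr<Pass>&& pass) : pass(std::move(pass)) {}
  std::unique_ptr<Pass> pass;
};

int main() {
  ByValue a(std::make_unique<Pass>());
  ByRvalueRef b(std::make_unique<Pass>());
}

Either way the caller still writes std::move when passing a named unique_ptr, which is why neither std::move in the file can be removed; the difference is only the extra move of the parameter object.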

kripken (author):

Interesting, thanks.

I have very little idea of how the optimizer handles this stuff, but it sounds like it can help. Added in the last push.

  }

private:
  std::unique_ptr<Pass> pass;
Reviewer:

Instead of having an indirection here, it might make sense to use templates to store the pass inline in the FilteredPass<P> (or even as the supertype of FilteredPass<P>, which would avoid having to implement modifiesBinaryenIR, etc.). WDYT?
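
For illustration, here is roughly the shape that suggestion implies (a hypothetical sketch with stand-in types; it assumes the concrete pass type is known statically, which is exactly what turns out not to hold below):

#include <unordered_set>

// Stand-ins for the real Binaryen types.
struct Module {};
struct Function {};
using FuncSet = std::unordered_set<Function*>;

struct Pass {
  virtual ~Pass() = default;
  virtual void runOnFunction(Module* module, Function* func) {}
  virtual bool modifiesBinaryenIR() { return true; }
};

struct SomeConcretePass : Pass {
  void runOnFunction(Module*, Function*) override { /* ... */ }
};

// Hypothetical templated wrapper: it derives from the wrapped pass type P, so
// properties like modifiesBinaryenIR are inherited instead of forwarded, and
// no separate unique_ptr<Pass> member is needed.
template<typename P>
struct FilteredPass : P {
  explicit FilteredPass(const FuncSet& relevantFuncs)
    : relevantFuncs(relevantFuncs) {}

  void runOnFunction(Module* module, Function* func) override {
    if (relevantFuncs.count(func)) {
      P::runOnFunction(module, func);
    }
  }

  const FuncSet& relevantFuncs;
};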

kripken (author):

I don't see how FilteredPass can be templated over the pass. The pass arrives from FilteredPassRunner::doAdd, which does not have that information - it just gets a pass instance. (Templating doAdd might be possible, but that would be a large change, I think, and I'm not sure whether it can work.)

Reviewer:

Ah right, if you don't statically know the full type at every callsite, templates won't work. Makes sense.

    : PassRunner(wasm), relevantFuncs(relevantFuncs) {}

protected:
  void doAdd(std::unique_ptr<Pass> pass) override {
Reviewer:

I'm surprised doAdd is already virtual. Where else do we take advantage of this?

kripken (author):

Nowhere, I think 😄. It was added for the code I am refactoring, IIRC.

I think this is a good design because speed does not matter here: adding a pass is on the order of 1000x faster than running a typical pass. And it makes it simple to add such indirection.

Reviewer:

Makes sense to me 👍

Comment on lines 25 to 27:

namespace wasm {

namespace PassUtils {

Reviewer:

Suggested change:
-namespace wasm {
-
-namespace PassUtils {
+namespace wasm::PassUtils {

kripken enabled auto-merge (squash) on November 28, 2023 23:11
kripken merged commit dbcac17 into main on Nov 28, 2023 (14 checks passed)
kripken deleted the wrap.pass branch on November 28, 2023 23:32
kripken (author) commented Nov 30, 2023

It turns out that this is not exactly NFC, but actually has a benefit:

  1. This avoids the hack of creating a module with only the functions we want to optimize when we optimize after inlining. That is, before this PR we'd optimize a fake module containing only the functions we inlined into (to avoid optimizing code we did not change at all); after this PR we optimize the real module, but use a filtering mechanism to only optimize the functions we inlined into.
  2. Our effect analyzer can look at functions to see if they have effects: that is how the call.without.effects intrinsic works. We see that the function is an import and identify it as the intrinsic from its module and base names (see the sketch after this list).
  3. In the hack we removed, the import was not in the module, since we removed all the functions we didn't inline into. So we couldn't tell it was an intrinsic and assumed the worst. (We allow the function to simply not exist, as we may be optimizing before we finish building up all the IR.) After this PR, we identify it as the intrinsic and can see it has no side effects, so it can be removed.
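
A sketch of that identification step (stand-in types; only call.without.effects itself is named in this comment, so treat the exact import module string as an illustrative assumption):

#include <string>

// Stand-in: in Binaryen, an imported function carries a module name and a
// base name; here they are plain strings for the sketch.
struct Function {
  bool imported = false;
  std::string module;
  std::string base;
};

// The intrinsic is recognized purely from the import's module and base names,
// which is why the import must actually be present in the module being
// optimized for the effect analyzer to see it.
bool isCallWithoutEffects(const Function& func) {
  return func.imported && func.module == "binaryen-intrinsics" &&
         func.base == "call.without.effects";
}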

That should not matter in the long term, as this just means that optimizations after inlining are now a tiny bit more effective than they were before; in particular the very next vacuum will remove those intrinsics. But this can accelerate optimization, that is, fewer rounds are needed to get the same results.

radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
…bAssembly#6132)
