Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace temporary lowering of Wasm's load_splat with a new Cranelift instruction #1175

Closed
abrown opened this issue Jan 14, 2020 · 5 comments · Fixed by #2278
Closed

Replace temporary lowering of Wasm's load_splat with a new Cranelift instruction #1175

abrown opened this issue Jan 14, 2020 · 5 comments · Fixed by #2278
Labels
cranelift Issues related to the Cranelift code generator

Comments

@abrown
Copy link
Contributor

abrown commented Jan 14, 2020

What is the feature or code improvement you would like to do in Cranelift?

In bytecodealliance/cranelift#1347 I added a temporary lowering for Wasm's load_splat to two Cranelift instructions, load + splat. This generates extra instructions that could be removed by a specialized Cranelift load_splat instruction or by smarter codegen (e.g. complex addressing on splat).

What is the value of adding this in Cranelift?

Fewer instructions produced.

Do you have an implementation plan, and/or ideas for data structures or algorithms to use?

Seeking feedback on which way to proceed: specialized load_splat or smarter codegen.

Have you considered alternative implementations? If so, how are they better or worse than your proposal?

See above.

@abrown
Copy link
Contributor Author

abrown commented Jan 14, 2020

@sunfishcode, I wonder if the load-from-constant-table questions I was asking also relate to this (in regard to the complex addressing).

@alexcrichton alexcrichton transferred this issue from bytecodealliance/cranelift Feb 28, 2020
@alexcrichton alexcrichton added the cranelift Issues related to the Cranelift code generator label Feb 28, 2020
@akirilov-arm
Copy link
Contributor

Having a specialized load_splat Cranelift instruction would be beneficial for the AArch64 backend (so that it could generate LD1R instructions). I tried the alternative approach quickly, i.e. smarter codegen, which in the AArch64 case means using maybe_input_insn(), but it didn't work (I was getting None as the return value) - the reason was probably because the load might have side effects.

@abrown
Copy link
Contributor Author

abrown commented Jul 24, 2020

Let's do that then; in the past, it's always been ok to create Cranelift instructions that map exactly to Wasm SIMD instructions so that seems like a good path forward. If you add it (somewhere in meta/src/shared/instructions.rs) feel free to tag me and I'll review it.

@akirilov-arm
Copy link
Contributor

I'd be interested in having a wider discussion; in particular, @cfallin might have a different opinion with respect to the AArch64 backend.

@cfallin
Copy link
Member

cfallin commented Jul 27, 2020

I think it's a reasonably pragmatic solution to have the dedicated instruction; it reduces work overall in the pipeline if we can carry the whole package of operations together as one unit from Wasm to machine code.

There's a possible concern if we have a lot of these; it's certainly nice to only have to worry about literal "load" and "store" instructions when analyzing loads and stores. But perhaps we get around that by having accessors on the instruction ("this inst is a load of some form, with address X and size Y"). Or, in the worst case, the instruction is a black-box with unknown side-effects from the point of view of other optimization passes, so it inhibits some optimization but is still correct.

@akirilov-arm, the reason that your (quite reasonable!) approach attempting to merge the ops in the backend didn't work is indeed because load is side-effecting, so will never be returned by maybe_input_insn(). This prevents the backend from arbitrarily sinking a side-effecting op, which could lead to a miscompile (consider if the load-splat were sunk past a store to an aliasing pointer, for example). The "correct" thing to do is to hoist the splat instead and then merge, but the framework isn't really set up to allow that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants