Replace temporary lowering of Wasm's load_splat with a new Cranelift instruction #1175
Comments
@sunfishcode, I wonder if the |
Having a specialized |
Let's do that then; in the past, it's always been ok to create Cranelift instructions that map exactly to Wasm SIMD instructions so that seems like a good path forward. If you add it (somewhere in |
I'd be interested in having a wider discussion; in particular, @cfallin might have a different opinion with respect to the AArch64 backend. |
I think it's a reasonably pragmatic solution to have the dedicated instruction; it reduces work overall in the pipeline if we can carry the whole package of operations together as one unit from Wasm to machine code. There's a possible concern if we have a lot of these; it's certainly nice to only have to worry about literal "load" and "store" instructions when analyzing loads and stores. But perhaps we get around that by having accessors on the instruction ("this inst is a load of some form, with address X and size Y"). Or, in the worst case, the instruction is a black-box with unknown side-effects from the point of view of other optimization passes, so it inhibits some optimization but is still correct. @akirilov-arm, the reason that your (quite reasonable!) approach attempting to merge the ops in the backend didn't work is indeed because load is side-effecting, so will never be returned by |
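The accessor idea in the comment above ("this inst is a load of some form, with address X and size Y") could look roughly like the sketch below. Everything here is hypothetical: `Inst`, `MemRead`, `as_mem_read`, and `has_unknown_side_effects` are illustrative names, not Cranelift's IR types or API. The point is only that a pass can ask one question of any instruction instead of matching on every load-like opcode.

```rust
// Hypothetical, simplified instruction set for illustration only;
// these are NOT Cranelift's real IR types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    Load { addr: u32, size_bytes: u8 },
    LoadSplat { addr: u32, lane_bytes: u8 },
    Splat { src: u32 },
    Opaque, // black box: unknown side effects
}

// What an analysis pass needs to know about a memory read.
#[derive(Debug, PartialEq)]
struct MemRead {
    addr: u32,
    size_bytes: u8,
}

#[allow(dead_code)]
impl Inst {
    // Returns the memory read performed by this instruction, if any.
    // A fused load_splat reads a single lane's worth of bytes.
    fn as_mem_read(&self) -> Option<MemRead> {
        match *self {
            Inst::Load { addr, size_bytes } => Some(MemRead { addr, size_bytes }),
            Inst::LoadSplat { addr, lane_bytes } => Some(MemRead {
                addr,
                size_bytes: lane_bytes,
            }),
            _ => None,
        }
    }

    // Conservative fallback: passes must not move or remove such instructions.
    fn has_unknown_side_effects(&self) -> bool {
        matches!(self, Inst::Opaque)
    }
}
```

With an accessor like this, a pass that reasons about loads treats the plain and fused forms the same way, and anything it does not understand falls into the conservative "unknown side effects" case.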
What is the feature or code improvement you would like to do in Cranelift?
In bytecodealliance/cranelift#1347 I added a temporary lowering for Wasm's load_splat to two Cranelift instructions, load + splat. This generates extra instructions that could be removed by a specialized Cranelift load_splat instruction or by smarter codegen (e.g. complex addressing on splat).
What is the value of adding this in Cranelift?
Fewer instructions produced.
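To make the two options concrete, here is a minimal model in plain Rust of what the lowering has to compute for a 32-bit lane, assuming a little-endian linear memory as in Wasm. It is a semantic sketch, not Cranelift code: the function names are made up for illustration, and an out-of-bounds access is modeled as `None` where Wasm would trap.

```rust
// Read a little-endian u32 from `mem` at byte offset `addr`.
// Returns None when the access is out of bounds (Wasm would trap).
fn load_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let end = addr.checked_add(4)?;
    let bytes = mem.get(addr..end)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_le_bytes(buf))
}

// Copy one scalar into all four lanes of an i32x4-like vector.
fn splat_i32x4(x: u32) -> [u32; 4] {
    [x; 4]
}

// Temporary lowering: two separate steps, load then splat.
fn load_then_splat(mem: &[u8], addr: usize) -> Option<[u32; 4]> {
    load_u32(mem, addr).map(splat_i32x4)
}

// Specialized form: one operation that reads the scalar and fills the lanes.
fn load_splat_i32x4(mem: &[u8], addr: usize) -> Option<[u32; 4]> {
    let end = addr.checked_add(4)?;
    let bytes = mem.get(addr..end)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some([u32::from_le_bytes(buf); 4])
}
```

Both functions return the same value for every input; the difference the issue cares about is only how many machine instructions the backend ends up emitting for each form.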
Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
Seeking feedback on which way to proceed: specialized load_splat or smarter codegen.
Have you considered alternative implementations? If so, how are they better or worse than your proposal?
See above.