Introduce the Cranelift IR instruction LoadSplat
#2278
Conversation
Looks good to me (though I am not as familiar with the aarch64 side of things); I'll make a note to remove the `if cfg!` in the code translator and lower things in x64 instead.
Thanks for doing work on this on AArch64!

The aarch64-specific lowering logic in `cranelift-wasm` feels wrong to me, though -- it's too much of a leaky abstraction. E.g., it means that we have some IR that we can only codegen on one architecture; and it means that the lowering result depends on where we've compiled the crate.
So, if we add the CLIF opcode, we should really handle it in all backends. Since the old x86 backend is still alive, unfortunately that means x86-old, x64-new, and aarch64. The first of these can probably use a legalization rule that decomposes back to load and splat ops; and you've done the third; so then we need a lowering for the x64 backend. @abrown, would you be willing to help with this (and then we can land it all at once)?
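For readers following along, the decomposition described here could look roughly like the sketch below. This is hypothetical, not code from this PR: it assumes `load_splat` reuses CLIF's `Load` instruction format, and leans on Cranelift's cursor/builder APIs of that era.

```rust
use cranelift_codegen::cursor::{Cursor, FuncCursor};
use cranelift_codegen::ir::{self, InstBuilder};

// Hypothetical legalization for the old x86 backend: rewrite
// `v = load_splat.T p, off` into a scalar load followed by a splat.
fn expand_load_splat(inst: ir::Inst, func: &mut ir::Function) {
    // Assumes `load_splat` shares the `Load` instruction format.
    let (flags, ptr, offset) = match func.dfg[inst] {
        ir::InstructionData::Load { flags, arg, offset, .. } => (flags, arg, offset),
        _ => panic!("expected load_splat"),
    };
    let vector_ty = func.dfg.ctrl_typevar(inst);
    let scalar_ty = vector_ty.lane_type();

    let mut pos = FuncCursor::new(func).at_inst(inst);
    pos.use_srcloc(inst); // keep the original source location for trap info
    let scalar = pos.ins().load(scalar_ty, flags, ptr, offset);
    // Rewrite the original instruction in place, so all existing uses of
    // its result now see the splat of the freshly loaded scalar.
    pos.func.dfg.replace(inst).splat(vector_ty, scalar);
}
```

The new x64 backend, by contrast, has no legalization pass, so it needs an actual lowering -- which is what the next comments arrange.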
Yes, but I might not get around to it until next week? If the delay is not a problem, I don't mind doing that so that the issue is resolved all at once. (I think I can push to this PR, right?)
That sounds fine to me at least!
I don't mind a one-week delay either.
(force-pushed from 9b14e57 to d99bfa9)
I added the necessary legalization rule to the old x86 backend. I also removed the temporary work-around in the code translator.
@akirilov-arm, nice! So I'm tracking that the only thing I need to do next week is lower `load_splat` in the new x64 backend -- you aren't planning any other x86 changes, right?
@abrown Yes, that's the only thing remaining, and no, I don't intend to do any other x86 changes (unless I am addressing review feedback, of course). However, I have noticed that the peephole optimizer tests are failing, and according to the logs the reason seems unrelated to my changes...
@akirilov-arm, @cfallin: I committed an implementation of `load_splat` for the new x64 backend.
Looking at the failures in the CI, I think something is behaving weirdly re: Git. The lines it claims need extra parameters should have those added by c9c8de6 and a later commit.
(force-pushed from c5b65b0 to 06b9510)
@abrown I have just rebased, but the tests are still failing. Your changes look fine to me otherwise, but I am not particularly familiar with the x64 backend.
(force-pushed from 06b9510 to 46a080d)
I am not sure why it makes a difference, but I have moved the new IR instruction definition to the end of the function defining all the operations, and the peephole optimizer tests pass now. cc @fitzgen
@abrown It looks like you missed updating a couple of places.
I apologise for getting to this so late in the day. I would like to request that this be reconsidered: I believe the approach discussed so far works against the design goals of the new backend framework, and that an alternative approach would be preferable.

A primary goal of CLIF, in the new backend framework, is to provide a way to conveniently do redundancy elimination. One key aspect of that is to provide only a single way to represent any computation. This change runs counter to those goals. For example, consider GVN: with this change in place, there is no way to GVN a `load_splat` against an equivalent `load`-plus-`splat` pair.

Of course I want this to become a single instruction on AArch64. The instruction selection framework in the new backend has been designed from the start to allow matching multiple CLIF nodes into a single machine instruction, and so we should use that here, and not add the proposed `LoadSplat` instruction.

@akirilov-arm writes:

> there are architectures such as Armv8-A that have a single machine instruction equivalent to the Wasm operations. In order to generate it, it is necessary to merge the `Load` and the `Splat` in the backend, which is not possible because the load may have side effects.

I don't follow .. can you clarify this? What instruction are you referring to?
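To make the GVN point above concrete before the replies that follow: a toy value-numbering table (purely illustrative, not Cranelift's actual GVN) keyed on (opcode, argument) can never unify the two spellings of the same splatted load.

```rust
use std::collections::HashMap;

// Toy illustration only: value numbering keys each instruction on
// (opcode, argument). Two spellings of the same computation get
// different keys and are never unified.
fn main() {
    let mut gvn: HashMap<(&str, u32), u32> = HashMap::new();
    gvn.insert(("load", 0), 1); // v1 = load v0
    gvn.insert(("splat", 1), 2); // v2 = splat v1
    // The same computation spelled as one instruction: v3 = load_splat v0.
    let duplicate = ("load_splat", 0);
    // The lookup misses, so the redundancy survives.
    assert!(gvn.get(&duplicate).is_none());
    println!("no match for {:?}; load_splat is not unified", duplicate);
}
```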
@julian-seward1 `LD1R`. I did try to match the `Load` and `Splat` pair in the AArch64 backend first, but the framework would not let me merge them.
Ah, I see. Yes, it's true; the new framework currently doesn't allow what you tried -- it only allows a load to be at the root of a pattern tree, whereas here, the splat would be at the root.

Having said that, @cfallin was talking recently about lifting that restriction in the framework. This isn't of much significance in the AArch64 case (apart from here, obviously) since AArch64 has very few load-op instructions. But for x64, almost all instructions have a load-op form, and so it's important that the framework can support that idiom effectively. Hence it might be the case that this limitation gets lifted sooner rather than later.
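A toy picture of that restriction (illustrative types only, not the real lowering framework): the desired pattern has the `splat` at the root and the `load` as an interior node, so matching it means the framework must sink the load into its consumer.

```rust
// Purely illustrative tree, not Cranelift's data structures.
#[derive(Debug)]
enum Node {
    Load { addr: u64 },
    Splat(Box<Node>),
}

// To emit one load-op machine instruction (e.g. AArch64 `ld1r`, or an x64
// op with a memory operand), the matcher must look through the root `Splat`
// and claim the interior `Load` -- exactly what was disallowed at the time.
fn matches_load_splat(n: &Node) -> bool {
    matches!(n, Node::Splat(inner) if matches!(**inner, Node::Load { .. }))
}

fn main() {
    let tree = Node::Splat(Box::new(Node::Load { addr: 0x1000 }));
    assert!(matches_load_splat(&tree));
    println!("matched load-at-non-root pattern: {:?}", tree);
}
```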
In order to register traps for `load_splat`, several instruction formats need knowledge of `SourceLoc`s; however, since the x64 backend does not correctly and completely register traps for `RegMem::Mem` variants, I opened #2290 to discuss and resolve this issue. In the meantime, the current behavior (i.e. remaining largely unaware of `SourceLoc`s) is retained.
@akirilov-arm: good call, now d990dd4 should have source locations in more places. @julian-seward1: regardless of what we decide to do with this PR, you should also take a look at #2290 -- we need a good way to trap at all of the places an x64 instruction could load (which could be quite a few), so I suggest a possible alternative in that issue.
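For intuition, here is a toy model of what registering a trap involves (names are illustrative; this is not the actual `MachBuffer` API): the emitter records a code offset plus source location for every possibly-faulting memory access, so a hardware fault can later be mapped back to a precise Wasm trap.

```rust
// Toy model only; field and method names are illustrative.
#[derive(Debug)]
struct TrapRecord {
    code_offset: u32,
    source_loc: u32, // stand-in for Cranelift's `SourceLoc`
    reason: &'static str,
}

struct EmitSink {
    offset: u32,
    traps: Vec<TrapRecord>,
}

impl EmitSink {
    // Called just before emitting the bytes of an instruction that may fault.
    fn add_trap(&mut self, source_loc: u32, reason: &'static str) {
        self.traps.push(TrapRecord { code_offset: self.offset, source_loc, reason });
    }
    fn put_bytes(&mut self, bytes: &[u8]) {
        self.offset += bytes.len() as u32;
    }
}

fn main() {
    let mut sink = EmitSink { offset: 0, traps: Vec::new() };
    // Emitting a hypothetical `load_splat` at Wasm source offset 0x2a:
    sink.add_trap(0x2a, "heap_oob");
    sink.put_bytes(&[0x4c, 0x8b, 0x07]); // the load's encoding (illustrative)
    println!("{:?}", sink.traps);
}
```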
Thanks, @abrown! @cfallin @julian-seward1 How are we going to proceed with this? The implementation is now complete (covering all three major backends) and passes all tests. Based on the discussion in #1175, I believed that this would be an acceptable approach, but even if it is not, most of my code is actually independent of whether there is a new CLIF operation or not, and I think that's the case for @abrown's code as well. Are there plans to change the new framework soon, so that it supports the type of pattern matching that would be applicable to this case?
@akirilov-arm @julian-seward1 @abrown so, I'm off for this week and next (but I'll try to keep up with this when I can).

At a high level, I am fine with the pragmatic approach here (and in #1175) of creating these merged opcodes for now, and then eventually removing them when we have the story for load-related code motion sorted out. In other words, I don't necessarily think that we are so close to having the "right thing" built that we should hold back true codegen improvements in the meantime and create a dependence on a to-be-built part of the framework; I think the cost of removing these opcodes later is minimal, and can be done as part of a PR that introduces the proper pattern-matching. It's not the ideal end-state, but it lets us move forward in the meantime. @julian-seward1, what do you think?
So, @julian-seward1 and I discussed this last week; apologies for taking so long to get back to it (I'm back to work again, albeit somewhat busy with onboarding stuff).

In general I agree with @julian-seward1 (and it seems others would be on board with) a better approach eventually that can actually handle pattern-matching load sources to instructions. This has been on my to-do list for a while, but in theory is possible if we loosen the load-motion conditions a bit. However, I think that having the better codegen as a starting point is valuable; if we take this as-is for now, then we have a baseline whose codegen output we should match when we do have the proper pattern-matching in place. Thus, I'm happy to take this, and I'll commit to removing the `LoadSplat` instruction once the framework can pattern-match the load-plus-splat sequence directly.
@cfallin Sounds good to me. It's always better to have a good test case before adding non-trivial functionality and, if I understand correctly, previously there were no actual use cases for the enhanced pattern matching.
Well, I can't say I'm enthusiastic; I think it moves the design in the wrong direction. But if it's only temporary, and since it's useful to land the rest of it anyway, I won't stand in the way.

I should add .. today I implemented a couple of the proposed wasm instructions as combinations of existing CLIF instructions. So .. if/when that lands, it'll create further CLIF-level inconsistency, at least in the short term :-/
@julian-seward1 understood, and yes, not an ideal situation; but I think this should be a useful test case in the sense that if a new matching framework can handle this case (allow the load to merge into the `splat`), we can delete the opcode again with no codegen regression.

I'll go ahead and approve and merge now; thanks for the discussion, all!
The changes in #2278 added `SourceLoc`s to several x64 `Inst` variants; between when that PR was last run in CI and when it was merged, new instructions were added that require this new parameter. This change adds the parameter in order to fix CI.

This was added as an incremental step to improve AArch64 code quality in PR #2278. At the time, we did not have a way to pattern-match the load + splat opcode sequence that the relevant Wasm opcodes lowered to. However, now with PR #2366, we can merge effectful instructions such as loads into other ops, and so we can do this pattern matching directly. The pattern-matching update will come in a subsequent commit.
The new instruction corresponds to WebAssembly's `load*_splat` operations, which were previously represented as a combination of `Load` and `Splat` instructions. However, there are architectures such as Armv8-A that have a single machine instruction equivalent to the Wasm operations. In order to generate it, it is necessary to merge the `Load` and the `Splat` in the backend, which is not possible because the load may have side effects. The new IR instruction works around this limitation.

The AArch64 backend leverages the new instruction to improve code generation. I am not really qualified to implement the optimization for x86, so I have added a temporary work-around to avoid any test breakage. It's a bit of a hack, but it should be incorrect only when cross-compiling to x86 on an Arm platform.
Fixes #1175 (but other backends need work too).
cc @abrown @cfallin
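As a concrete illustration of the description above, here is a minimal sketch of the two spellings at the CLIF builder level, in Rust. The `load_splat` builder method mirrors the opcode this PR introduces (and which was later removed again); treat its exact signature, and the use of `cranelift-frontend` here, as assumptions.

```rust
use cranelift_codegen::ir::{immediates::Offset32, types, InstBuilder, MemFlags, Value};
use cranelift_frontend::FunctionBuilder;

// Before this PR: WebAssembly's `v32x4.load_splat` had to be expressed as a
// scalar load followed by a splat. The load may trap, so no backend is
// allowed to merge the pair after the fact.
fn splatted_load_old(b: &mut FunctionBuilder, addr: Value) -> Value {
    let scalar = b.ins().load(types::I32, MemFlags::new(), addr, Offset32::new(0));
    b.ins().splat(types::I32X4, scalar)
}

// With this PR: a single CLIF instruction, which the AArch64 backend can
// lower directly to `ld1r`.
fn splatted_load_new(b: &mut FunctionBuilder, addr: Value) -> Value {
    b.ins().load_splat(types::I32X4, MemFlags::new(), addr, Offset32::new(0))
}
```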