-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update NVVM ldu/ldg intrinsics with IntrWillReturn and test for DCE #98968
Conversation
…du/ldg intrinsics with IntrWillReturn and test for DCE
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-ir Author: Kevin McAfee (kalxr) ChangesCheck for mayHaveSideEffects when lowering target intrinsics to avoid creating unused chains that are then erroneously eliminated. Without this check intrinsics with side effects (such as a those with !WillReturn) can be eliminated from the SDAG. This is inconsistent with behavior at the LLVM IR level where instructions that may have side effects are considered unsafe to remove. Consider the NVVM ldu/ldg intrinsics and test updated in this patch for an illustration of the inconsistency we are resolving. The new testcases check that these intrinsics are removed when their results are unused. As they are loads with no side effects, this is desirable. Before the intrinsic and SDAG changes in this patch, these tests would have passed as the intrinsic instructions would have been removed at the SDAG level. They would not have been removed at the IR level as they were not IntrWillReturn. When we make the SDAG change alone, these tests fail, as the intrinsics are still not IntrWillReturn and thus are not safe to remove at either the IR or SDAG level. Thus we update the intrinsics with IntrWIllReturn, which combined with their other attributes makes them free of side-effects and safe to remove at both levels. Full diff: https://github.com/llvm/llvm-project/pull/98968.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 854eb2f8dd6df..1e7fdb53059e2 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1529,30 +1529,30 @@ def int_nvvm_mbarrier_pending_count :
// pointer's alignment.
def int_nvvm_ldu_global_i : Intrinsic<[llvm_anyint_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldu.global.i">;
def int_nvvm_ldu_global_f : Intrinsic<[llvm_anyfloat_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldu.global.f">;
def int_nvvm_ldu_global_p : Intrinsic<[llvm_anyptr_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldu.global.p">;
// Generated within nvvm. Use for ldg on sm_35 or later. Second arg is the
// pointer's alignment.
def int_nvvm_ldg_global_i : Intrinsic<[llvm_anyint_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldg.global.i">;
def int_nvvm_ldg_global_f : Intrinsic<[llvm_anyfloat_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldg.global.f">;
def int_nvvm_ldg_global_p : Intrinsic<[llvm_anyptr_ty],
[llvm_anyptr_ty, llvm_i32_ty],
- [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+ [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
"llvm.nvvm.ldg.global.p">;
// Use for generic pointers
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index b0746014daf5a..2fd12c7c0b1bd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5227,7 +5227,7 @@ void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I,
// definition.
const Function *F = I.getCalledFunction();
bool HasChain = !F->doesNotAccessMemory();
- bool OnlyLoad = HasChain && F->onlyReadsMemory();
+ bool OnlyLoad = HasChain && F->onlyReadsMemory() && !I.mayHaveSideEffects();
// Build the operand list.
SmallVector<SDValue, 8> Ops;
diff --git a/llvm/test/CodeGen/NVPTX/ldu-ldg.ll b/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
index a449a1b1f713c..9c266c78f5ac3 100644
--- a/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
+++ b/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
@@ -148,3 +148,129 @@ define <2 x half> @test_ldg_v2f16(ptr addrspace(1) %ptr) {
%val = tail call <2 x half> @llvm.nvvm.ldg.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
ret <2 x half> %val
}
+
+; CHECK-LABEL: test_ldu_i8_dead
+define void @test_ldu_i8_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u8
+ %val = tail call i8 @llvm.nvvm.ldu.global.i.i8.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_i16_dead
+define void @test_ldu_i16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u16
+ %val = tail call i16 @llvm.nvvm.ldu.global.i.i16.p1(ptr addrspace(1) %ptr, i32 2)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_i32_dead
+define void @test_ldu_i32_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u32
+ %val = tail call i32 @llvm.nvvm.ldu.global.i.i32.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_i64_dead
+define void @test_ldu_i64_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u64
+ %val = tail call i64 @llvm.nvvm.ldu.global.i.i64.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_p_dead
+define void @test_ldu_p_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u64
+ %val = tail call ptr @llvm.nvvm.ldu.global.p.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_f32_dead
+define void @test_ldu_f32_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.f32
+ %val = tail call float @llvm.nvvm.ldu.global.f.f32.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_f64_dead
+define void @test_ldu_f64_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.f64
+ %val = tail call double @llvm.nvvm.ldu.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_f16_dead
+define void @test_ldu_f16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u16
+ %val = tail call half @llvm.nvvm.ldu.global.f.f16.p1(ptr addrspace(1) %ptr, i32 2)
+ ret void
+}
+
+; CHECK-LABEL: test_ldu_v2f16_dead
+define void @test_ldu_v2f16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ldu.global.u32
+ %val = tail call <2 x half> @llvm.nvvm.ldu.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_i8_dead
+define void @test_ldg_i8_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u8
+ %val = tail call i8 @llvm.nvvm.ldg.global.i.i8.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_i16_dead
+define void @test_ldg_i16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u16
+ %val = tail call i16 @llvm.nvvm.ldg.global.i.i16.p1(ptr addrspace(1) %ptr, i32 2)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_i32_dead
+define void @test_ldg_i32_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u32
+ %val = tail call i32 @llvm.nvvm.ldg.global.i.i32.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_i64_dead
+define void @test_ldg_i64_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u64
+ %val = tail call i64 @llvm.nvvm.ldg.global.i.i64.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_p_dead
+define void @test_ldg_p_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u64
+ %val = tail call ptr @llvm.nvvm.ldg.global.p.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_f32_dead
+define void @test_ldg_f32_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.f32
+ %val = tail call float @llvm.nvvm.ldg.global.f.f32.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_f64_dead
+define void @test_ldg_f64_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.f64
+ %val = tail call double @llvm.nvvm.ldg.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_f16_dead
+define void @test_ldg_f16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u16
+ %val = tail call half @llvm.nvvm.ldg.global.f.f16.p1(ptr addrspace(1) %ptr, i32 2)
+ ret void
+}
+
+; CHECK-LABEL: test_ldg_v2f16_dead
+define void @test_ldg_v2f16_dead(ptr addrspace(1) %ptr) {
+ ; CHECK-NOT: ld.global.nc.u32
+ %val = tail call <2 x half> @llvm.nvvm.ldg.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
+ ret void
+}
|
@@ -5227,7 +5227,7 @@ void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I, | |||
// definition. | |||
const Function *F = I.getCalledFunction(); | |||
bool HasChain = !F->doesNotAccessMemory(); | |||
bool OnlyLoad = HasChain && F->onlyReadsMemory(); | |||
bool OnlyLoad = HasChain && F->onlyReadsMemory() && !I.mayHaveSideEffects(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to touch this. This is also a behavior change for callsite attributes, which will cause legalization failures for readnone callsites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment Matt. This particular change is the main motivation for this patch, so I'd like to better understand potential issues with it. Could you elaborate on the behavior change and the failures it would cause? I haven't noticed any failures of that nature reflected in tests, but maybe there isn't coverage for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IR does not have a notion of non-memory side effects. The code previously only considers the attributes of the underlying function definition, which is what all other intrinsic code expects (and changes the opcode between INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN/INTRINSIC_VOID). This now will swap the opcode and operand structure depending on whether the callsite has a memory(none) attribute.
So TLDR this change is just wrong, you can only consider the underlying function declaration and not the instruction / call site instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining!
The code previously only considers the attributes of the underlying function definition, which is what all other intrinsic code expects (and changes the opcode between INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN/INTRINSIC_VOID). This now will swap the opcode and operand structure depending on whether the callsite has a memory(none) attribute.
I understand that it will affect the operands, but I don't see where OnlyLoad
affects the opcode. That seems like something determined by HasChain
.
... you can only consider the underlying function declaration and not the instruction / call site instance
If the issue is with using the instruction, would it then be reasonable to use F->willReturn()
instead of !I.mayHaveSideEffects()
? It still appears to me incorrect to treat an intrinsic as only a load when it may not return, as the previous code is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why willreturn matters in this context. But also, we don't need to repeat every optimization in codegen.
In any case, a patch touching the intrinsic attributes shouldn't be in the same patch as one changing the behavior here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of the willReturn
check is to avoid deleting a !willReturn
intrinsic. In their non-updated form, the nvvm intrinsics demonstrate the problem of not checking for willReturn
- the calls are not deleted at the LLVM IR level but are deleted when building the SDAG. This seems incorrect. These calls were unsafe to remove from the IR because the functions are !willReturn
and should be unsafe to remove from the SDAG for the same reason.
Since these nvvm intrinsics are just loads they should actually be removable, so we update them with IntrWillReturn
. I don't have an example of a public intrinsic that is intended to behave the same way these intrinsics did before updating them, but it should be possible to make one and keep it from being removed by being !willReturn
.
That being said, I will remove the SDAG change from this patch and make another PR for that.
llvm/test/CodeGen/NVPTX/ldu-ldg.ll
Outdated
|
||
; CHECK-LABEL: test_ldu_i8_dead | ||
define void @test_ldu_i8_dead(ptr addrspace(1) %ptr) { | ||
; CHECK-NOT: ldu.global.u8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-NOT checks are fragile. Can you precommit generated checks? Also, why would deleting these dead intrinsics need to be done in codegen and not already done in the middle end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously these weren't getting deleted before codegen since they weren't willReturn
. With the added IntrWillReturn
attribute they can be deleted in the middle end, so I'll get rid of these tests and make ones to reflect that.
Thanks @arsenm, would you mind committing on my behalf? I don't have access yet. |
…98968) Summary: Dead calls to these intrinsics were not being deleted at the IR level as they were not marked `IntrWillReturn`, though they were being deleted when building the SDAG. This fixes that and adds a test to confirm they are deleted during `opt` Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250715
Dead calls to these intrinsics were not being deleted at the IR level as they were not marked
IntrWillReturn
, though they were being deleted when building the SDAG. This fixes that and adds a test to confirm they are deleted duringopt