Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update NVVM ldu/ldg intrinsics with IntrWillReturn and test for DCE #98968

Merged
merged 6 commits into from
Jul 24, 2024

Conversation

kalxr
Copy link
Contributor

@kalxr kalxr commented Jul 15, 2024

Dead calls to these intrinsics were not being deleted at the IR level as they were not marked IntrWillReturn, though they were being deleted when building the SDAG. This fixes that and adds a test to confirm they are deleted during opt

…du/ldg intrinsics with IntrWillReturn and test for DCE
@llvmbot
Copy link
Member

llvmbot commented Jul 15, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-nvptx
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-llvm-ir

Author: Kevin McAfee (kalxr)

Changes

Check for mayHaveSideEffects when lowering target intrinsics to avoid creating unused chains that are then erroneously eliminated. Without this check intrinsics with side effects (such as a those with !WillReturn) can be eliminated from the SDAG. This is inconsistent with behavior at the LLVM IR level where instructions that may have side effects are considered unsafe to remove.

Consider the NVVM ldu/ldg intrinsics and test updated in this patch for an illustration of the inconsistency we are resolving. The new testcases check that these intrinsics are removed when their results are unused. As they are loads with no side effects, this is desirable. Before the intrinsic and SDAG changes in this patch, these tests would have passed as the intrinsic instructions would have been removed at the SDAG level. They would not have been removed at the IR level as they were not IntrWillReturn. When we make the SDAG change alone, these tests fail, as the intrinsics are still not IntrWillReturn and thus are not safe to remove at either the IR or SDAG level. Thus we update the intrinsics with IntrWIllReturn, which combined with their other attributes makes them free of side-effects and safe to remove at both levels.


Full diff: https://github.com/llvm/llvm-project/pull/98968.diff

3 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+6-6)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ldu-ldg.ll (+126)
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 854eb2f8dd6df..1e7fdb53059e2 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1529,30 +1529,30 @@ def int_nvvm_mbarrier_pending_count :
 // pointer's alignment.
 def int_nvvm_ldu_global_i : Intrinsic<[llvm_anyint_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldu.global.i">;
 def int_nvvm_ldu_global_f : Intrinsic<[llvm_anyfloat_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldu.global.f">;
 def int_nvvm_ldu_global_p : Intrinsic<[llvm_anyptr_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldu.global.p">;
 
 // Generated within nvvm. Use for ldg on sm_35 or later.  Second arg is the
 // pointer's alignment.
 def int_nvvm_ldg_global_i : Intrinsic<[llvm_anyint_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldg.global.i">;
 def int_nvvm_ldg_global_f : Intrinsic<[llvm_anyfloat_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldg.global.f">;
 def int_nvvm_ldg_global_p : Intrinsic<[llvm_anyptr_ty],
   [llvm_anyptr_ty, llvm_i32_ty],
-  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, NoCapture<ArgIndex<0>>],
+  [IntrReadMem, IntrArgMemOnly, IntrNoCallback, IntrWillReturn, NoCapture<ArgIndex<0>>],
   "llvm.nvvm.ldg.global.p">;
 
 // Use for generic pointers
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index b0746014daf5a..2fd12c7c0b1bd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5227,7 +5227,7 @@ void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I,
   // definition.
   const Function *F = I.getCalledFunction();
   bool HasChain = !F->doesNotAccessMemory();
-  bool OnlyLoad = HasChain && F->onlyReadsMemory();
+  bool OnlyLoad = HasChain && F->onlyReadsMemory() && !I.mayHaveSideEffects();
 
   // Build the operand list.
   SmallVector<SDValue, 8> Ops;
diff --git a/llvm/test/CodeGen/NVPTX/ldu-ldg.ll b/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
index a449a1b1f713c..9c266c78f5ac3 100644
--- a/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
+++ b/llvm/test/CodeGen/NVPTX/ldu-ldg.ll
@@ -148,3 +148,129 @@ define <2 x half> @test_ldg_v2f16(ptr addrspace(1) %ptr) {
   %val = tail call <2 x half> @llvm.nvvm.ldg.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
   ret <2 x half> %val
 }
+
+; CHECK-LABEL: test_ldu_i8_dead
+define void @test_ldu_i8_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u8
+  %val = tail call i8 @llvm.nvvm.ldu.global.i.i8.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_i16_dead
+define void @test_ldu_i16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u16
+  %val = tail call i16 @llvm.nvvm.ldu.global.i.i16.p1(ptr addrspace(1) %ptr, i32 2)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_i32_dead
+define void @test_ldu_i32_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u32
+  %val = tail call i32 @llvm.nvvm.ldu.global.i.i32.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_i64_dead
+define void @test_ldu_i64_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u64
+  %val = tail call i64 @llvm.nvvm.ldu.global.i.i64.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_p_dead
+define void @test_ldu_p_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u64
+  %val = tail call ptr @llvm.nvvm.ldu.global.p.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_f32_dead
+define void @test_ldu_f32_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.f32
+  %val = tail call float @llvm.nvvm.ldu.global.f.f32.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_f64_dead
+define void @test_ldu_f64_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.f64
+  %val = tail call double @llvm.nvvm.ldu.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_f16_dead
+define void @test_ldu_f16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u16
+  %val = tail call half @llvm.nvvm.ldu.global.f.f16.p1(ptr addrspace(1) %ptr, i32 2)
+  ret void
+}
+
+; CHECK-LABEL: test_ldu_v2f16_dead
+define void @test_ldu_v2f16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ldu.global.u32
+  %val = tail call <2 x half> @llvm.nvvm.ldu.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_i8_dead
+define void @test_ldg_i8_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u8
+  %val = tail call i8 @llvm.nvvm.ldg.global.i.i8.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_i16_dead
+define void @test_ldg_i16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u16
+  %val = tail call i16 @llvm.nvvm.ldg.global.i.i16.p1(ptr addrspace(1) %ptr, i32 2)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_i32_dead
+define void @test_ldg_i32_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u32
+  %val = tail call i32 @llvm.nvvm.ldg.global.i.i32.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_i64_dead
+define void @test_ldg_i64_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u64
+  %val = tail call i64 @llvm.nvvm.ldg.global.i.i64.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_p_dead
+define void @test_ldg_p_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u64
+  %val = tail call ptr @llvm.nvvm.ldg.global.p.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_f32_dead
+define void @test_ldg_f32_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.f32
+  %val = tail call float @llvm.nvvm.ldg.global.f.f32.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_f64_dead
+define void @test_ldg_f64_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.f64
+  %val = tail call double @llvm.nvvm.ldg.global.f.f64.p1(ptr addrspace(1) %ptr, i32 8)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_f16_dead
+define void @test_ldg_f16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u16
+  %val = tail call half @llvm.nvvm.ldg.global.f.f16.p1(ptr addrspace(1) %ptr, i32 2)
+  ret void
+}
+
+; CHECK-LABEL: test_ldg_v2f16_dead
+define void @test_ldg_v2f16_dead(ptr addrspace(1) %ptr) {
+  ; CHECK-NOT: ld.global.nc.u32
+  %val = tail call <2 x half> @llvm.nvvm.ldg.global.f.v2f16.p1(ptr addrspace(1) %ptr, i32 4)
+  ret void
+}

@@ -5227,7 +5227,7 @@ void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I,
// definition.
const Function *F = I.getCalledFunction();
bool HasChain = !F->doesNotAccessMemory();
bool OnlyLoad = HasChain && F->onlyReadsMemory();
bool OnlyLoad = HasChain && F->onlyReadsMemory() && !I.mayHaveSideEffects();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to touch this. This is also a behavior change for callsite attributes, which will cause legalization failures for readnone callsites

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment Matt. This particular change is the main motivation for this patch, so I'd like to better understand potential issues with it. Could you elaborate on the behavior change and the failures it would cause? I haven't noticed any failures of that nature reflected in tests, but maybe there isn't coverage for it.

Copy link
Contributor

@arsenm arsenm Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IR does not have a notion of non-memory side effects. The code previously only considers the attributes of the underlying function definition, which is what all other intrinsic code expects (and changes the opcode between INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN/INTRINSIC_VOID). This now will swap the opcode and operand structure depending on whether the callsite has a memory(none) attribute.

So TLDR this change is just wrong, you can only consider the underlying function declaration and not the instruction / call site instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining!

The code previously only considers the attributes of the underlying function definition, which is what all other intrinsic code expects (and changes the opcode between INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN/INTRINSIC_VOID). This now will swap the opcode and operand structure depending on whether the callsite has a memory(none) attribute.

I understand that it will affect the operands, but I don't see where OnlyLoad affects the opcode. That seems like something determined by HasChain.

... you can only consider the underlying function declaration and not the instruction / call site instance

If the issue is with using the instruction, would it then be reasonable to use F->willReturn() instead of !I.mayHaveSideEffects()? It still appears to me incorrect to treat an intrinsic as only a load when it may not return, as the previous code is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why willreturn matters in this context. But also, we don't need to repeat every optimization in codegen.

In any case, a patch touching the intrinsic attributes shouldn't be in the same patch as one changing the behavior here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the willReturn check is to avoid deleting a !willReturn intrinsic. In their non-updated form, the nvvm intrinsics demonstrate the problem of not checking for willReturn - the calls are not deleted at the LLVM IR level but are deleted when building the SDAG. This seems incorrect. These calls were unsafe to remove from the IR because the functions are !willReturn and should be unsafe to remove from the SDAG for the same reason.

Since these nvvm intrinsics are just loads they should actually be removable, so we update them with IntrWillReturn. I don't have an example of a public intrinsic that is intended to behave the same way these intrinsics did before updating them, but it should be possible to make one and keep it from being removed by being !willReturn.

That being said, I will remove the SDAG change from this patch and make another PR for that.


; CHECK-LABEL: test_ldu_i8_dead
define void @test_ldu_i8_dead(ptr addrspace(1) %ptr) {
; CHECK-NOT: ldu.global.u8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-NOT checks are fragile. Can you precommit generated checks? Also, why would deleting these dead intrinsics need to be done in codegen and not already done in the middle end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously these weren't getting deleted before codegen since they weren't willReturn. With the added IntrWillReturn attribute they can be deleted in the middle end, so I'll get rid of these tests and make ones to reflect that.

@kalxr kalxr changed the title Check for side effects when lowering target intrinsics, update NVVM ldu/ldg intrinsics with IntrWillReturn and test for DCE Update NVVM ldu/ldg intrinsics with IntrWillReturn and test for DCE Jul 22, 2024
@kalxr
Copy link
Contributor Author

kalxr commented Jul 24, 2024

Thanks @arsenm, would you mind committing on my behalf? I don't have access yet.

@arsenm arsenm merged commit 8c7188a into llvm:main Jul 24, 2024
8 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…98968)

Summary:
Dead calls to these intrinsics were not being deleted at the IR level as
they were not marked `IntrWillReturn`, though they were being deleted
when building the SDAG. This fixes that and adds a test to confirm they
are deleted during `opt`

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250715
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants