[Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP #96742

DominikAdamski · 2024-06-26T08:08:53Z

Flang-new needs to add mlink-builtin-bitcode objects to properly support offload code generation for AMD GPUs (for example, math functions).

Both Flang-new and Clang rely on mlink-builtin-bitcode flags. These flags are added by the AMDGPUOpenMPToolchain::addClangTargetOptions function. Now, both compilers reuse the same function.

Flang-new tests for AMDGPU were updated by adding the -nogpulib flag. This flag allows running AMDGPU tests on machines without a ROCm stack.
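
As a rough illustration of the reuse described above, the sketch below models two front ends building their device command line through one shared toolchain hook. It is a toy model only: `ToyAMDGPUToolChain`, `addTargetOptions`, and `buildDeviceArgs` are hypothetical names, not LLVM APIs, and the `gfx<N>` to `oclc_isa_version_<N>.bc` mapping is assumed from the test added in this PR. Only the flag spellings `-mlink-builtin-bitcode` and `-nogpulib` come from the PR itself.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a driver toolchain; this is not an LLVM class.
struct ToyAMDGPUToolChain {
  std::string bitcodeDir; // assumed layout, e.g. "<rocm-path>/amdgcn/bitcode"
  std::string gpuArch;    // assumed to have the form "gfx<N>", e.g. "gfx900"

  // Shared hook: appends device-library flags unless -nogpulib was given.
  void addTargetOptions(const std::vector<std::string> &driverArgs,
                        std::vector<std::string> &cmdArgs) const {
    for (const std::string &arg : driverArgs)
      if (arg == "-nogpulib")
        return; // skip the bitcode lookup entirely
    // Drop the "gfx" prefix to form the ISA-version file name.
    const std::string isa = gpuArch.substr(3);
    cmdArgs.push_back("-mlink-builtin-bitcode");
    cmdArgs.push_back(bitcodeDir + "/oclc_isa_version_" + isa + ".bc");
  }
};

// Both toy front ends ("-cc1" and "-fc1") go through the same hook.
std::vector<std::string>
buildDeviceArgs(const std::string &frontendFlag, const ToyAMDGPUToolChain &tc,
                const std::vector<std::string> &driverArgs) {
  std::vector<std::string> cmdArgs = {frontendFlag, "-triple",
                                      "amdgcn-amd-amdhsa"};
  tc.addTargetOptions(driverArgs, cmdArgs);
  return cmdArgs;
}
```

In this model, passing `-nogpulib` makes the hook return before it looks at the bitcode directory, which mirrors why the updated tests can run on machines without a ROCm installation.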

llvmbot · 2024-06-26T08:09:23Z

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-flang-driver

@llvm/pr-subscribers-clang

Author: Dominik Adamski (DominikAdamski)

Changes

Flang-new needs to add mlink-builtin-bitcode objects to properly support offload code generation for AMD GPU.

The -fcuda-is-device flag is not currently used by Flang. In the future it will be needed for the Flang equivalent of AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace.

Full diff: https://github.com/llvm/llvm-project/pull/96742.diff

3 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)
(modified) clang/lib/Driver/ToolChains/Flang.cpp (+3)
(modified) flang/test/Driver/omp-driver-offload.f90 (+8)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index dd55838dcf384..612d5793232ce 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -8016,7 +8016,7 @@ def source_date_epoch : Separate<["-"], "source-date-epoch">,
 // CUDA Options
 //===----------------------------------------------------------------------===//
 
-let Visibility = [CC1Option] in {
+let Visibility = [CC1Option, FC1Option] in {
 
 def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
   HelpText<"Generate code for CUDA device">,
@@ -8031,7 +8031,7 @@ def fno_cuda_host_device_constexpr : Flag<["-"], "fno-cuda-host-device-constexpr
   HelpText<"Don't treat unattributed constexpr functions as __host__ __device__.">,
   MarshallingInfoNegativeFlag<LangOpts<"CUDAHostDeviceConstexpr">>;
 
-} // let Visibility = [CC1Option]
+} // let Visibility = [CC1Option, FC1Option]
 
 //===----------------------------------------------------------------------===//
 // OpenMP Options
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 42b45dba2bd31..2679f284c5016 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
     StringRef Val = A->getValue();
     CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
   }
+
+  const ToolChain &TC = getToolChain();
+  TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
 }
 
 void Flang::addTargetOptions(const ArgList &Args,
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index 6fb4f4eeeeca1..b8afbe65961dc 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -227,3 +227,11 @@
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
 ! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
+
+! RUN:   %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp  \
+! RUN:      --offload-arch=gfx900 \
+! RUN:      --rocm-path=%S/Inputs/rocm %s 2>&1 \
+! RUN:   | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE  %s
+! MLINK-BUILTIN-BITCODE:      "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! MLINK-BUILTIN-BITCODE-SAME: "-fcuda-is-device"
+! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc

jhuber6

The fact that it's called -fcuda-is-device is historical cruft, but I guess it's easiest to just work with it. I also hate -mlink-builtin-bitcode as a concept, but we're not quite ready to move away from its hacks unfortunately.

tblah · 2024-06-26T12:24:48Z

clang/lib/Driver/ToolChains/Flang.cpp

@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
    StringRef Val = A->getValue();
    CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
  }
+
+  const ToolChain &TC = getToolChain();
+  TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);


Should there be some kind of warning if these flags are used with a target other than AMDGPU? I don't have a strong opinion here, as it is only an fc1 flag.

banach-space · 2024-06-26

+1 to having better diagnostics

DominikAdamski · 2024-06-27

Hi,
thanks for the feedback. I would like to share my observations with you:

Clang does not verify how we use these flags, and it accepts them for non-GPU targets.

These flags can be reused by other vendors. For example, Clang also adds the -mlink-builtin-bitcode option for OpenMP on Nvidia GPUs.

tblah · 2024-06-27

Does that mean that this change would also lead to adding these flags when building for Nvidia GPU with flang?

DominikAdamski · 2024-06-27

No. My change does not imply any changes to Nvidia GPU support.

Flang and Clang share the same LLVM backend, which consumes the generated LLVM IR. For AMD GPUs we need to embed bitcode definitions of GPU math functions. The AMD toolchain adds all required options to the compiler invocation for AMD GPUs and, IMO, can be reused between Flang and Clang.

I don't know if Nvidia also wants to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

banach-space · 2024-06-27

> Clang does not verify how we use these flags and it accepts them for non-GPU target.

It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supported" would be a good thing)

> IMO can be reused between Flang and Clang

Are there any plans to extract that logic and share it somewhere?

> I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Who could be the right person to ask?

DominikAdamski · 2024-07-01

> It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supported" would be a good thing)

Shall I extend #94763? I don't use -fcuda-is-device anymore. Now I'm only adding -mlink-builtin-bitcode flags to the flang-new -fc1 command. The -mlink-builtin-bitcode option was introduced by #94763.

> IMO can be reused between Flang and Clang

> Are there any plans to extract that logic and share it somewhere?

Not yet (at least from my side). I can return to this topic if Flang needs to support Clang options for AMD GPUs.

> I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

> Who could be the right person to ask?

I don't know. The open-source LLVM Flang meetings can be a good place to ask this question.

banach-space · 2024-07-05

> Shall I extend #94763?

Yes, please.

> Who could be the right person to ask?

> I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

DominikAdamski · 2024-07-19

> Did you ask? What feedback did you get?

I asked the question on the flang-compiler Slack (openmp/openacc channel). If I get no response, I will raise it on the Flang technical community call on Monday.
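
The diagnostics idea raised in this thread (warning when the bitcode flags are passed for a non-GPU target) could look roughly like the sketch below. It is an assumption-laden illustration, not code from Flang or Clang: `isGpuTriple` and `diagnoseBuiltinBitcode` are hypothetical helpers, the warning text is invented, and the only GPU architectures recognized are the `amdgcn` and `nvptx` triple prefixes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if the triple's architecture names a GPU target.
bool isGpuTriple(const std::string &triple) {
  return triple.rfind("amdgcn", 0) == 0 || triple.rfind("nvptx", 0) == 0;
}

// Hypothetical check: returns one warning per -mlink-builtin-bitcode flag
// found in the frontend arguments when the target triple is not a GPU.
std::vector<std::string>
diagnoseBuiltinBitcode(const std::string &triple,
                       const std::vector<std::string> &fc1Args) {
  std::vector<std::string> warnings;
  if (isGpuTriple(triple))
    return warnings; // flags are expected here, nothing to report
  for (const std::string &arg : fc1Args)
    if (arg == "-mlink-builtin-bitcode")
      warnings.push_back("warning: '-mlink-builtin-bitcode' has no effect "
                         "for non-GPU target '" + triple + "'");
  return warnings;
}
```

Returning the messages instead of printing them keeps the check easy to test; a real driver would route them through its own diagnostics engine.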

banach-space

> fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace, AMDGPUTargetInfo::getTargetDefines.

I don't follow - why would anything related to CUDA be relevant here?

banach-space · 2024-06-26T17:09:31Z

clang/lib/Driver/ToolChains/Flang.cpp

@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
    StringRef Val = A->getValue();
    CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
  }
+
+  const ToolChain &TC = getToolChain();
+  TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);


+1 to having better diagnostics

DominikAdamski · 2024-06-27T09:49:19Z

> fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace, AMDGPUTargetInfo::getTargetDefines.

> I don't follow - why would anything related to CUDA be relevant here?

Clang for AMDGPU supports OpenMP and HIP, and it reuses the same code for both. For example, the -fcuda-is-device flag needs to be checked for legacy HIP host code. I would like to reuse the same part of the AMD GPU toolchain for Flang.

banach-space · 2024-06-27T10:37:35Z

> Clang for AMDGPU supports OpenMP and HIP and it reuses the same code. For example -fcuda-is-device flag needs to be checked for legacy HIP host code.

Thanks! I'm still puzzled though:

> In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace, AMDGPUTargetInfo::getTargetDefines

Why would -fcuda-is-device be required? From your link I gather that the AMD logic in Clang simply makes sure that -fcuda-is-device wasn't used?

> I would like to reuse the same part of the AMD GPU toolchain for Flang.

That would be great - what's the plan here then? Simply to rely on the code in Clang? Also, note that that's TargetInfo (which lives in clangBasic) rather than Toolchain (which lives in clangDriver). This is actually key because it makes the coupling between Flang and Clang even stronger.

DominikAdamski · 2024-07-01T12:35:00Z

Updated the PR after #96909.
Scope of changes:

- The -fcuda-is-device flag is no longer attached by the OpenMP AMD GPU toolchain, so Flang-new does not need to accept it. The flag remains HIP/CUDA specific.
- The OpenMP AMD GPU toolchain only searches for the required bitcode files and attaches them to the flang -fc1 invocation.

tblah

I think this is okay. I don't want to block the PR on diagnostics for an FC1 flag which aren't present in clang. LGTM and thanks for answering my questions.

Please also wait for @banach-space's approval.

DominikAdamski · 2024-07-03T12:09:39Z

@tblah Thanks for your review.
Unfortunately, I had to restore adding the -fcuda-is-device option (#97531) because of a regression in how Clang handles virtual functions in OpenMP target regions.

jhuber6 · 2024-07-03T12:11:57Z

Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

DominikAdamski · 2024-07-03T12:16:32Z

@jhuber6 I'm working on that.

banach-space · 2024-07-05T07:37:27Z

> Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

+1

clang/include/clang/Driver/Options.td

banach-space · 2024-07-05T07:37:00Z

clang/lib/Driver/ToolChains/Flang.cpp

@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
    StringRef Val = A->getValue();
    CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
  }
+
+  const ToolChain &TC = getToolChain();
+  TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);


> Shall I extend #94763?

Yes, please.

> Who could be the right person to ask?

> I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

DominikAdamski · 2024-07-19T08:39:59Z

> Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

> +1

Fixed in PR #99002

DominikAdamski · 2024-07-26T08:03:22Z

> Who could be the right person to ask?

> I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

> Did you ask? What feedback did you get?

@banach-space I asked the question on the Flang Slack, mentioned the issue at the latest Flang technical meeting, and described a potential solution here: https://discourse.llvm.org/t/offloading-on-nvptx64-target-with-flang-new-leads-to-undefined-reference-s/80237. I got no feedback.

Can I merge this PR? The issue with -fcuda-is-device is resolved. If you wish, I can extend the driver checks for -mlink-builtin-bitcode in a separate PR.

banach-space · 2024-07-28T19:04:35Z

> Who could be the right person to ask?

> I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

> Did you ask? What feedback did you get?

> @banach-space I asked question on flang-slack, I mentioned the issue on the latest Flang technical meeting and I described potential solution here: https://discourse.llvm.org/t/offloading-on-nvptx64-target-with-flang-new-leads-to-undefined-reference-s/80237 . I got no feedback.

> Can I merge this PR? The issue with -fcuda-is-device is resolved. If you wish I can extend driver checks for -mlink-builtin-bitcode as a separate PR.

Thanks for following up! Overall this looks good to me, but please update the summary with some details/context, e.g.

- Why are you adding -nogpulib in tests?
- "Flang-new needs to add mlink-builtin-bitcode" - please add a note explaining that this flag is added by addClangTargetOptions() (that wasn't obvious to me).

Approving as is, no need to wait for me to take another look.

jhuber6 · 2024-07-28T19:12:18Z

What are we relying on -mlink-builtin-bitcode for right now? IIUC it's mostly just math, right?

DominikAdamski · 2024-07-29T07:58:21Z

@jhuber6 You are right. Flang-new for AMD GPU requires -mlink-builtin-bitcode for math functions.

DominikAdamski requested review from skatrak and tblah June 26, 2024 08:08

llvmbot added the clang, clang:driver, flang:driver, and flang labels Jun 26, 2024

DominikAdamski requested a review from jsjodin June 26, 2024 08:09

DominikAdamski requested a review from jhuber6 June 26, 2024 08:09

tblah requested a review from banach-space June 26, 2024 11:03

jhuber6 approved these changes Jun 26, 2024

tblah reviewed Jun 26, 2024

DominikAdamski force-pushed the flang_mlink_builtin_bitcode branch from 80d4675 to 5b487aa Compare June 26, 2024 15:57

banach-space reviewed Jun 26, 2024

DominikAdamski added 2 commits July 1, 2024 04:47

Merge branch 'upstream_main' into flang_mlink_builtin_bitcode

8f50d61

Fixes after llvm#96909

8252dbe

DominikAdamski changed the title ~~[Flang-new][OpenMP] Add offload related flags for AMDGPU~~ [Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP Jul 1, 2024

apply nogpulib only for AMD GPU tests

efd4056

tblah approved these changes Jul 2, 2024

DominikAdamski added 2 commits July 3, 2024 06:36

Merge branch 'main' into flang_mlink_builtin_bitcode

223bf97

Revert "Fixes after llvm#96909"

f2e362e

This reverts commit 8252dbe.

banach-space reviewed Jul 5, 2024

DominikAdamski added 2 commits July 18, 2024 04:21

Revert "Revert "Fixes after llvm#96909""

7bddef7

This reverts commit f2e362e. PR: llvm#99002 causes that -fcuda-is-device is not added for AMD GPU OpenMP.

Merge branch 'main' into flang_mlink_builtin_bitcode

1135e32

Merge branch 'main' into flang_mlink_builtin_bitcode

d78721f

banach-space approved these changes Jul 28, 2024

Merge branch 'main' into flang_mlink_builtin_bitcode

8f194d2

DominikAdamski merged commit d86311f into llvm:main Jul 29, 2024
7 checks passed
