Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP #96742

Merged
merged 10 commits into from
Jul 29, 2024

Conversation

DominikAdamski
Copy link
Contributor

@DominikAdamski DominikAdamski commented Jun 26, 2024

Flang-new needs to add mlink-builtin-bitcode objects to properly support offload code generation for AMD GPUs (for example, math functions).

Both Flang-new and Clang rely on mlink-builtin-bitcode flags. These flags are added by the AMDGPUOpenMPToolchain::addClangTargetOptions function. Now, both compilers reuse the same function.

Flang-new tests for AMDGPU were updated by adding the -nogpulib flag. This flag allows running AMDGPU tests on machines without a ROCm stack.

@DominikAdamski DominikAdamski requested review from skatrak and tblah June 26, 2024 08:08
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' flang:driver flang Flang issues not falling into any other category labels Jun 26, 2024
@DominikAdamski DominikAdamski requested a review from jsjodin June 26, 2024 08:09
@llvmbot
Copy link
Member

llvmbot commented Jun 26, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-flang-driver

@llvm/pr-subscribers-clang

Author: Dominik Adamski (DominikAdamski)

Changes

Flang-new needs to add mlink-builtin-bitcode objects to properly support offload code generation for AMD GPU.

fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent function: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace.


Full diff: https://github.com/llvm/llvm-project/pull/96742.diff

3 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Driver/ToolChains/Flang.cpp (+3)
  • (modified) flang/test/Driver/omp-driver-offload.f90 (+8)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index dd55838dcf384..612d5793232ce 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -8016,7 +8016,7 @@ def source_date_epoch : Separate<["-"], "source-date-epoch">,
 // CUDA Options
 //===----------------------------------------------------------------------===//
 
-let Visibility = [CC1Option] in {
+let Visibility = [CC1Option, FC1Option] in {
 
 def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
   HelpText<"Generate code for CUDA device">,
@@ -8031,7 +8031,7 @@ def fno_cuda_host_device_constexpr : Flag<["-"], "fno-cuda-host-device-constexpr
   HelpText<"Don't treat unattributed constexpr functions as __host__ __device__.">,
   MarshallingInfoNegativeFlag<LangOpts<"CUDAHostDeviceConstexpr">>;
 
-} // let Visibility = [CC1Option]
+} // let Visibility = [CC1Option, FC1Option]
 
 //===----------------------------------------------------------------------===//
 // OpenMP Options
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 42b45dba2bd31..2679f284c5016 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
     StringRef Val = A->getValue();
     CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
   }
+
+  const ToolChain &TC = getToolChain();
+  TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
 }
 
 void Flang::addTargetOptions(const ArgList &Args,
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index 6fb4f4eeeeca1..b8afbe65961dc 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -227,3 +227,11 @@
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
 ! FORCE-USM-OFFLOAD-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
 ! FORCE-USM-OFFLOAD-SAME: "-fopenmp" "-fopenmp-force-usm"
+
+! RUN:   %flang -### -v --target=x86_64-unknown-linux-gnu -fopenmp  \
+! RUN:      --offload-arch=gfx900 \
+! RUN:      --rocm-path=%S/Inputs/rocm %s 2>&1 \
+! RUN:   | FileCheck --check-prefix=MLINK-BUILTIN-BITCODE  %s
+! MLINK-BUILTIN-BITCODE:      "{{[^"]*}}flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa"
+! MLINK-BUILTIN-BITCODE-SAME: "-fcuda-is-device"
+! MLINK-BUILTIN-BITCODE-SAME: "-mlink-builtin-bitcode" {{.*Inputs.*rocm.*amdgcn.*bitcode.*}}oclc_isa_version_900.bc

@DominikAdamski DominikAdamski requested a review from jhuber6 June 26, 2024 08:09
@tblah tblah requested a review from banach-space June 26, 2024 11:03
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that it's called -fcuda-is-device is historical cruft, but I guess it's easiest to just work with it. I also hate -mlink-builtin-bitcode as a concept, but we're not quite ready to move away from its hacks unfortunately.

@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
StringRef Val = A->getValue();
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
}

const ToolChain &TC = getToolChain();
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some kind of warning if these flags are used not with AMDGPU? I don't have a strong opinion here as it is only an fc1 flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to having better diagnostics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
thanks for the feedback. I would like to share my observations with you:

  1. Clang does not verify how we use these flags and it accepts them for non-GPU target.
  2. These flags can be reused by other vendors. For example clang adds mlink-builtin-bitcode option for OpenMP Nvidia GPU as well .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that this change would also lead to adding these flags when building for Nvidia GPU with flang?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. My change does not imply any changes for Nvidia GPUs support.

Flang and Clang share the same LLVM backend which consumes generated LLVM IR. For AMD GPU we need to embed bitcode definitions of GPU math functions. AMD toolchain adds all required options to the compiler invocation for AMD GPU and IMO can be reused between Flang and Clang.

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang does not verify how we use these flags and it accepts them for non-GPU target.

It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)

IMO can be reused between Flang and Clang

Are there any plans to extract that logic and share it somewhere?

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Who could be the right person to ask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to make Flang "stricter" if we believe that's the right thing to do ;-) (I think that generating useful error/warning messages like "don't mix these flags - that's not supporter" would be a good thing)

Shall I extend #94763 ? I don't use -fcuda-is-device anymore. Now, I'm only adding -mlink-builtin-bitcode flags to flang-new -fc1 command. The -mlink-builtin-bitcode option was introduced by #94763

IMO can be reused between Flang and Clang

Are there any plans to extract that logic and share it somewhere?

Not yet (at least from my side). I can return to this topic if there is need to support Clang option by Flang for AMD GPU.

I don't know if Nvidia also want to reuse their toolchain between Clang and Flang to fully support OpenMP offloading.

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I extend #94763 ?

Yes, please.

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

Copy link
Contributor Author

@DominikAdamski DominikAdamski Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ask? What feedback did you get?

I asked question on flang-compiler slack (openmp/openacc channel). If I get no response, I will raise question on Flang technical community call on Monday.

Flang-new needs to add mlink-builtin-bitcode objects
to properly support offload code generation for AMD GPU.

fcuda-is-device flag is not used by Flang currently.
In the future it will be needed for Flang equivalent function:
AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace.
@DominikAdamski DominikAdamski force-pushed the flang_mlink_builtin_bitcode branch from 80d4675 to 5b487aa Compare June 26, 2024 15:57
Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace AMDGPUTargetInfo::getTargetDefines .

I don't follow - why would anything related to CUDA be relevant here?

@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
StringRef Val = A->getValue();
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
}

const ToolChain &TC = getToolChain();
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to having better diagnostics

@DominikAdamski
Copy link
Contributor Author

fcuda-is-device flag is not used by Flang currently. In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace AMDGPUTargetInfo::getTargetDefines .

I don't follow - why would anything related to CUDA be relevant here?

Clang for AMDGPU supports OpenMP and HIP and it reuses the same code. For example -fcuda-is-device flag needs to be checked for legacy HIP host code. I would like to reuse the same part of the AMD GPU toolchain for Flang.

@banach-space
Copy link
Contributor

Clang for AMDGPU supports OpenMP and HIP and it reuses the same code. For example -fcuda-is-device flag needs to be checked for legacy HIP host code.

Thanks! I'm still puzzled though:

In the future it will be needed for Flang equivalent functions: AMDGPUTargetCodeGenInfo::getGlobalVarAddressSpace AMDGPUTargetInfo::getTargetDefines

Why would -fcuda-is-device be required? From your link I gather that the AMD logic in Clang simply makes sure that -fcuda-is-device wasn't used?

I would like to reuse the same part of the AMD GPU toolchain for Flang.

That would be great - what's the plan here then? Simply to rely on the code in Clang? Also, note that that's TargetInfo (which lives in clangBasic) rather than Toolchain (that lives in clangDriver). This is actually key because it makes the coupling between Flang and Clang even stronger.

@DominikAdamski
Copy link
Contributor Author

DominikAdamski commented Jul 1, 2024

Updated PR after: #96909 .
Scope of changes:

  • -fcuda-is-device is not attached by OpenMP AMD GPU toolchain any more, so we do not need to accept this flag by Flang-new. This flag remains HIP/CUDA specific.

  • OpenMP AMD GPU toolchain only searches and attaches required bitcode files to flang -fc1 invocation.

@DominikAdamski DominikAdamski changed the title [Flang-new][OpenMP] Add offload related flags for AMDGPU [Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP Jul 1, 2024
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay. I don't want to block the PR on diagnostics for an FC1 flag which aren't present in clang. LGTM and thanks for answering my questions.

Please also wait for @banach-space's approval.

@DominikAdamski
Copy link
Contributor Author

@tblah Thanks for your review.
Unfortunately, I had to restore adding fcuda-is-device option ( #97531 ) because of regression related to handling by clang virtual functions in OpenMP target region.

@jhuber6
Copy link
Contributor

jhuber6 commented Jul 3, 2024

Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

@DominikAdamski
Copy link
Contributor Author

@jhuber6 I'm working on that.

@banach-space
Copy link
Contributor

Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

+1

clang/include/clang/Driver/Options.td Outdated Show resolved Hide resolved
@@ -333,6 +333,9 @@ void Flang::AddAMDGPUTargetArgs(const ArgList &Args,
StringRef Val = A->getValue();
CmdArgs.push_back(Args.MakeArgString("-mcode-object-version=" + Val));
}

const ToolChain &TC = getToolChain();
TC.addClangTargetOptions(Args, CmdArgs, Action::OffloadKind::OFK_OpenMP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I extend #94763 ?

Yes, please.

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

This reverts commit f2e362e.

PR: llvm#99002 causes
that -fcuda-is-device is not added for AMD GPU OpenMP.
@DominikAdamski
Copy link
Contributor Author

Would it be possible for you to investigate that? It really shouldn't be required if we can't help it.

+1

Fixed in PR #99002

@DominikAdamski
Copy link
Contributor Author

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

@banach-space I asked question on flang-slack, I mentioned the issue on the latest Flang technical meeting and I described potential solution here: https://discourse.llvm.org/t/offloading-on-nvptx64-target-with-flang-new-leads-to-undefined-reference-s/80237 . I got no feedback.

Can I merge this PR? The issue with -fcuda-is-device is resolved. If you wish I can extend driver checks for -mlink-builtin-bitcode as a separate PR.

@banach-space
Copy link
Contributor

Who could be the right person to ask?

I don't know. Open-source LLVM Flang meetings can be good place to ask this question.

Did you ask? What feedback did you get?

@banach-space I asked question on flang-slack, I mentioned the issue on the latest Flang technical meeting and I described potential solution here: https://discourse.llvm.org/t/offloading-on-nvptx64-target-with-flang-new-leads-to-undefined-reference-s/80237 . I got no feedback.

Can I merge this PR? The issue with -fcuda-is-device is resolved. If you wish I can extend driver checks for -mlink-builtin-bitcode as a separate PR.

Thanks for following-up! Overall this looks good to me, but please update the summary with some details/context, e.g.

  • why are you adding -nogpulib in tests?
  • "Flang-new needs to add mlink-builtin-bitcode" - please add a note explaining that this flag is added by addClangTargetOptions() (that wasn't obvious to me).

Approving as is, no need to wait for me to take another look.

@jhuber6
Copy link
Contributor

jhuber6 commented Jul 28, 2024

What are we relying on -mlink-builtin-bitcode for right now? IIUC it's mostly just math, right?

@DominikAdamski
Copy link
Contributor Author

@jhuber6 You are right. Flang-new for AMD GPU requires -mlink-builtin-bitcode for math functions.

@DominikAdamski DominikAdamski merged commit d86311f into llvm:main Jul 29, 2024
7 checks passed
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
Flang-new needs to add `mlink-builtin-bitcode` objects to properly
support offload code generation for AMD GPUs (for example, math
functions).

Both Flang-new and Clang rely on `mlink-builtin-bitcode` flags. These
flags are added by the `AMDGPUOpenMPToolchain::addClangTargetOptions`
function. Now, both compilers reuse the same function.

Flang-new tests for AMDGPU were updated by adding the `-nogpulib` flag.
This flag allows running AMDGPU tests on machines without the ROCm stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category flang:driver flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants