[AArch64] Add updated FEAT_SVE_B16B16 and begin replacement of 'b16b16' flag #101480

Merged on Aug 7, 2024 (4 commits)

SpencerAbson (Contributor) commented on Aug 1, 2024

This patch adds FeatureSVEB16B16 to the AArch64 backend in order to represent the new behavior of FEAT_SVE_B16B16 (as described in the latest Armv9.4 extensions documentation) as well as a 'sve-b16b16' flag to enable it.

The predication of non-widening SVE BFloat16 instructions has changed to require this feature instead of the previously required, soon-to-be-removed FeatureB16B16, which is enabled by the 'b16b16' flag. This change therefore weakens the 'b16b16' flag in favour of 'sve-b16b16'. Existing tests that are affected by this have been modified to use and/or expect 'sve-b16b16', and new tests have been added to verify the behavior and implementation of 'sve-b16b16'.

This patch is in response to the following changes.

The set of architecture features enabled by FEAT_SVE_B16B16 has been relaxed; it now implements:
      - With FEAT_SVE2: SVE non-widening BFloat16 instructions in Non-streaming SVE mode
      - With FEAT_SME2: SVE non-widening BFloat16 instructions when the PE is in Streaming SVE mode, and SME
       Z-targeting multi-vector non-widening BFloat16 instructions.
      - It no longer implements SME ZA-targeting non-widening BFloat16 instructions.

The SME ZA-targeting non-widening BFloat16 instructions are implemented by the new FEAT_SME_B16B16. This patch does not change how that architecture feature is enabled ('+b16b16+sme2'). Only the instructions implemented by FEAT_SVE_B16B16 have been changed to require 'sve-b16b16' instead of 'b16b16'.

New flags must be created to represent FEAT_SVE_B16B16 and FEAT_SME_B16B16:
      - 'sve-b16b16' enables the updated FEAT_SVE_B16B16 (described here)
      - 'sme-b16b16' will enable the new FEAT_SME_B16B16
      - This patch includes 'sve-b16b16' only
   
A future patch will add 'sme-b16b16'. SME ZA-targeting non-widening BFloat16 instructions would then be guarded by '+sme-b16b16+sme2', and 'b16b16' can be removed.
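
As a rough illustration of the flag split described above, the guards after this patch can be modelled as sets of required flags. This is only a sketch, not LLVM code: the group names and helper functions are hypothetical, only the flag spellings ('sve2', 'sme2', 'b16b16', 'sve-b16b16') come from this description, and any implied-feature dependencies between flags are deliberately ignored.

```python
# Illustrative model only: this is not LLVM code. The group names are
# hypothetical; the flag spellings come from the PR description.
GUARDS = {
    # SVE non-widening BFloat16 instructions, Non-streaming SVE mode.
    "sve_non_streaming": frozenset({"sve2", "sve-b16b16"}),
    # The same instructions when the PE is in Streaming SVE mode.
    "sve_streaming": frozenset({"sme2", "sve-b16b16"}),
    # SME Z-targeting multi-vector non-widening BFloat16 instructions.
    "sme_z_multi_vector": frozenset({"sme2", "sve-b16b16"}),
    # SME ZA-targeting instructions: guard unchanged by this patch.
    "sme_za_targeting": frozenset({"sme2", "b16b16"}),
}


def parse_flags(flags):
    """Split a '+a+b' style flag string into a set of flag names."""
    return {name for name in flags.split("+") if name}


def enabled_groups(flags):
    """Return the sorted instruction groups whose guard is fully satisfied."""
    have = parse_flags(flags)
    return sorted(group for group, need in GUARDS.items() if need <= have)


if __name__ == "__main__":
    for flags in ("+sve2+b16b16", "+sve2+sve-b16b16",
                  "+sme2+sve-b16b16", "+sme2+b16b16"):
        print(flags, "->", enabled_groups(flags))
```

In this model, '+sve2+b16b16' no longer enables any of the groups, which is the sense in which 'b16b16' is weakened, while '+sme2+b16b16' still covers the ZA-targeting group until 'sme-b16b16' is introduced.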

@llvmbot added the labels clang, backend:AArch64, clang:driver, clang:frontend, and mc on Aug 1, 2024
llvmbot (Member) commented on Aug 1, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-aarch64

Author: None (SpencerAbson)

Changes

This patch adds FeatureSVEB16B16 to represent the new behavior of FEAT_SVE_B16B16 as described in the latest Armv9.4 extensions documentation, as well as a 'sve-b16b16' flag to enable it. The predication of SVE BFloat16 to BFloat16 instructions has changed to reflect the new behavior of FEAT_SVE_B16B16. This change weakens the existing 'b16b16' flag used to enable the old version of FEAT_SVE_B16B16; it is partially replaced by the new 'sve-b16b16' flag. Existing tests that are affected by this have been modified to use and/or expect 'sve-b16b16', and new tests have been added to verify the behavior and implementation of 'sve-b16b16'.

This patch is in response to the following changes.

The set of architecture features enabled by FEAT_SVE_B16B16 has been relaxed; it now implements:
      - With FEAT_SVE2: SVE non-widening BFloat16 instructions in Non-streaming SVE mode
      - With FEAT_SME2: SVE non-widening BFloat16 instructions when the PE is in Streaming SVE mode, and SME
       Z-targeting multi-vector non-widening BFloat16 instructions.
      - It no longer implements SME ZA-targeting non-widening BFloat16 instructions.

The SME ZA-targeting non-widening BFloat16 instructions are implemented by the new FEAT_SME_B16B16. This patch does not change how that architecture feature is enabled ('+b16b16+sme2'). Only the instructions implemented by FEAT_SVE_B16B16 have been changed to require 'sve-b16b16' instead of 'b16b16'.

New flags must be created to represent FEAT_SVE_B16B16 and FEAT_SME_B16B16:
      - 'sve-b16b16' enables the updated FEAT_SVE_B16B16 (described here)
      - 'sme-b16b16' will enable the new FEAT_SME_B16B16
      - This patch includes only 'sve-b16b16'
   
A future patch will add 'sme-b16b16'. SME ZA-targeting non-widening BFloat16 instructions would then be guarded by '+sme-b16b16+sme2', and 'b16b16' can be removed.


Patch is 192.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101480.diff

84 Files Affected:

  • (modified) clang/include/clang/Basic/arm_sve.td (+2-2)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_clamp.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_max.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_maxnm.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_min.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_minnm.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfadd.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfclamp.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmax.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmaxnm.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmin.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfminnm.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmla.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmla_lane.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmls.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmls_lane.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmul.c (+6-6)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmul_lane.c (+5-5)
  • (modified) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfsub.c (+6-6)
  • (modified) clang/test/Driver/print-supported-extensions-aarch64.c (+2-1)
  • (modified) clang/test/Sema/aarch64-sme2-intrinsics/acle_sme2_b16b16.cpp (+2-2)
  • (modified) clang/test/Sema/aarch64-sve2-intrinsics/acle_sve2_bfloat.cpp (+1-1)
  • (added) clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16.cpp (+49)
  • (added) clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16_streaming.cpp (+50)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+5-2)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+7-3)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+4-4)
  • (modified) llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp (+1)
  • (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-max.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-min.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2-min-max-clamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfadd.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfclamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmax.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmaxnm.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmin.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfminnm.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmla.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmla_lane.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmls.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmls_lane.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmul.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfmul_lane.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-bfsub.ll (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfclamp-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfclamp.s (+14-14)
  • (modified) llvm/test/MC/AArch64/SME2/bfmax-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfmax.s (+22-22)
  • (modified) llvm/test/MC/AArch64/SME2/bfmaxnm-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfmaxnm.s (+22-22)
  • (modified) llvm/test/MC/AArch64/SME2/bfmin-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfmin.s (+22-22)
  • (modified) llvm/test/MC/AArch64/SME2/bfminnm-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SME2/bfminnm.s (+22-22)
  • (modified) llvm/test/MC/AArch64/SME2p1/directive-arch-negative.s (+6)
  • (modified) llvm/test/MC/AArch64/SME2p1/directive-arch.s (+3-1)
  • (modified) llvm/test/MC/AArch64/SME2p1/directive-arch_extension-negative.s (+7)
  • (modified) llvm/test/MC/AArch64/SME2p1/directive-arch_extension.s (+5)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfadd-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfadd.s (+24-24)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfclamp-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfclamp.s (+19-19)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmax-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmax.s (+20-20)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmaxnm-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmaxnm.s (+20-20)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmin-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmin.s (+20-20)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfminnm-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfminnm.s (+20-20)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmla-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmla.s (+25-25)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmls-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmls.s (+25-25)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmul-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfmul.s (+28-28)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfsub-diagnostics.s (+1-1)
  • (modified) llvm/test/MC/AArch64/SVE2p1/bfsub.s (+24-24)
  • (modified) llvm/test/MC/AArch64/SVE2p1/directive-arch-negative.s (+6)
  • (modified) llvm/test/MC/AArch64/SVE2p1/directive-arch.s (+4)
  • (modified) llvm/test/MC/AArch64/SVE2p1/directive-arch_extension-negative.s (+7)
  • (modified) llvm/test/MC/AArch64/SVE2p1/directive-arch_extension.s (+5)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+3-1)
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index 94c093d891156..59c948138a5c0 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -2092,7 +2092,7 @@ let SVETargetGuard = "sve2p1", SMETargetGuard = "sme2" in {
   def SVCNTP_COUNT : SInst<"svcntp_{d}", "n}i", "QcQsQiQl", MergeNone, "aarch64_sve_cntp_{d}", [IsOverloadNone, VerifyRuntimeMode], [ImmCheck<1, ImmCheck2_4_Mul2>]>;
 }
 
-let SVETargetGuard = "sve2,b16b16", SMETargetGuard = "sme2,b16b16" in {
+let SVETargetGuard = "sve2,sve-b16b16", SMETargetGuard = "sme2,sve-b16b16" in {
 defm SVMUL_BF  : SInstZPZZ<"svmul",  "b", "aarch64_sve_fmul",   "aarch64_sve_fmul_u", [VerifyRuntimeMode]>;
 defm SVADD_BF  : SInstZPZZ<"svadd",  "b", "aarch64_sve_fadd",   "aarch64_sve_fadd_u", [VerifyRuntimeMode]>;
 defm SVSUB_BF  : SInstZPZZ<"svsub",  "b", "aarch64_sve_fsub",   "aarch64_sve_fsub_u", [VerifyRuntimeMode]>;
@@ -2172,7 +2172,7 @@ let SVETargetGuard = InvalidMode, SMETargetGuard = "sme2" in {
   def SVFCLAMP_X4 : SInst<"svclamp[_single_{d}_x4]",  "44dd",   "hfd",      MergeNone, "aarch64_sve_fclamp_single_x4",  [IsStreaming], []>;
 }
 
-let SVETargetGuard = InvalidMode, SMETargetGuard = "sme2,b16b16"in {
+let SVETargetGuard = InvalidMode, SMETargetGuard = "sme2,sve-b16b16"in {
   def SVBFCLAMP_X2 : SInst<"svclamp[_single_{d}_x2]",  "22dd",   "b",      MergeNone, "aarch64_sve_bfclamp_single_x2",  [IsStreaming], []>;
   def SVBFCLAMP_X4 : SInst<"svclamp[_single_{d}_x4]",  "44dd",   "b",      MergeNone, "aarch64_sve_bfclamp_single_x4",  [IsStreaming], []>;
 }
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_clamp.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_clamp.c
index 30d963d5425c4..972a658299883 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_clamp.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_clamp.c
@@ -1,14 +1,14 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 \
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 \
 // RUN:  -Werror -emit-llvm -disable-O0-optnone -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 \
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 \
 // RUN:  -Werror -emit-llvm -disable-O0-optnone -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 \
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 \
 // RUN:  -Werror -emit-llvm -disable-O0-optnone -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 \
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 \
 // RUN:  -Werror -emit-llvm -disable-O0-optnone -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 \
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 \
 // RUN:  -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 
 #include <arm_sme.h>
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_max.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_max.c
index cc084f74d8a49..bd8d57e352331 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_max.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_max.c
@@ -1,9 +1,9 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 // REQUIRES: aarch64-registered-target
 #include <arm_sme.h>
 
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_maxnm.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_maxnm.c
index f48c885497813..07659932bef0a 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_maxnm.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_maxnm.c
@@ -1,11 +1,11 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: aarch64-registered-target
 
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 #include <arm_sme.h>
 
 #ifdef SVE_OVERLOADED_FORMS
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_min.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_min.c
index df9386092737b..fe7b74c005247 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_min.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_min.c
@@ -1,9 +1,9 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -fclang-abi-compat=latest -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 // REQUIRES: aarch64-registered-target
 #include <arm_sme.h>
 
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_minnm.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_minnm.c
index 65d440df870d2..3b221c030eddf 100644
--- a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_minnm.c
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_minnm.c
@@ -1,11 +1,11 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: aarch64-registered-target
 
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 #include <arm_sme.h>
 
 #ifdef SVE_OVERLOADED_FORMS
diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfadd.c b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfadd.c
index 452b8fc6e0bb4..0f3b92f81cdee 100644
--- a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfadd.c
+++ b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfadd.c
@@ -1,11 +1,11 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
-// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
-// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
-// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -target-feature +sve-b16b16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
 #include <arm_sve.h>
 
 #if defined __ARM_FEATURE_SME
diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfclamp.c b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfclamp.c
index 57f025fbbada7..0955994868480 100644
--- a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfclamp.c
+++ b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfclamp.c
@@ -1,10 +1,10 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +b16b16 -disable-O0-optnone -Werr...
[truncated]

CarolineConcatto (Contributor) left a comment:

Thank you, Spencer, for the patch.
I believe we missed the multi-vector instructions targeting BF16.


@CarolineConcatto CarolineConcatto left a comment


Thank you Spencer for the work.
I left some comments about the sve2.1 feature flags being used for sve2, but I don't think you need to change them in this patch because they are unrelated.
Maybe you could create another NFC patch to move all the sve2p1 tests and RUN lines to be SVE2 only?
I believe that when we changed the feature flag we forgot to move the tests and to fix all the RUN lines.

// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -o /dev/null %s

Not related to your patch, but the instructions that you changed now only require sve2.
I believe you can change the RUN lines to use only sve2.
The only ones that are sve2.1 are in the section
"SVE2.1 instruction intrinsics" in the ACLE https://github.com/ARM-software/acle/blob/main/main/acle.md

// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -o /dev/null %s

Same here.
s/sve2p1/sve2/
Please!

// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -target-feature +sve-b16b16 -disable-O0-optnone -Werror -Wall -o /dev/null %s

Same here.
s/sve2p1/sve2/
Please!

@@ -26,4 +26,4 @@ void test_bfloat(const bfloat16_t *const_bf16_ptr, svbfloat16_t bf16, svbfloat16
// expected-error@+2 {{'svwhilewr_bf16' needs target feature (sve2,bf16)|(sme,bf16)}}
// overload-error@+1 {{'svwhilewr' needs target feature (sve2,bf16)|(sme,bf16)}}
SVE_ACLE_FUNC(svwhilewr,_bf16,,)(const_bf16_ptr, const_bf16_ptr);
}
}

Unrelated change

Implement the new FEAT_SVE_B16B16 semantics and flag ('sve-b16b16') for
AArch64; add llvm/MC directive and clang/Sema tests to verify the use of
this flag; modify the llvm/MC, Clang/CodeGen, Clang/Sema, and
llvm/CodeGen tests that are affected by the weakening of the
soon-to-be-deprecated 'b16b16' flag.

- Changes to Clang frontend
 - clang/include/clang/Basic/arm_sve.td
	- SVE single-vector intrinsics are guarded by +sve-b16b16+sve2 or
          +sve-b16b16+sme2 (non-streaming mode and streaming mode)
        - Z-targeting multi-vector intrinsics are gated on +sve-b16b16+sme2
          (svclamp_single_[x2/x4]) and are streaming mode only

- Changes to LLVM AArch64 backend
 - llvm/lib/Target/AArch64/AArch64Features.td
	- Create FeatureSVEB16B16
        - Change FeatureB16B16 to separate it from the new meaning of
	  FEAT_SVE_B16B16, it is now weakened to the role of the incoming
	  FEAT_SME_B16B16
 - llvm/lib/Target/AArch64/AArch64InstrInfo.td
	- Create the HasSVEB16B16 predicate, requires that the subtarget
	 has sve-b16b16 enabled
 - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
	- Predicate Z-targeting multi-vector BFloat to BFloat instructions
	  on HasSME2 and HasSVEB16B16 (replace HasB16B16)
	- ZA-targeting instructions are separated and remain predicated
	  under HasB16B16 before it is replaced by incoming HasSMEB16B16
 - llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
	- Replace HasB16B16 with HasSVEB16B16 in predication of SVE2 single
	  vector BFloat to BFloat instructions
 - llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
	- Add the new sve-b16b16 flag mapping to FeatureSVEB16B16

- Changes to LLVM unit tests
 - llvm/unittests/TargetParser/TargetParserTest.cpp
	- Add new sve-b16b16 flag to existing target parser tests

- Added tests
 - clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16_streaming.cpp
	- To ensure that +sve-b16b16+sme2 enables SVE2 single vector BFloat
	to BFloat intrinsics in streaming mode
 - clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16.cpp
	- To ensure that +sve-b16b16+sve2 enables SVE2 single vector BFloat
	to BFloat intrinsics in non-streaming mode
 - llvm/test/MC/AArch64/SVE2p1
	- To ensure that SVE2 single vector BFloat to BFloat instructions are
	enabled by +sve-b16b16+sve2, and that this feature is removed by the
	presence of +nosve-b16b16
 - llvm/test/MC/AArch64/SME2p1
	- To ensure that Z-targeting multi vector BFloat to BFloat instructions are
	enabled by +sve-b16b16+sme2, and that this feature is removed by the
	presence of +nosve-b16b16

- Modified tests
	- All CodeGen, Semantic, and MC tests that are affected by the weakening
	 of +b16b16 have been modified to supply and/or expect +sve-b16b16.
- Predicate the SME2 Z-targeting b16-to-b16 intrinsics under FEAT_SME2,
  FEAT_SVE_B16B16.
- Add tests in ./clang/test/Sema/aarch64-sme2-intrinsics/acle_sme2_b16b16.cpp
  to verify this change.

- Combine streaming/non-streaming SVE mode tests into a single file,
  ./clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16.
- Delete ./clang/test/Sema/aarch64-sve2p1-intrinsics/acle_sve2p1_b16b16_streaming.cpp
  (redundant).
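
The gating described in the commit messages above can be summarised as a small model. This is a minimal Python sketch, not LLVM code: the group names (`sve_single_vector`, `sme_z_multi_vector`, `sme_za`) and the function `available_groups` are hypothetical, streaming mode is reduced to a boolean, and ZA state requirements are not modelled.

```python
def available_groups(features, streaming):
    """Return the non-widening BFloat16 instruction groups usable with `features`.

    Models the state after this patch, where the ZA-targeting instructions
    are still guarded by the old 'b16b16' flag.
    """
    features = set(features)
    groups = set()
    if "sve-b16b16" in features:
        # SVE single-vector forms: +sve2 in non-streaming mode,
        # +sme2 in streaming mode.
        if (not streaming and "sve2" in features) or (streaming and "sme2" in features):
            groups.add("sve_single_vector")
        # Z-targeting multi-vector forms are streaming-mode only and need +sme2.
        if streaming and "sme2" in features:
            groups.add("sme_z_multi_vector")
    # ZA-targeting forms keep the '+b16b16+sme2' guard until 'sme-b16b16' lands.
    if {"b16b16", "sme2"} <= features:
        groups.add("sme_za")
    return groups
```

Under this model, `+sve2+b16b16` on its own no longer enables any of the groups, which is why the affected tests were switched to `+sve-b16b16`.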
@SpencerAbson SpencerAbson merged commit a0ed7d6 into llvm:main Aug 7, 2024
7 checks passed
SpencerAbson added a commit to SpencerAbson/llvm-project that referenced this pull request Aug 8, 2024
Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16
instructions. Remove the now redundant FEAT_B16B16, which has been replaced
by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see
llvm#101480 for the details and
reasoning of this change to LLVM.

- Changes to Clang AArch64 frontend
	- Change target guard of SME2 ZA-targeting non-widening BFloat16
	  intrinsics to 'sme-b16b16'

- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
	- Create FeatureSMEB16B16, which implies FeatureSME2 and
	  FeatureSVEB16B16
	- Remove FeatureB16B16
	- Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
	- Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
	- Change predication of SME2 ZA-targeting non-widening BFloat16
	  instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
	- Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2)
  - llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
	- Remove flag 'b16b16' mapping to removed FeatureB16B16
	- Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16

- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
	- Add new sme-b16b16 flag to existing target parser tests
	- Add tests for the sme-b16b16 dependencies:
		- 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
        - Remove 'b16b16' from bf16 dependency test

- Added MC tests
    - llvm/test/MC/AArch64/SME2p1
	- To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions
	are enabled by +sme-b16b16, and that this feature is removed by +nosme-b16b16.

- Modified tests
    - All CodeGen, Semantic, and MC tests that are affected by the removal of 'b16b16'
    have been modified to supply and/or expect 'sme-b16b16' where appropriate.
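
The dependency stated above ('sme-b16b16' should enable 'sme2' and 'sve-b16b16') amounts to taking a closure over an implication table. The following is a rough Python sketch of that idea only: `IMPLIES` and `expand` are hypothetical names, and the table deliberately contains just the one implication named in this commit message, not LLVM's full feature graph.

```python
# Only the implication stated in the commit message; LLVM's real feature
# table contains many more entries that are not modelled here.
IMPLIES = {
    "sme-b16b16": {"sme2", "sve-b16b16"},
}


def expand(flags, implies=IMPLIES):
    """Return `flags` plus everything they transitively imply."""
    enabled = set()
    worklist = list(flags)
    while worklist:
        flag = worklist.pop()
        if flag in enabled:
            continue
        enabled.add(flag)
        # Queue the features implied by this one, if any.
        worklist.extend(implies.get(flag, ()))
    return enabled
```

For example, `expand(["sme-b16b16"])` yields the set containing 'sme-b16b16', 'sme2', and 'sve-b16b16', while `expand(["sve-b16b16"])` yields only 'sve-b16b16'.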
TIFitis pushed a commit that referenced this pull request Aug 8, 2024
…6' flag (#101480)

This patch adds FeatureSVEB16B16 to the AArch64 backend in order to
represent the new behavior of FEAT_SVE_B16B16 (as described in the
latest [Armv9.4 extensions
documentation](https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extension?lang=en#md461-the-armv94-architecture-extension__FEAT_SVE_B16B16))
as well as a 'sve-b16b16' flag to enable it.

The predication of non-widening SVE BFloat16 instructions has changed to
require this feature, instead of the previously required and
soon-to-be-removed FeatureB16B16 which is enabled by the 'b16b16' flag.
Therefore, this change weakens the 'b16b16' flag in favour of
'sve-b16b16'. Existing tests that are affected by this have been
modified to use and/or expect 'sve-b16b16', and new tests have been
added to verify the behavior and implementation of 'sve-b16b16'.

This patch is in response to the following changes.

The architecture features previously enabled by FEAT_SVE_B16B16 have
been relaxed such that it now implements:
      - With FEAT_SVE2 : SVE non-widening BFloat16 instructions in
Non-streaming SVE mode
      - With FEAT_SME2: SVE non-widening BFloat16 instructions when the
PE is in Streaming SVE mode and SME
        Z-targeting multi-vector non-widening BFloat16 instructions.
      - **It no longer implements** SME ZA-targeting non-widening
BFloat16 instructions.   

The SME ZA-targeting non-widening BFloat16 instructions are implemented
by the new FEAT_SME_B16B16, **this patch does not change how this
architecture feature is enabled** ('+b16b16+sme2'). Only those that are
implemented by FEAT_SVE_B16B16 have been changed to require 'sve-b16b16'
instead of 'b16b16'.

New flags must be created to represent FEAT_SVE_B16B16 and
FEAT_SME_B16B16:
      - 'sve-b16b16' enables the updated FEAT_SVE_B16B16 (described
here)
      - 'sme-b16b16' will enable the new FEAT_SME_B16B16
      - **This patch includes 'sve-b16b16' only**
   
A future patch will add 'sme-b16b16', SME ZA-targeting non-widening
BFloat16 instructions would then be guarded by '+sme-b16b16+sme2', and
'b16b16' can be removed.
SpencerAbson added a commit that referenced this pull request Aug 12, 2024
Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME
BFloat16 instructions. Remove the now redundant FEAT_B16B16 which has
been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see
#101480 for the details and
reasoning of this change to LLVM.

FEAT_SME_B16B16 is documented under the latest Armv9.4 feature
documentation:

https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio

- Changes to Clang AArch64 frontend
	- Change target guard of SME2 ZA-targeting non-widening BFloat16
	  intrinsics to 'sme-b16b16'

- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
	- Create FeatureSMEB16B16, which implies FeatureSME2 and
	  FeatureSVEB16B16
	- Remove FeatureB16B16
	- Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
	- Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
	- Change predication of SME2 ZA-targeting non-widening BFloat16
	  instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
	- Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies
	  FEAT_SME2)
  - llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
	- Remove flag 'b16b16' mapping to removed FeatureB16B16
	- Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16

- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
	- Add new sme-b16b16 flag to existing target parser tests
	- Add tests for the sme-b16b16 dependencies:
		- 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
	- Remove 'b16b16' from bf16 dependency test

- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
	- To ensure that ZA-targeting multi-vector non-widening BFloat16
	  instructions are enabled by +sme-b16b16, and that this feature is
	  removed by +nosme-b16b16.

- Modified tests
	- All CodeGen, Semantic, and MC tests that are affected by the removal
	  of 'b16b16' have been modified to supply and/or expect 'sme-b16b16'
	  where appropriate.
bwendling pushed a commit to bwendling/llvm-project that referenced this pull request Aug 15, 2024
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Aug 15, 2024
…6' flag (llvm#101480)
SpencerAbson added a commit to SpencerAbson/llvm-project that referenced this pull request Aug 16, 2024
The enablement of SVE/SME non-widening BFloat16 instructions was recently
changed in response to an architecture update, in which:
	- FEAT_SVE_B16B16 was weakened
	- FEAT_SME_B16B16 was introduced
New flags, 'sve-b16b16' and 'sme-b16b16', were introduced to replace the
existing 'b16b16'. This was achieved in the two patches below.
	- llvm#101480
	- llvm#102501
Ideally, the interface change introduced here will be valid in LLVM-19.
We do not see it necessary to back-port the entire change, but just to add
'sme-b16b16' and 'sve-b16b16' as aliases to the existing (and unchanged)
'b16b16' and 'sme2' flags which together cover all of these features.

The predication of Bf16 variants of svmin/svminnm and svmax/svmaxnm is also
fixed in this change.
tru pushed a commit to SpencerAbson/llvm-project that referenced this pull request Aug 19, 2024

vfdff commented Aug 21, 2024

Hi SpencerAbson,
How do I enable the behavior of FEAT_SVE_B16B16 after this commit? I tried clang -march=armv8.5a+sve2+sve-b16b16, but it doesn't seem to take effect. https://gcc.godbolt.org/z/sadrshqvM

Or is the only way to enable it -march=armv8.5a+sve2+bf16 -O3 -Xclang -target-feature -Xclang +sve-b16b16?


SpencerAbson commented Aug 21, 2024

Hi @vfdff

You were correct the first time: -march=armv8.5a+sve2+sve-b16b16 will enable the instructions (and intrinsics) under FEAT_SVE_B16B16 for SVE.

However, please refer to the ACLE concerning the use of the __bf16 type.

The __bf16 type is only available when the __ARM_BF16_FORMAT_ALTERNATIVE feature macro is defined. When it is available it can only be used by the ACLE intrinsics ; it cannot be used with standard C operators. It is expected that arithmetic using standard C operators be used using a single-precision floating point format and the value be converted to __bf16 when required using ACLE intrinsics.

Please bear in mind that the B16B16 support is in an alpha state and may change in the future.

Thanks
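
The `-march` value discussed above is a base architecture followed by `+feature` modifiers, and the MC tests in this PR check that `+nosve-b16b16` removes the feature again. As a rough illustration of how such a string reads left to right, here is a small Python sketch. `parse_march` is a hypothetical helper, not Clang's parser: it assumes no extension name itself begins with "no", and it does not model implied or dependent features.

```python
def parse_march(value):
    """Split an -march value into (base_arch, enabled_extensions).

    Modifiers are applied in order; '+noX' removes a previously enabled 'X'.
    """
    base, *modifiers = value.split("+")
    enabled = []
    for mod in modifiers:
        if mod.startswith("no"):
            # '+nosve-b16b16' switches 'sve-b16b16' off again.
            name = mod[2:]
            enabled = [ext for ext in enabled if ext != name]
        elif mod not in enabled:
            enabled.append(mod)
    return base, enabled
```

With this helper, `parse_march("armv8.5a+sve2+sve-b16b16")` gives `("armv8.5a", ["sve2", "sve-b16b16"])`, and appending `+nosve-b16b16` leaves only `["sve2"]`.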

PhilippvK added a commit to PhilippvK/CoreDSL2LLVM that referenced this pull request Sep 21, 2024
commit a4bf6cd7cfb1a1421ba92bca9d017b49936c55e4
Author: Tobias Hieta <tobias@hieta.se>
Date:   Tue Sep 17 13:26:36 2024 +0200

    Bump version to 19.1.0 (final)

commit 560ed047d183348b341ffd4e27712c254d82f589
Author: Tobias Hieta <tobias@hieta.se>
Date:   Tue Sep 17 09:39:18 2024 +0200

    Revert " [LoongArch][ISel] Check the number of sign bits in `PatGprGpr_32` (#107432)"

    This reverts commit 78654faa0c6d9dc2f72b81953b9cffbb7675755b.

commit bd4ff65a601895ba816623cddb36ce466cceabe6
Author: Tobias Hieta <tobias@hieta.se>
Date:   Tue Sep 17 09:39:01 2024 +0200

    Revert "[LoongArch] Eliminate the redundant sign extension of division (#107971)"

    This reverts commit d752f29fb333d47724484e08b32d6499cc1e460e.

commit bdae3c487cbb2b4161e7fbb54a855f0ba55da61a
Author: Zaara Syeda <syzaara@ca.ibm.com>
Date:   Tue Sep 10 14:14:01 2024 -0400

    [PowerPC] Fix assert exposed by PR 95931 in LowerBITCAST (#108062)

    Hit Assertion failed: Num < NumOperands && "Invalid child # of SDNode!"
    Fix by checking opcode and value type before calling getOperand.

    (cherry picked from commit 22067a8eb43a7194e65913b47a9c724fde3ed68f)

commit 149a150b50c112e26fc5acbdd58250c44ccd777f
Author: Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
Date:   Mon Sep 16 11:16:14 2024 +0000

    [X86] AMD Zen 5 Initial enablement

commit 82e85b62da3f62759ab94aecd0ebac61f3856719
Author: Brian Cain <bcain@quicinc.com>
Date:   Fri Sep 13 17:10:03 2024 -0500

    [lld] select a default eflags for hexagon (#108431)

    Empty archives are apparently routine in linux kernel builds, so instead
    of asserting, we should handle this case with a sane default value.

    (cherry picked from commit d1ba432533aafc52fc59158350af937a8b6b9538)

commit 82f3a4a32d2500ab1e6c51e0d749ffbac9afb1fa
Author: Konstantin Varlamov <varconsteq@gmail.com>
Date:   Fri Sep 13 01:26:57 2024 -0700

    Guard an include of `<ostream>` in `<chrono>` with availability macro (#108429)

    This fixes a regression introduced in
    https://github.com/llvm/llvm-project/pull/96035.

    (cherry picked from commit 127c34948bd54e92ef2ee544e8bc42acecf321ad)

commit a847b66a750291f8b63c03b9f355c6f4d09cdfe3
Author: Jonathon Penix <jpenix@quicinc.com>
Date:   Wed Sep 11 09:53:11 2024 -0700

    [RISCV] Don't outline pcrel_lo when the function has a section prefix (#107943)

    GNU ld will error when encountering a pcrel_lo whose corresponding
    pcrel_hi is in a different section. [1] introduced a check to help
    prevent this issue by preventing outlining in a few circumstances.
    However, we can also hit this same issue when outlining from functions
    with prefixes ("hot"/"unlikely"/"unknown" from profile information, for
    example) as the outlined function might not have the same prefix,
    possibly resulting in a "paired" pcrel_lo and pcrel_hi ending up in
    different sections.

    To prevent this issue, take a similar approach as [1] and additionally
    prevent outlining when we see a pcrel_lo and the function has a prefix.

    [1]
    https://github.com/llvm/llvm-project/commit/96c85f80f0d615ffde0f85d8270e0a8c9f4e5430

    Fixes #107520

    (cherry picked from commit 866b93e6b33fac9a4bc62bbc32199bd98f434784)

commit 6278084bc69a427cf7a610076817c420e3dc8594
Author: Nikolas Klauser <nikolasklauser@berlin.de>
Date:   Wed Sep 11 08:47:24 2024 +0200

    [Clang] Fix crash due to invalid source location in __is_trivially_equality_comparable (#107815)

    Fixes #107777

    (cherry picked from commit 6dbdb8430b492959c399a7809247424c6962902f)

commit d752f29fb333d47724484e08b32d6499cc1e460e
Author: hev <wangrui@loongson.cn>
Date:   Tue Sep 10 16:52:21 2024 +0800

    [LoongArch] Eliminate the redundant sign extension of division (#107971)

    If all incoming values of `div.d` are sign-extended and all users only
    use the lower 32 bits, then convert them to W versions.

    Fixes: #107946
    (cherry picked from commit 0f47e3aebdd2a4a938468a272ea4224552dbf176)

commit 78654faa0c6d9dc2f72b81953b9cffbb7675755b
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Tue Sep 10 09:19:39 2024 +0800

     [LoongArch][ISel] Check the number of sign bits in `PatGprGpr_32` (#107432)

    After https://github.com/llvm/llvm-project/pull/92205, LoongArch ISel
    selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to
    i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit
    constant. It will produce wrong result when `X == 2`:
    https://alive2.llvm.org/ce/z/pzfGZZ

    This patch adds additional `sexti32` checks to operands of
    `PatGprGpr_32`.
    Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp

    Fix #107414.

    (cherry picked from commit a111f9119a5ec77c19a514ec09454218f739454f)

commit f0010d131b79a1b401777aa32e96defc4a935c9d
Author: R-Goc <131907007+R-Goc@users.noreply.github.com>
Date:   Wed Sep 4 20:10:36 2024 +0200

    [Windows SEH] Fix crash on empty seh block (#107031)

    Fixes https://github.com/llvm/llvm-project/issues/105813 and
    https://github.com/llvm/llvm-project/issues/106915.
    Adds a check for the end of the iterator, which can be a sentinel.
    The issue was introduced in
    https://github.com/llvm/llvm-project/commit/0efe111365ae176671e01252d24028047d807a84
    from what I can see, so along with the introduction of /EHa support.

    (cherry picked from commit 2e0ded3371f8d42f376bdfd4d70687537e36818e)

commit 93998aff7662d9b3f94d9627179dffe342e2b399
Author: Jay Foad <jay.foad@amd.com>
Date:   Tue Aug 27 17:09:40 2024 +0100

    [AMDGPU] Fix sign confusion in performMulLoHiCombine (#105831)

    SMUL_LOHI and UMUL_LOHI are different operations because the high part
    of the result is different, so it is not OK to optimize the signed
    version to MUL_U24/MULHI_U24 or the unsigned version to
    MUL_I24/MULHI_I24.

commit 373180b440d04dc3cc0f6111b06684d18779d7c8
Author: Alexey Bataev <a.bataev@outlook.com>
Date:   Thu Aug 15 07:21:10 2024 -0700

    [SLP]Fix PR104422: Wrong value truncation

    The minbitwidth restrictions can be skipped only for immediate reduced
    values, for other nodes still need to check if external users allow
    bitwidth reduction.

    Fixes https://github.com/llvm/llvm-project/issues/104422

    (cherry picked from commit 56140a8258a3498cfcd9f0f05c182457d43cbfd2)

commit 32a8b56bbf0a3c7678d44ba690427915446a9a72
Author: Tom Stellard <tstellar@redhat.com>
Date:   Thu Sep 12 09:50:57 2024 -0700

    workflows/release-binaries: Fix automatic upload (#107315)

    (cherry picked from commit ab96409180aaad5417030f06a386253722a99d71)

commit 8290ce0998788b6a575ed7b4988b093f48c25b3d
Author: cor3ntin <corentinjabot@gmail.com>
Date:   Tue Sep 3 20:36:15 2024 +0200

    [Clang] Fix handling of placeholder variables name in init captures (#107055)

    We were incorrectly not deduplicating results when looking up `_` which,
    for a lambda init capture, would result in an ambiguous lookup.

    The same bug caused some diagnostic notes to be emitted twice.

    Fixes #107024

commit 327ca6c02f0dbf13dd6f039d30d320a7ba1456b8
Author: Owen Pan <owenpiano@gmail.com>
Date:   Thu Sep 5 23:59:11 2024 -0700

    [clang-format] Correctly annotate braces in macro definition (#107352)

    This reverts commit 2d90e8f7402b0a8114978b6f014cfe76c96c94a1 and backports
    commit 616a8ce6203d8c7569266bfaf163e74df1f440ad.

commit 2651d09ec9c4d87d09ae72d8bf42fab566fb02d0
Author: Hua Tian <akiratian@tencent.com>
Date:   Thu Aug 15 19:03:27 2024 +0800

    [llvm][CodeGen] Resolve issues when updating live intervals in window scheduler (#101945)

    Corrupted live interval information can cause window scheduling to crash
    in some cases. By adding the missing MBB's live interval information in the
    ModuloScheduleExpander, the information can be correctly analyzed in
    the window scheduler.

    (cherry picked from commit 43ba1097ee747b4ec5e757762ed0c9df6255a292)

commit f64404e32187a6f45771e72e1b65e99be82acaba
Author: Rainer Orth <ro@gcc.gnu.org>
Date:   Sat Aug 3 22:18:11 2024 +0200

    [builtins] Fix divtc3.c etc. compilation on Solaris/SPARC with gcc (#101662)

    `compiler-rt/lib/builtins/divtc3.c` and `multc3.c` don't compile on
    Solaris/sparcv9 with `gcc -m32`:
    ```
    FAILED: projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins-sparc.dir/divtc3.c.o
    [...]
    compiler-rt/lib/builtins/divtc3.c: In function ‘__divtc3’:
    compiler-rt/lib/builtins/divtc3.c:22:18: error: implicit declaration of function ‘__compiler_rt_logbtf’ [-Wimplicit-function-declaration]
       22 |   fp_t __logbw = __compiler_rt_logbtf(
          |                  ^~~~~~~~~~~~~~~~~~~~
    ```
    and many more. It turns out that while the definition of `__divtc3` is
    guarded with `CRT_HAS_F128`, the `__compiler_rt_logbtf` and other
    declarations use `CRT_HAS_128BIT && CRT_HAS_F128` as guard. This only
    shows up with `gcc` since, as documented in Issue #41838, `clang`
    violates the SPARC psABI in not using 128-bit `long double`, so this
    code path isn't used.

    Fixed by changing the guards to match.

    Tested on `sparcv9-sun-solaris2.11`.

    (cherry picked from commit 63a7786111c501920afc4cc27a4633f76cdaf803)

commit bb79e7f668456473e13985a8f135cc3a45340fb5
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date:   Mon Sep 9 07:12:46 2024 -0400

    [clang][analyzer] Fix #embed crash (#107764)

    Fix #107724.

    (cherry picked from commit d84d9559bdc7aeb4ce14c251f6a3490c66db8d3a)

commit 5e1a55eaa0bb592dd04f1b8474b8f064aded7b2e
Author: Sander de Smalen <sander.desmalen@arm.com>
Date:   Thu Sep 5 15:06:19 2024 +0100

    [AArch64] Disable SVE paired ld1/st1 for callee-saves.

    The functionality to make use of SVE's load/store pair instructions for
    the callee-saves is broken because the offsets used in the instructions
    are incorrect.

    This is addressed by #105518 but given the complexity of this code
    and the subtleties around calculating the right offsets, we favour
    disabling the behaviour altogether for LLVM 19.

    This fix is critical for any programs being compiled with `+sme2`.

commit 42f18eedc2cf2d1f64fd5d78fda376adf39a9b3d
Author: Alexey Bataev <a.bataev@outlook.com>
Date:   Tue Sep 3 04:52:47 2024 -0700

    [SLP]Fix PR107036: Check if the type of the user is sizable before requesting its size.

    Only some instructions should be considered as potentially reducing the
    size of the operands types, not all instructions should be considered.

    Fixes https://github.com/llvm/llvm-project/issues/107036

    (cherry picked from commit f381cd069965dabfeb277f30a4e532d7fd498f6e)

commit 11e2a1552f92ccb080d08083ceb71f7e6ed4db78
Author: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
Date:   Thu Aug 29 14:12:02 2024 +0100

    [RemoveDIs] Fix spliceDebugInfo splice-to-end edge case (#105671)

    Fix #105571 which demonstrates an end() iterator dereference when
    performing a non-empty splice to end() from a region that ends at
    Src::end().

    Rather than calling Instruction::adoptDbgRecords from Dest, create a marker
    (which takes an iterator) and absorbDebugValues onto that. The "absorb" variant
    doesn't clean up the source marker, which in this case we know is a trailing
    marker, so we have to do that manually.

    (cherry picked from commit 43661a1214353ea1773a711f403f8d1118e9ca0f)

commit 64015eee93062b34df290338c45e87868fa750a9
Author: Hans Wennborg <hans@chromium.org>
Date:   Mon Sep 9 10:56:37 2024 +0200

    Release note about targets built in the Windows packages

    LLVM_TARGETS_TO_BUILD was set in #106059

commit 52e5a72e9200667e8a62436268fdaff4411f7216
Author: Sander de Smalen <sander.desmalen@arm.com>
Date:   Thu Sep 5 17:54:57 2024 +0100

    [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396)

    This removes a redundant 'COPY' instruction that #81716 probably forgot
    to remove.

    This redundant COPY led to an issue because code in
    LiveRangeSplitting expects that the instruction emitted by
    `loadRegFromStackSlot` is an instruction that accesses memory, which
    isn't the case for the COPY instruction.

    (cherry picked from commit 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c)

commit 5cf78453b3de39247364ddf97b1c18c011283948
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Wed Sep 4 13:36:32 2024 +0800

    [Clang][CodeGen] Don't emit assumptions if current block is unreachable. (#106936)

    Fixes https://github.com/llvm/llvm-project/issues/106898.

    When emitting an infinite loop, clang codegen will delete the whole
    block and leave builder's current block as nullptr:

    https://github.com/llvm/llvm-project/blob/837ee5b46a5f7f898f0de7e46a19600b896a0a1f/clang/lib/CodeGen/CGStmt.cpp#L597-L600

    Then clang will create `zext (icmp slt %a, %b)` without parent block for
    `a < b`. It will crash here:

    https://github.com/llvm/llvm-project/blob/837ee5b46a5f7f898f0de7e46a19600b896a0a1f/clang/lib/CodeGen/CGExprScalar.cpp#L416-L420

    Even if we disabled this optimization, it still crashes in
    `Builder.CreateAssumption`:

    https://github.com/llvm/llvm-project/blob/837ee5b46a5f7f898f0de7e46a19600b896a0a1f/llvm/lib/IR/IRBuilder.cpp#L551-L561

    This patch disables assumptions emission if current block is null.

    (cherry picked from commit c94bd96c277e0b48e198fdc831bb576d9a04aced)

commit 82a11e46ce87ea570358e4c25ee445929402a490
Author: cor3ntin <corentinjabot@gmail.com>
Date:   Wed Sep 4 10:02:55 2024 +0200

    [Clang] Workaround dependent source location issues (#106925)

    In #78436 we made some SourceLocExpr dependent to
    deal with the fact that their value should reflect the name of
    specialized function - rather than the template in which they are first
    used.

    However SourceLocExpr are unusual in two ways:
    - They don't depend on template arguments
    - They morally depend on the context in which they are used (rather than
    called from).

    It's fair to say that this is quite novel and confuses clang. In
    particular, in some cases, we used to create dependent SourceLocExpr and
    never subsequently transform them, leaving dependent objects in
    instantiated functions types. To work around that we avoid replacing
    SourceLocExpr when we think they could remain dependent.
    It's certainly not perfect but it fixes a number of reported bugs, and
    seems to only affect scenarios in which the value of the SourceLocExpr
    does not matter (overload resolution).

    Fixes #106428
    Fixes #81155
    Fixes #80210
    Fixes #85373

    ---------

    Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

commit e657e0256509f6f665917904078a5389684fc716
Author: Tom Stellard <tstellar@redhat.com>
Date:   Fri Jul 26 07:38:53 2024 -0700

    workflows: Fix tag name for release sources job (#100752)

    (cherry picked from commit 3c2ce7088886a22ab8dc0e9488600c43644b5102)

commit 8664666823b3eb8d96fde58f79d71d36bd7f9115
Author: Eli Friedman <efriedma@quicinc.com>
Date:   Thu Aug 1 16:18:20 2024 -0700

    Fix codegen of consteval functions returning an empty class, and related issues (#93115)

    Fix codegen of consteval functions returning an empty class, and related
    issues

    If a class is empty, don't store it to memory: the store might overwrite
    useful data. Similarly, if a class has tail padding that might overlap
    other fields, don't store the tail padding to memory.

    The problem here turned out a bit more general than I initially thought:
    basically all uses of EmitAggregateStore were broken. Call lowering had
    a method that did mostly the right thing, though: CreateCoercedStore.
    Adapt CreateCoercedStore so it always does the conservatively right
    thing, and use it for both calls and ConstantExpr.

    Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
    set incorrectly for empty classes in some cases.

    Fixes #93040.

    (cherry picked from commit 1762e01cca0186f1862db561cfd9019164b8c654)

commit 0c641568515a797473394694f05937e1f1913d87
Author: Tobias Hieta <tobias@hieta.se>
Date:   Tue Sep 3 16:09:11 2024 +0200

    Bump version to 19.1.0-rc4

commit a01d631a1c2c3902b383b6491f27b72d63f6257b
Author: Patryk Wychowaniec <pwychowaniec@pm.me>
Date:   Fri Aug 30 16:50:56 2024 +0200

    [AVR] Fix LLD test (#106739)

    Since we don't generate relocations for those, it doesn't make sense to
    assert them here; fallout of
    https://github.com/llvm/llvm-project/pull/106722.

    (cherry picked from commit a3816b5a573dbf57ba3082a919ca2de6b47257e9)

commit 830b7ebac09ebef91671f0863986aee1a1d60e5e
Author: Patryk Wychowaniec <pwychowaniec@pm.me>
Date:   Fri Aug 30 15:25:54 2024 +0200

    [AVR] Fix parsing & emitting relative jumps (#106722)

    Ever since 6859685a87ad093d60c8bed60b116143c0a684c7 (or, precisely,
    84428dafc0941e3a31303fa1b286835ab2b8e234) relative jumps emitted by the
    AVR codegen are off by two bytes - this pull request fixes it.

    ## Abstract

    As compared to absolute jumps, relative jumps - such as rjmp, rcall or
    brsh - have an implied `pc+2` behavior; that is, `jmp 100` is `pc =
    100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`.

    This is not reflected in the AVR codegen:

    https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L89

    ... which always emits relative jumps that are two bytes too far - or
    rather it _would_ emit such jumps if not for this check:

    https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L517

    ... which causes most of the relative jumps to be actually resolved
    late, by the linker, which applies the offsetting logic on its own,
    hiding the issue within LLVM.

    [Some time
    ago](https://github.com/llvm/llvm-project/commit/697a162fa63df328ec9ca334636c5e85390b2bf0)
    we've had a similar "jumps are off" problem that got solved by touching
    `shouldForceRelocation()`, but I think that has worked only by accident.
    It's exploited the fact that absolute vs relative jumps in the parsed
    assembly can be distinguished through a "side channel" check relying on
    the existence of labels (i.e. absolute jumps happen to named labels, but
    relative jumps are anonymous, so to say). This was an alright idea back
    then, but it got broken by 6859685a87ad093d60c8bed60b116143c0a684c7.

    I propose a different approach:
    - when emitting relative jumps, offset them by `-2` (well, `-1`,
    strictly speaking, because those instructions rely on right-shifted
    offset),
    - when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234`
    as `rjmp (1234 + 2)`.
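
    The arithmetic in the two bullets above can be sketched roughly as
    follows. This is only an illustration of the description in this
    message, not the code in `AVRAsmBackend.cpp`; all three function names
    are hypothetical, and offsets are modelled as plain byte counts.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Byte address reached by a relative jump located at `pc`, following the
    // description above: pc = pc + offset + 2.
    inline std::int32_t relativeJumpTarget(std::int32_t pc, std::int32_t offset) {
      return pc + offset + 2;
    }

    // Parsing side: `rjmp .+N` is read as a distance of N + 2 bytes.
    inline std::int32_t parseDotRelative(std::int32_t n) { return n + 2; }

    // Emitting side: subtract the implied 2 bytes, then convert bytes to
    // 2-byte words, so the net adjustment on the word offset is -1.
    inline std::int32_t emitWordOffset(std::int32_t byteDistance) {
      assert(byteDistance % 2 == 0 && "distances are assumed to be even");
      return (byteDistance - 2) / 2;
    }
    ```

    Under these assumptions `emitWordOffset(parseDotRelative(0))` is `0` and
    `emitWordOffset(parseDotRelative(-2))` is `-1`, i.e. the two adjustments
    cancel for `.`-relative operands.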

    This approach seems to be sound and now we generate the same assembly as
    avr-gcc, which can be confirmed with:

    ```cpp
    // avr-gcc test.c -O3 && avr-objdump -d a.out

    int main() {
        asm(
    "      foo:\n\t"
    "        rjmp  .+2\n\t"
    "        rjmp  .-2\n\t"
    "        rjmp  foo\n\t"
    "        rjmp  .+8\n\t"
    "        rjmp  end\n\t"
    "        rjmp  .+0\n\t"
    "      end:\n\t"
    "        rjmp .-4\n\t"
    "        rjmp .-6\n\t"
    "      x:\n\t"
    "        rjmp x\n\t"
    "        .short 0xc00f\n\t"
    );
    }
    ```

    avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.

    (cherry picked from commit 86a60e7f1e8f361f84ccb6e656e848dd4fbaa713)

commit f3da9af3fd2696fbbe437dea599eda088fcb5592
Author: Jeremy Morse <jeremy.morse@sony.com>
Date:   Mon Sep 2 11:56:40 2024 +0100

    [DebugInfo][RemoveDIs] Find types hidden in DbgRecords (#106547)

    When serialising to textual IR, there can be constant Values referred to
    by DbgRecords that don't appear anywhere else, and have types hidden
    even deeper inside them. Enumerate these when enumerating all types.

    Test by Mikael Holmén.

    (cherry picked from commit 25f87f2d703178bb4bc13a62cb3df001b186cba2)

commit 2d90e8f7402b0a8114978b6f014cfe76c96c94a1
Author: Owen Pan <owenpiano@gmail.com>
Date:   Mon Sep 2 01:40:13 2024 -0700

    [clang-format] Correctly annotate braces in macro definition (#106662)

    Fixes #106418.

    (cherry picked from commit 0fa78b6c7bd43c2498700a98c47a02cf4fd06388)

commit e594b284810c73b09da9436fdc6f1cbbfb4a7924
Author: Nikita Popov <npopov@redhat.com>
Date:   Wed Aug 28 12:54:14 2024 +0200

    [IndVars] Check if WideInc available before trying to use it

    WideInc/WideIncExpr can be null. Previously this worked out
    because the comparison with WideIncExpr would fail. Now we have
    accesses to WideInc prior to that. Avoid the issue with an
    explicit check.

    Fixes https://github.com/llvm/llvm-project/issues/106239.

    (cherry picked from commit c9a5e1b665dbba898e9981fd7d48881947e6560e)

commit e3abd19242dd908e6186639d091f6ecc219963f0
Author: Martin Storsjö <martin@martin.st>
Date:   Thu Aug 8 13:51:07 2024 +0300

    [compiler-rt] Support building runtimes for Windows on arm32 (#101462)

    In these environments, the architecture name is armv7; recognize that
    and enable the relevant runtimes.

    Fix building the sanitizer_common library for this target, by using the
    right registers for the architecture - this is similar to what
    0c391133c9201ef29273554a1505ef855ce17668 did for aarch64.

    (Still, address sanitizer doesn't support hooking functions at runtime
    on armv7 or aarch64 - but other runtimes such as ubsan do work.)

    (cherry picked from commit 5ea9dd8c7076270695a1d90b9c73718e7d95e0bf)

commit 9b6180ed2ecbbb54f26caa78082e7b955a634117
Author: kadir çetinkaya <kadircet@google.com>
Date:   Mon Sep 2 15:25:26 2024 +0200

    [clangd] Update TidyFastChecks for release/19.x (#106354)

    Run for clang-tidy checks available in release/19.x branch.

    Some notable findings:
    - altera-id-dependent-backward-branch stays slow with 13%.
    - misc-const-correctness became faster, going from 261% to 67%, but
      still above the 8% threshold.
    - misc-header-include-cycle is a new SLOW check with 10% runtime
      implications.
    - readability-container-size-empty went from 16% to 13%, still SLOW.

    (cherry picked from commit b47d7ce8121b1cb1923e879d58eaa1d63aeaaae2)

commit d9cb501ec0012de5d4e1c6310df55f4e8af011a9
Author: Hans <hans@hanshq.net>
Date:   Mon Sep 2 15:04:13 2024 +0200

    Win release packaging: Don't try to use rpmalloc for 32-bit x86 (#106969)

    because that doesn't work (results in `LINK : error LNK2001: unresolved
    external symbol malloc`).
    Based on the title of #91862 it was only intended for use in 64-bit
    builds.

    (cherry picked from commit ef26afcb88dcb5f2de79bfc3cf88a8ea10f230ec)

commit 95fa0bee9314a878b3a58d748998c3b3ef42bd75
Author: Owen Pan <owenpiano@gmail.com>
Date:   Thu Aug 29 19:14:19 2024 -0700

    [clang-format] Correctly identify token-pasted record names (#106484)

    See
    https://github.com/llvm/llvm-project/pull/89706#issuecomment-2315549955.

    (cherry picked from commit 7579787e05966f21684dd4b4a15b9deac13d09e1)

commit 6d7e428df611861fb1f5151dea938ebfcc7b1363
Author: OverMighty <its.overmighty@gmail.com>
Date:   Fri Aug 30 12:59:05 2024 +0200

    [builtins] Fix missing main() function in float16/bfloat16 support checks (#104478)

    The CMake docs state that `check_c_source_compiles()` checks whether the
    supplied code "can be compiled as a C source file and linked as an
    executable (so it must contain at least a `main()` function)."

    https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html

    In practice, this command is a wrapper around `try_compile()`:

    - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54
    - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101

    When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/,
    `CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `STATIC_LIBRARY`, so the
    checks for `float16` and `bfloat16` support work as intended in a
    Clang + compiler-rt runtime build for example, as it runs CMake
    recursively from that directory.

    However, when using llvm/ or compiler-rt/ as CMake source directory, as
    `CMAKE_TRY_COMPILE_TARGET_TYPE` defaults to `EXECUTABLE`, these checks
    will indeed fail if the code doesn't have a `main()` function. This
    results in LLVM using x86 SIMD registers when generating calls to
    builtins that, with Arch Linux's compiler-rt package for example,
    actually use a GPR for their argument or return value as they use
    `uint16_t` instead of `_Float16`.

    This had been caught in post-commit review:
    https://reviews.llvm.org/D145237#4521152. Use of the internal
    `CMAKE_C_COMPILER_WORKS` variable is not what hides the issue, however.

    PR #69842 tried to fix this by unconditionally setting
    `CMAKE_TRY_COMPILE_TARGET_TYPE` to `STATIC_LIBRARY`, but it apparently
    caused other issues, so it was reverted. This PR just adds a `main()`
    function in the checks, as per the CMake docs.

    (cherry picked from commit 68d8b3846ab1e6550910f2a9a685690eee558af2)

commit f131edf6fbe8e2ab7306aba72698daa6153ec91e
Author: Avi Kivity <avi@scylladb.com>
Date:   Mon Aug 26 17:56:45 2024 +0300

    [Instrumentation] Fix EdgeCounts vector size in SetBranchWeights (#99064)

    (cherry picked from commit 46a4132e167aa44d8ec7776262ce2a0e6d47de59)

commit 1ccd19c4b297e9a2bd1b2bb6bbb9d9ad2acbab40
Author: Owen Pan <owenpiano@gmail.com>
Date:   Fri Aug 30 19:23:45 2024 -0700

    [clang-format] Correctly annotate braces in ObjC square brackets (#106654)

    See
    https://github.com/llvm/llvm-project/pull/88238#issuecomment-2316954781.

    (cherry picked from commit e0f2368cdeb7312973a92fb2d22199d1de540db8)

commit 6f623478d48c171d59e95b25ea2aca49dca8f135
Author: Ties Stuij <ties.stuij@arm.com>
Date:   Tue Jul 23 14:09:34 2024 +0100

    [libcxx] don't `#include <cwchar>` if wide chars aren't enabled (#99911)

    Pull request #96032 unconditionally adds the `cwchar` include in the
    `format` umbrella header. However support for wchar_t can be disabled in
    the build system (LIBCXX_ENABLE_WIDE_CHARACTERS).

    This patch guards against inclusion of `cwchar` in `format` by checking
    the `_LIBCPP_HAS_NO_WIDE_CHARACTERS` define.

    For clarity I've also merged the include header section that `cwchar`
    was in with the one above as they were both guarded by the same `#if`
    logic.
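
    A minimal sketch of the guard pattern described above, written as user
    code rather than as the actual `format` header. The helper
    `demoWideLength` is hypothetical; only the `_LIBCPP_HAS_NO_WIDE_CHARACTERS`
    macro comes from this message. With a standard library that never defines
    the macro, the wide-character branch is simply always taken.

    ```cpp
    #include <cassert>
    #include <cstddef>  // any standard header; libc++ headers bring in its config macros

    // Only pull in <cwchar> when wide-character support is enabled.
    #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
    #  include <cwchar>
    #endif

    // Hypothetical helper: length of a wide string, or 0 when the library was
    // built without wide-character support.
    inline std::size_t demoWideLength(const wchar_t *text) {
      assert(text != nullptr);
    #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
      return std::wcslen(text);
    #else
      return 0;
    #endif
    }
    ```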

    (cherry picked from commit ec56790c3b27df4fa1513594ca9a74fd8ad5bf7f)

commit e1be8cf8723e8577abaeef586ec4c39f30053913
Author: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
Date:   Wed Aug 28 14:20:33 2024 +0100

    [RemoveDIs] Simplify spliceDebugInfo, fixing splice-to-end edge case (#105670)

    Not quite NFC, fixes splitBasicBlockBefore case when we split before an
    instruction with debug records (but without the headBit set, i.e., we are
    splitting before the instruction but after the debug records that come before
    it). splitBasicBlockBefore splices the instructions before the split point into
    a new block. Prior to this patch, the debug records would get shifted up to the
    front of the spliced instructions (as seen in the modified unittest - I believe
    the unittest was checking erroneous behaviour). We instead want to leave those
    debug records at the end of the spliced instructions.

    The functionality of the deleted `else if` branch is covered by the remaining
    `if` now that `DestMarker` is set to the trailing marker if `Dest` is `end()`.
    Previously the "===" markers were sometimes detached, now we always detach
    them and always reattach them.

    Note: `deleteTrailingDbgRecords` only "unlinks" the trailing marker from the
    block, it doesn't delete anything. The trailing marker is still cleaned up
    properly inside the final `if` body with `DestMarker->eraseFromParent();`.

    Part 1 of 2 needed for #105571

    (cherry picked from commit f5815534d180c544bffd46f09c28b6fc334260fb)

commit 894ec4e3a1d56a5dd5a8205b4fd734136db87cfd
Author: Luke Shingles <luke.shingles@gmail.com>
Date:   Thu Aug 29 11:09:07 2024 +0100

    [analyzer] Add missing include <unordered_map> to llvm/lib/Support/Z3Solver.cpp (#106410)

    Resolves #106361. Adding #include <unordered_map> to
    llvm/lib/Support/Z3Solver.cpp fixes compilation errors for homebrew
    build on macOS with Xcode 14.
    https://github.com/Homebrew/homebrew-core/actions/runs/10604291631/job/29390993615?pr=181351
    shows that this is resolved when the include is patched in (Linux CI
    failure is due to unrelated timeout).

    (cherry picked from commit fcb3a0485857c749d04ea234a8c3d629c62ab211)

commit 03cc174e0307ec90091c31c621bd6cee4338c4da
Author: Corentin Jabot <corentinjabot@gmail.com>
Date:   Thu Aug 29 11:11:39 2024 +0200

    Revert "[clang] fix broken canonicalization of DeducedTemplateSpecializationType (#95202)"

    This reverts commit 2e1ad93961a3f444659c5d02d800e3144acccdb4.

    Reverting #95202 in the 19.x branch

    Fixes #106182

    The change in #95202 causes code to crash and there is
    no good way to backport a fix for that as there are ABI-impacting
    changes at play.
    Instead we revert #95202 in the 19.x branch, fixing the regression
    and preserving the 18.x behavior (which is GCC's behavior)

    https://github.com/llvm/llvm-project/pull/106335#discussion_r1735174841

commit c8c66e01d83323a3db57fade24befb26b0e6fe84
Author: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
Date:   Thu Aug 29 15:42:57 2024 +0800

    [C++20] [Modules] Don't insert class not in named modules to PendingEmittingVTables (#106501)

    Close https://github.com/llvm/llvm-project/issues/102933

    The root cause of the issue is an oversight in
    https://github.com/llvm/llvm-project/pull/102287: I didn't notice
    that PendingEmittingVTables should only accept classes in named modules.

    (cherry picked from commit 47615ff2347a8be429404285de3b1c03b411e7af)

commit 72a74e44ef6f27c10a2da55fa67bde22d52516c6
Author: Tom Stellard <tstellar@redhat.com>
Date:   Wed Aug 28 22:18:08 2024 -0700

    workflows/release-tasks: Pass required secrets to all called workflows (#106286)

    Called workflows don't have access to secrets by default, so we need to
    explicitly pass secrets that we use.

    (cherry picked from commit 9d81e7e36e33aecdee05fef551c0652abafaa052)

commit bac3db3c8b41c921d8ec895a9dc89ce310a670cb
Author: Owen Pan <owenpiano@gmail.com>
Date:   Wed Aug 28 18:23:54 2024 -0700

    [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (#105923)

    …519)"

    This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a
    test for lambda arrow SplitPenalty.

    Fixes #105480.

commit 491375504831aae3da85ffb288ca3c8b3c94b1ea
Author: Joseph Huber <huberjn@outlook.com>
Date:   Tue Aug 6 21:33:25 2024 -0500

    Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226)

    This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.

    Fixes: https://github.com/llvm/llvm-project/issues/100212
    (cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b)

    Conflicts:
    	clang/test/Driver/linker-wrapper-passes.c

commit f88180bbc489a587954adfce40cc5c90adc74962
Author: Krasimir Georgiev <krasimir@google.com>
Date:   Wed Aug 28 13:45:17 2024 +0200

    [clang-format] js handle anonymous classes (#106242)

    Addresses a regression in JavaScript when formatting anonymous classes.

    ---------

    Co-authored-by: Owen Pan <owenpiano@gmail.com>
    (cherry picked from commit 77d63cfd18aa6643544cf7acd5ee287689d54cca)

commit 9ec54c307b6151b1ddb3f7fe3b7cba4d9309b26c
Author: Owen Pan <owenpiano@gmail.com>
Date:   Tue Aug 27 19:13:27 2024 -0700

    [clang-format] Fix misalignments of pointers in angle brackets (#106013)

    Fixes #105898.

    (cherry picked from commit 656d5aa95825515a55ded61f19d41053c850c82d)

commit 32927ca57e805681fa93ed913c0f0d3c075563b7
Author: Alexander Richardson <alexrichardson@google.com>
Date:   Tue Aug 27 15:37:24 2024 -0700

    [compiler-rt] Fix definition of `usize` on 32-bit Windows

    32-bit Windows uses `unsigned int` for uintptr_t and size_t.
    Commit 18e06e3e2f3d47433e1ed323b8725c76035fc1ac changed uptr to
    unsigned long, so it no longer matches the real size_t/uintptr_t and
    therefore the current definition of usize results in:
    `error C2821: first formal parameter to 'operator new' must be 'size_t'`

    However, the real problem is that uptr is wrong to work around the fact
    that we have local SIZE_T and SSIZE_T typedefs that trample on the
    basetsd.h definitions of the same name and therefore need to match
    exactly. Unlike size_t/ssize_t the uppercase ones always use unsigned
    long (even on 32-bit).

    This commit works around the build breakage by keeping the existing
    definitions of uptr/sptr and just changing usize. A follow-up change
    will attempt to fix this properly.
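
    To illustrate why the type has to match exactly, here is a small sketch.
    It does not show how compiler-rt defines `usize`; `demo_usize` and
    `DemoTag` are hypothetical names, and deriving the alias from `sizeof` is
    just one way to guarantee it is the real `size_t` on every target.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdlib>
    #include <type_traits>

    // Hypothetical alias taken straight from `sizeof`, so it is the real
    // size_t whatever width `unsigned long` happens to have.
    using demo_usize = decltype(sizeof(int));
    static_assert(std::is_same<demo_usize, std::size_t>::value,
                  "demo_usize must be exactly size_t");

    struct DemoTag {};

    // The first parameter of an `operator new` overload must be size_t itself;
    // a different unsigned type of the same width is rejected by the compiler.
    void *operator new(demo_usize size, DemoTag) noexcept {
      assert(size > 0);
      return std::malloc(size);
    }

    // Matching placement delete, also usable for manual release.
    void operator delete(void *ptr, DemoTag) noexcept { std::free(ptr); }
    ```

    An object can then be created with `new (DemoTag{}) int(7)` and released
    with `operator delete(p, DemoTag{})`.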

    Fixes: https://github.com/llvm/llvm-project/issues/101998

    Reviewed By: mstorsjo

    Pull Request: https://github.com/llvm/llvm-project/pull/106151

    (cherry picked from commit bb27dd853a713866c025a94ead8f03a1e25d1b6e)

commit 6883c490e04a0f681b95e32eaa74aa82458bdb28
Author: Louis Dionne <ldionne.2@gmail.com>
Date:   Tue Aug 27 14:22:25 2024 -0400

    [libc++] Add missing include to three_way_comp_ref_type.h

    We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`.

    rdar://134425695
    (cherry picked from commit 0df78123fdaed39d5135c2e4f4628f515e6d549d)

commit 52ab956704050302926e8afe1c7dbda4578acb9d
Author: Younan Zhang <zyn7109@gmail.com>
Date:   Tue Aug 27 09:25:53 2024 +0800

    [Clang][Sema] Revisit the fix for the lambda within a type alias template decl (#89934)

    In the last patch #82310, we used template depths to tell if such alias
    decls contain lambdas, which is wrong because the lambda can also appear
    as a part of the default argument, and that would make
    `getTemplateInstantiationArgs` provide extra template arguments in
    undesired contexts. This leads to issue #89853.

    Moreover, our approach
    for https://github.com/llvm/llvm-project/issues/82104 was sadly wrong.
    We tried to teach `DeduceReturnType` to consider alias template
    arguments; however, giving these arguments in the context where they
    should have been substituted in a `TransformCallExpr` call is never
    correct.

    This patch addresses such problems by using a `RecursiveASTVisitor` to
    check if the lambda is contained by an alias `Decl`, as well as
    twiddling the lambda dependencies - we should also build a dependent
    lambda expression if the surrounding alias template arguments were
    dependent.

    Fixes #89853
    Fixes #102760
    Fixes #105885

    (cherry picked from commit b412ec5d3924c7570c2c96106f95a92403a4e09b)

commit 456006bc91c3d972f3c549b6296cad5e83630c7d
Author: SpencerAbson <Spencer.Abson@arm.com>
Date:   Fri Aug 23 14:27:49 2024 +0100

    [clang][AArch64] Add SME2.1 feature macros (#105657)

    (cherry picked from commit 2617023923175b0fd2a8cb94ad677c061c01627f)

commit ed699666de2d82eab266bf41372175da73202834
Author: Zaara Syeda <syzaara@ca.ibm.com>
Date:   Thu Aug 22 09:55:46 2024 -0400

    [PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453)

    These builtins are currently returning CR0 which will have the format
    [0, 0, flag_true_if_saved, XER].
    We only want to return flag_true_if_saved. This patch adds a shift to
    remove the XER bit before returning.
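
    A rough sketch of the intended extraction, not the actual lowering code.
    It assumes the CR0 field is available as a 4-bit integer laid out as
    above, most significant bit first, so the XER bit is the lowest bit; the
    function name is hypothetical.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Assumed layout of the 4-bit field, MSB first:
    // [0, 0, flag_true_if_saved, XER].
    inline std::uint32_t storeSucceeded(std::uint32_t cr0) {
      assert(cr0 <= 0xF && "CR0 is modelled as a 4-bit field");
      // Shift out the XER bit and keep only the saved flag.
      return (cr0 >> 1) & 1u;
    }
    ```

    With this layout the result is the same whether or not the XER bit is
    set: both `0x2` and `0x3` map to `1`, and both `0x0` and `0x1` map to `0`.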

    (cherry picked from commit 327edbe07ab4370ceb20ea7c805f64950871d835)

commit d9806ffe4e4d26de9c01f6b8ac0deae169b1d88d
Author: Owen Pan <owenpiano@gmail.com>
Date:   Sat Aug 24 20:10:03 2024 -0700

    [clang-format] Fix a misannotation of less/greater as angle brackets (#105941)

    Fixes #105877.

    (cherry picked from commit 0916ae49b89db6eb9eee9f6fee4f1a65fd9cdb74)

commit 1b1ddb767e430a29c35c8b12760b7aa21f508d15
Author: Owen Pan <owenpiano@gmail.com>
Date:   Sat Aug 24 19:12:15 2024 -0700

    [clang-format] Fix a misannotation of redundant r_paren as CastRParen (#105921)

    Fixes #105880.

    (cherry picked from commit 6bc225e0630f28e83290a43c3d9b25b057fc815a)

commit 00ff55d61c765467e9a72c0fd570343d3cfb3b43
Author: Ian Anderson <iana@apple.com>
Date:   Thu Aug 22 13:44:58 2024 -0700

    [libunwind] Stop installing the mach-o module map (#105616)

    libunwind shouldn't know that compact_unwind_encoding.h is part of a
    MachO module that it doesn't own. Delete the mach-o module map, and let
    whatever is in charge of the mach-o directory be the one to say how its
    module is organized and where compact_unwind_encoding.h fits in.

    (cherry picked from commit 172c4a4a147833f1c08df1555f3170aa9ccb6cbe)

commit 09cca6b1897d501020c02769f9a937401f13e37a
Author: Jay Foad <jay.foad@amd.com>
Date:   Fri Aug 23 10:31:33 2024 +0100

    [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550)

    When a loop contains a VMEM load whose result is only used outside the
    loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
    vmcnt will be required inside the loop anyway, because VMEM instructions
    can write their VGPR results out of order.

    (cherry picked from commit fa2dccb377d0b712223efe5b62e5fc633580a9e6)

commit 441fb41cb487d286977b7e1cdabc3efe4c2010cf
Author: Jay Foad <jay.foad@amd.com>
Date:   Thu Aug 22 11:46:51 2024 +0100

    [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (#105549)

    Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
    WAW dependencies.

    (cherry picked from commit 5506831f7bc8dc04ebe77f4d26940007bfb4ab39)

commit daea6b9c40a1ee9d44f5658c182094147bb78340
Author: Jay Foad <jay.foad@amd.com>
Date:   Thu Aug 22 11:42:57 2024 +0100

    [AMDGPU] Add GFX12 test coverage for vmcnt flushing in loop headers (#105548)

    (cherry picked from commit 61194617ad7862f144e0f6db34175553e8c34763)

commit 3f768dd6806aeca74bfdf21bde9135d96b137ef3
Author: Maciej Gabka <maciej.gabka@arm.com>
Date:   Thu Aug 22 12:40:01 2024 +0000

     Add release note about ABI implementation changes for _BitInt on Arm

commit 45b149d2531948d2cc0e9d699a8e5371360a3bdf
Author: Tim Gymnich <tgymnich@icloud.com>
Date:   Thu Aug 8 02:51:04 2024 +0200

    [PowerPC] Respect endianness when bitcasting to fp128 (#95931)

    Fixes #92246

    Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when
    encountering `bitcast fp128 (BUILD_PAIR %lo %hi)`
    by inserting a missing swap of the arguments based on endianness.

    ### Current behaviour:
    **fp128**
    bitcast fp128 (BUILD_PAIR %lo %hi) => BUILD_FP128 %lo %hi
    BUILD_FP128 %lo %hi => MTVSRDD %hi %lo

    **v2i64**
    bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo
    BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi

    (cherry picked from commit 408d82d352eb98e2d0a804c66d359cd7a49228fe)

commit 40b076410194df3783b0c9cefa9f018fb190bdff
Author: alx32 <103613512+alx32@users.noreply.github.com>
Date:   Wed Aug 14 19:30:41 2024 -0700

    [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081)

    A crash was happening when both ObjC Category Merging and Relative
    method lists were enabled.

    ObjC Category Merging creates new data sections and adds them by calling
    `addInputSection`. `addInputSection` uses the symbols within the added
    section to determine which container to actually add the section to.

    The issue is that ObjC Category merging is calling `addInputSection`
    before actually adding the relevant symbols to the added section. This
    causes `addInputSection` to add the `InputSection` to the wrong
    container, eventually resulting in a crash.

    To fix this, we ensure that ObjC Category Merging calls
    `addInputSection` only after the symbols have been added to the
    `InputSection`.

    (cherry picked from commit 0df91893efc752a76c7bbe6b063d66c8a2fa0d55)

commit 78f97e22e5d87f9efc6b2c0ec76f60667458ca8a
Author: Balazs Benics <benicsbalazs@gmail.com>
Date:   Wed Aug 21 14:24:56 2024 +0200

    [analyzer] Limit `isTainted()` by skipping complicated symbols (#105493)

    As discussed in

    https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10

    Some `isTainted()` queries can blow up the analysis times, and
    effectively halt the analysis under specific workloads.

    We don't really have the time now to do a caching re-implementation of
    `isTainted()`, so we need to work around the case.

    The workaround with the smallest blast radius was to limit which symbols
    `isTainted()` queries (by walking the SymExpr). So far, the
    threshold 10 worked for us, but this value can be overridden using the
    "max-tainted-symbol-complexity" config value.

    This new option is "deprecated" from the get-go, as I expect this issue
    to be fixed within the next few months and I don't want users to
    override this value anyway. If they do, this message will let them know
    that they are on their own, and the next release may break them (as we
    no longer recognize this option if we drop it).
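
    The complexity cut-off described above can be sketched roughly as
    follows. This is not the analyzer's implementation: `DemoSym` and every
    function here are hypothetical, the complexity is recomputed on each call
    for brevity, and reporting "not tainted" for over-complex symbols is an
    assumption of the sketch.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for a symbolic expression tree node.
    struct DemoSym {
      bool tainted = false;
      std::vector<std::unique_ptr<DemoSym>> children;
    };

    // Complexity is modelled as the number of nodes in the tree.
    inline std::size_t complexity(const DemoSym &sym) {
      std::size_t count = 1;
      for (const auto &child : sym.children) count += complexity(*child);
      return count;
    }

    // Unbounded walk: true if any node in the tree is tainted.
    inline bool anyTainted(const DemoSym &sym) {
      if (sym.tainted) return true;
      for (const auto &child : sym.children)
        if (anyTainted(*child)) return true;
      return false;
    }

    // Skip the walk entirely when the symbol is more complex than the limit.
    inline bool isTaintedLimited(const DemoSym &sym, std::size_t maxComplexity = 10) {
      if (complexity(sym) > maxComplexity) return false;
      return anyTainted(sym);
    }

    // Test helper: a chain of `length` nodes whose innermost node may be tainted.
    inline std::unique_ptr<DemoSym> makeChain(std::size_t length, bool taintedLeaf) {
      assert(length > 0);
      std::unique_ptr<DemoSym> node(new DemoSym());
      node->tainted = taintedLeaf;
      for (std::size_t i = 1; i < length; ++i) {
        std::unique_ptr<DemoSym> parent(new DemoSym());
        parent->children.push_back(std::move(node));
        node = std::move(parent);
      }
      return node;
    }
    ```

    The trade-off is visible in the sketch: a tainted leaf buried in a
    12-node chain goes unnoticed at the default limit of 10, in exchange for
    a bounded query cost.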

    Mitigates #89720

    CPP-5414

    (cherry picked from commit 848658955a9d2d42ea3e319d191e2dcd5d76c837)

commit 54579830d81a67cfa52b90c34bcf9a631f53fcc5
Author: Kai Yan <aklkaiyan@tencent.com>
Date:   Mon Aug 5 17:44:05 2024 +0800

    [llvm][CodeGen] Address the issue of multiple resource reservations In window scheduling (#101665)

    Address the issue of multiple resource reservations in window
    scheduling.

commit 5a164a28e37fe3cda99236595167f7762b47c76d
Author: Kai Yan <aklkaiyan@tencent.com>
Date:   Wed Jul 24 12:06:10 2024 +0800

    [llvm][CodeGen] Fixed max cycle calculation with zero-cost instructions for window scheduler (#99454)

    We discovered some scheduling failures occurring when zero-cost
    instructions were involved. This issue will be addressed by this patch.

commit 06d009789f771e8cef82714549c3136e320312be
Author: Kai Yan <aklkaiyan@tencent.com>
Date:   Thu Jul 25 19:16:23 2024 +0800

    [llvm][CodeGen] Fixed a bug in stall cycle calculation for window scheduler (#99451)

    Fixed a bug in stall cycle calculation.
    When a register defined by an instruction in the current iteration is
    used by an instruction in the next iteration, we have modified the
    number of stall cycles that need to be inserted.

commit 7b86034dcb8c7fd7ea125cec43f0117cd4a428b6
Author: Kai Yan <aklkaiyan@tencent.com>
Date:   Wed Jul 24 12:11:58 2024 +0800

    [llvm][CodeGen] Added a new restriction for II by pragma in window scheduler (#99448)

    Added a new restriction for window scheduling.
    Window scheduling is disabled when llvm.loop.pipeline.initiationinterval
    is set.

commit e81188d58202ee7b887e48bc3e4b102fc5f45619
Author: Kai Yan <aklkaiyan@tencent.com>
Date:   Wed Jul 24 12:06:35 2024 +0800

    [llvm][CodeGen] Added missing initialization failure information for window scheduler (#99449)

    Added missing initialization failure information for window scheduler.

commit 816fde1cbb700ebcc8b3df81fb93d675c04c12cd
Author: Michał Górny <mgorny@gentoo.org>
Date:   Thu Aug 29 20:57:25 2024 +0200

    [clang] Install scan-build-py into plain "lib" directory (#106612)

    Install scan-build-py modules into the plain `lib` directory,
    without LLVM_LIBDIR_SUFFIX appended, to match the path expected
    by `intercept-build` executable.  This fixes the program being unable
    to find its modules.  Using unsuffixed path makes sense here, since
    Python modules are not subject to multilib.

    This change effectively reverts 1334e129a39cb427e7b855e9a711a3e7604e50e5.
    The commit in question changed the path without a clear justification
    ("does not respect the given prefix") and the Python code was never
    modified to actually work with the change.

    Fixes #106608

    (cherry picked from commit 0c4cf79defe30d43279bf4526cdf32b6c7f8a197)

commit c21b039178b2efd17bc4eef906ab7b3a07cab288
Author: Tom Stellard <tstellar@redhat.com>
Date:   Fri Aug 30 19:46:33 2024 -0700

    workflows/release-binaries: Remove .git/config file from artifacts (#106310)

    The .git/config file contains an auth token that can be leaked if the
    .git directory is included in a workflow artifact.

    (cherry picked from commit ef50970204384643acca42ba4c7ca8f14865a0c2)

commit eba1ef5a1b7a84ed1954797dcd6d6f073b1f1a56
Author: Ahmed Bougacha <ahmed@bougacha.org>
Date:   Thu Aug 29 09:50:44 2024 -0700

    [AArch64] Make apple-m4 armv8.7-a again (from armv9.2-a).  (#106312)

    This is a partial revert of c66e1d6f3429.  Even though that
    allowed us to declare v9.2-a support without picking up SVE2
    in both the backend and the driver, the frontend itself still
    enabled SVE via the arch version's default extensions.

    Avoid that by reverting back to v8.7-a while we look into
    longer-term solutions.

    (cherry picked from commit e5e38ddf1b8043324175868831da21e941c00aff)

commit 1b643dbad74986718460f28347cbd17085402383
Author: Hans <hans@hanshq.net>
Date:   Thu Aug 29 13:54:30 2024 +0200

    Restrict LLVM_TARGETS_TO_BUILD in Windows release packaging (#106059)

    When including all targets, some files become too large for the NSIS
    installer to handle.

    Fixes #101994

    (cherry picked from commit 2a28df66dc3f7ff5b6233241837854acefb68d77)

commit 53c43bab2077644ecf152bebffd921572e418692
Author: Nathan Ridge <zeratul976@hotmail.com>
Date:   Sun Aug 25 02:10:45 2024 -0400

    [clangd] Add clangd 19 release notes

commit 5f744ee5c770d7332740bb6247f961e7d99ee359
Author: Dan Gohman <dev@sunfishcode.online>
Date:   Thu Aug 22 08:13:20 2024 -0700

    [DwarfEhPrepare] Assign dummy debug location for more inserted _Unwind_Resume calls (#105513)

    Similar to the fix for #57469, ensure that the other `_Unwind_Resume`
    call emitted by DwarfEHPrepare has a debug location if needed.

    This fixes https://github.com/nbdd0121/unwinding/issues/34.

    (cherry picked from commit e76db25832d6ac2d3a36769b26f982d9dee4b346)

commit cfe8eb89cbb8b8d873579123555a5238d9ad502c
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Thu Aug 1 16:08:33 2024 +0100

    [MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data (REAPPLIED)

    This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should

    Reapplied with codegen fix for scatter-schedule.ll

    Fixes #105675

    (cherry picked from commit cf6cd1fd67356ca0c2972992928592d2430043d2)

commit 3ff9d92aae0945daa85ec6f85f05a3aeaaa9f962
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Fri Aug 23 16:06:00 2024 +0800

    [ConstraintElim] Fix miscompilation caused by PR97974 (#105790)

    Fixes https://github.com/llvm/llvm-project/issues/105785.

    (cherry picked from commit 85b6aac7c25f9d2a976a76045ace1e7afebb5965)

commit 1241c762c165972690c4edfb82ec7421c1e64658
Author: Owen Pan <owenpiano@gmail.com>
Date:   Thu Aug 22 20:02:48 2024 -0700

    [clang-format] Don't insert a space between :: and * (#105043)

    Also, don't insert a space after ::* for method pointers.

    See
    https://github.com/llvm/llvm-project/pull/86253#issuecomment-2298404887.

    Fixes #100841.

    (cherry picked from commit 714033a6bf3a81b1350f969ddd83bcd9fbb703e8)

commit 1503d18171e569996bf3e107364b1f0fd5f750e9
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Tue Aug 20 11:11:33 2024 +0100

    [X86] Use correct fp immediate types in _mm_set_ss/sd

    Avoids implicit sint_to_fp which wasn't occurring on strict fp codegen

    Fixes #104848

    (cherry picked from commit 6dcce422ca06601f2b00e85cc18c745ede245ca6)

commit b6a562d90fa08543171bafbb9c897c03f6cf691f
Author: Björn Pettersson <bjorn.a.pettersson@ericsson.com>
Date:   Wed Aug 21 17:56:27 2024 +0200

    [DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924)

    In visitFREEZE we have been collecting a set/vector of
    MaybePoisonOperands that later was iterated over, applying a freeze to
    those operands. However, C-level fuzzy testing has discovered that the
    recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
    the MaybePoisonOperands vector to be replaced when replacing an earlier
    operand. That would then turn up as
       Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
                  "Operand is DELETED_NODE!"' failed.
    failures when trying to freeze those later operands.

    So we need to make sure that the vector with MaybePoisonOperands is
    mutated as well when needed. Or as the solution used in this patch, make
    sure to keep track of operand numbers that should be frozen instead of
    having a vector of SDValues. And then we can refetch the operands while
    iterating over operand numbers.

    The problem was seen after adding SELECT_CC to the set of operations
    included in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
    guess that this could happen for other operations as well for which we
    allow multiple maybe poison operands.

    (cherry picked from commit 278fc8efdf004a1959a31bb4c208df5ee733d5c8)

commit 43b455b2d2e5107e19d7d47e77ba513d1f9f5e2f
Author: Carl Ritson <carl.ritson@amd.com>
Date:   Sat Aug 17 16:52:38 2024 +0900

    [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395)

    Prevent operand folding from inlining constants into pseudo scalar
    transcendental f16 instructions.
    However still allow literal constants.

    (cherry picked from commit fc6300a5f7ef430e4ec86d16be0b146de7fbd16b)

commit 38f3dbefab0a4965abad99aa23eced96d5d8dc16
Author: Bryce Kahle <bryce.kahle@datadoghq.com>
Date:   Tue Aug 20 12:25:33 2024 -0700

    use default intrinsic attrs for BPF packet loads

    The BPF packet load intrinsics lost attribute WillReturn due to 0b20c30. The attribute loss causes excessive bitshifting, resulting in previously working programs failing the BPF verifier due to instruction/complexity limits.

    cherry picked only the BPF changes from 99a10f1

    Signed-off-by: Bryce Kahle <bryce.kahle@datadoghq.com>

commit 6420a2ea06b6fc21547907eb447035be3e2b6b16
Author: Amy Kwan <amy.kwan1@ibm.com>
Date:   Tue Aug 20 10:30:09 2024 -0500

    Add AIX/PPC Clang/LLVM release notes for LLVM 19.

commit 8ea372d8b628b0a11016f5282d47c372e3843b93
Author: Koakuma <koachan@protonmail.com>
Date:   Tue Aug 20 20:05:06 2024 +0700

    [SPARC] Remove assertions in printOperand for inline asm operands (#104692)

    Inline asm operands could contain any kind of relocation, so remove the
    checks.

    Fixes https://github.com/llvm/llvm-project/issues/103493

    (cherry picked from commit 576b7a781aac6b1d60a72248894b50e565e9185a)

commit 9dc4bdf9fd1e4be051fe19998d64230d999b777d
Author: Ian Anderson <iana@apple.com>
Date:   Tue Aug 20 03:29:11 2024 -0700

    [clang][modules] Built-in modules are not correctly enabled for Mac Catalyst (#104872)

    Mac Catalyst is the iOS platform, but it builds against the macOS SDK
    and so it needs to be checking the macOS SDK version instead of the iOS
    one. Add tests against a greater-than SDK version just to make sure this
    works beyond the initially supporting SDKs.

    (cherry picked from commit b9864387d9d00e1d4888181460d05dbc92364d75)

commit 9301cd5b57c09214256edf19753e2e047a5b5f91
Author: Rainer Orth <ro@gcc.gnu.org>
Date:   Tue Jul 30 10:06:45 2024 +0200

    [sanitizer_common] Make sanitizer_linux.cpp kernel_stat* handling Linux-specific

    fcd6bd5587cc376cd8f43b60d1c7d61fdfe0f535 broke the Solaris/sparcv9 buildbot:
    ```
    compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:39:14: fatal error: 'asm/unistd.h' file not found
       39 | #    include <asm/unistd.h>
          |              ^~~~~~~~~~~~~~
    ```
    That section should have been Linux-specific in the first place, which is
    what this patch does.

    Tested on sparcv9-sun-solaris2.11.

    (cherry picked from commit 16e9bb9cd7f50ae2ec7f29a80bc3b95f528bfdbf)

commit 437434df21d839becb453f6821564662e9824f02
Author: Tobias Hieta <tobias@hieta.se>
Date:   Tue Aug 20 10:06:55 2024 +0200

    Bump version to 19.1.0-rc3

commit 72d2932da5a7c70885a1fdfaa809ff1ede0984ff
Author: John Brawn <john.brawn@arm.com>
Date:   Thu Aug 8 11:20:09 2024 +0100

    [libunwind] Fix problems caused by combining BTI and GCS (#102322)

    The libunwind assembly files need adjustment in order to work correctly
    when BTI and GCS are both enabled (which will be the case when
    using -mbranch-protection=standard):
    * __libunwind_Registers_arm64_jumpto can't use br to jump to the return
    location, instead we need to use gcspush then ret.
    * Because we indirectly call __libunwind_Registers_arm64_jumpto it needs
    to start with bti jc.
    * We need to set the GCS GNU property bit when it's enabled.

    ---------

    Co-authored-by: Daniel Kiss <daniel.kristof.kiss@gmail.com>
    (cherry picked from commit 39529107b46032ef0875ac5b809ab5b60cd15a40)

commit c3da16b094511e42022e534b5eb665dbc3f8db0f
Author: John Brawn <john.brawn@arm.com>
Date:   Mon Aug 5 18:54:05 2024 +0100

    [libunwind] Be more careful about enabling GCS (#101973)

    We need both GCS to be enabled by the compiler (which we do by checking
    if __ARM_FEATURE_GCS_DEFAULT is defined) and for arm_acle.h to define
    the GCS intrinsics. Check the latter by checking if _CHKFEAT_GCS is
    defined.

    (cherry picked from commit c649194a71b47431f2eb2e041435d564e3b51072)

commit 7e7e8125cfabf7daf5de63612e6f2c646dd8cad3
Author: John Brawn <john.brawn@arm.com>
Date:   Sun Aug 4 13:27:12 2024 +0100

    [libunwind] Add GCS support for AArch64 (#99335)

    AArch64 GCS (Guarded Control Stack) is similar enough to CET that we can
    re-use the existing code that is guarded by _LIBUNWIND_USE_CET, so long
    as we also add defines to locate the GCS stack and pop the entries from
    it. We also need the jumpto function to exit using br instead of ret, to
    prevent it from popping the GCS stack.

    GCS support is enabled using the LIBUNWIND_ENABLE_GCS cmake option. This
    enables -mbranch-protection=standard, which enables GCS. For the places
    we need to use GCS instructions we use the target attribute, as there's
    not a command-line option to enable a specific architecture extension.

    (cherry picked from commit b32aac4358c1f6639de7c453656cd74fbab75d71)

commit 64b8514e6c1a663660fbb93ec7f623b3e40a2020
Author: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
Date:   Thu Aug 8 13:14:09 2024 +0800

    Reland [C++20] [Modules] [Itanium ABI] Generate the vtable in the mod… (#102287)

    Reland https://github.com/llvm/llvm-project/pull/75912

    The differences of this PR between
    https://github.com/llvm/llvm-project/pull/75912 are:

    - Fixed a regression in `Decl::isInAnotherModuleUnit()` in DeclBase.cpp
    pointed out by @mizvekov and added the corresponding test.
    - Fixed the regression in windows
    https://github.com/llvm/llvm-project/issues/97447. The changes are in
    `CodeGenModule::getVTableLinkage` from
    `clang/lib/CodeGen/CGVTables.cpp`. According to the feedback from MSVC
    devs, the linkage of vtables won't be affected by modules. So I simply
    skipped the case for MSVC.

    Given this is more or less fundamental to the use of modules, I hope we
    can backport this to 19.x.

    (cherry picked from commit 847f9cb0e868c8ec34f9aa86fdf846f8c4e0388b)

commit 3ffa5421ca657c04d4df170307c1f9a3c6293003
Author: Aaron Ballman <aaron@aaronballman.com>
Date:   Mon Aug 19 16:54:12 2024 -0400

    [C++23] Fix infinite recursion (Clang 19.x regression) (#104829)

    d469794d0cdfd2fea50a6ce0c0e33abb242d744c was fixing an issue with
    triggering vtable instantiations, but it accidentally introduced
    infinite recursion when the type to be checked is the same as the type
    used in a base specifier or field declaration.

    Fixes #104802

    (cherry picked from commit 435cb0dc5eca08cdd8d9ed0d887fa1693cc2bf33)

commit 6dbc0e236b3e3a651302d079d1c64934976bc0b3
Author: Martin Storsjö <martin@martin.st>
Date:   Sun Aug 18 00:44:16 2024 +0300

    [LLD] [MinGW] Recognize the -rpath option (#102886)

    GNU ld silently accepts the -rpath option for Windows targets, as a
    no-op.

    This has led to some build systems (and users) passing this option
    while building for Windows/MinGW, even if Windows doesn't have any
    concept like rpath.

    Older versions of Conan did include -rpath in the pkg-config files it
    generated, see e.g.

    https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/generators/pkg_config.py#L104-L114
    and
    https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/build/compiler_flags.py#L26-L34
    - and see https://github.com/mstorsjo/llvm-mingw/issues/300 for user
    reports about this issue.

    Recognize the option in LLD for MinGW targets, to improve drop-in
    compatibility compared to GNU ld, but produce a warning to alert users
    that the option really has no effect for these targets.

    (cherry picked from commit 69f76c782b554a004078af6909c19a11e3846415)

commit c1336c9e3bd6c0887ead386043c547b3a3ed76a9
Author: David Green <david.green@arm.com>
Date:   Mon Aug 19 18:50:47 2024 +0100

    [GlobalISel] Bail out early for big-endian (#103310)

    If we continue through the function we can currently hit crashes. We can
    bail out early and fall back to SDAG.

    Fixes #103032

    (cherry picked from commit 05d17a1c705e1053f95b90aa37d91ce4f94a9287)

commit 263965ebe237e2f82d714a12a8c9338b46237a33
Author: Tomas Matheson <Tomas.Matheson@arm.com>
Date:   Sat Aug 17 13:36:40 2024 +0100

    [AArch64] Add a check for invalid default features (#104435)

    This adds a check that all ExtensionWithMArch which are marked as
    implied features for an architecture are also present in the list of
    default features. It doesn't make sense to have something mandatory but
    not on by default.

    There were a number of existing cases that violated this rule, and some
    changes to which features are mandatory (indicated by the Implies
    field).

    This resulted in a bug where if a feature was marked as `Implies` but
    was not added to `DefaultExt`, then for `-march=base_arch+nofeat` the
    Driver would consider `feat` to have never been added and therefore
    would do nothing to disable it (no `-target-feature -feat` would be
    added, but the backend would enable the feature by default because of
    `Implies`). See
    clang/test/Driver/aarch64-negative-modifiers-for-default-features.c.

    Note that the processor definitions do not respect the architecture
    DefaultExts. These apply only when specifying `-march=<some architecture
    version>`. So when a feature is moved from `Implies` to `DefaultExts` on
    the Architecture definition, the feature needs to be added to all
    processor definitions (that are based on that architecture) in order to
    preserve the existing behaviour. I have checked the TRMs for many cases
    (see specific commit messages) but in other cases I have just kept the
    current behaviour and not tried to fix it.

commit bb46c721211b901f7ab34551e4bb240308203da9
Author: Vladislav Khmelevsky <och95@yandex.ru>
Date:   Sat Jul 27 23:07:59 2024 +0400

    release/19.x: [BOLT] Fix relocations handling

    Backport https://github.com/llvm/llvm-project/commit/097ddd3565f830e6cb9d0bb8ca66844b7f3f3cbb

commit 8595e91b16dadc33fbb321cfd30b77f43f64e10e
Author: Anton Korobeynikov <anton@korobeynikov.info>
Date:   Fri Aug 16 18:09:53 2024 -0700

    Add some brief LLVM 19 release notes for Pointer Authentication ABI support.

commit 9545ef53ebe8be2a53ef6f84626f52bed73c82ba
Author: Craig Topper <craig.topper@sifive.com>
Date:   Fri Aug 16 14:54:51 2024 -0700

    [Mips] Fix fast isel for i16 bswap. (#103398)

    We need to mask the SRL result to 8 bits before ORing in the SLL. This
    is needed in case bits 23:16 of the input aren't zero. They will have
    been shifted into bits 15:8.

    We don't need to AND the result with 0xffff. It's ok if the upper 16
    bits of the register are garbage.

    Fixes #103035.

    (cherry picked from commit ebe7265b142f370f0a563fece5db22f57383ba2d)
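
    The arithmetic above can be sketched as a register-level model. This
    is an illustration only: the function names are hypothetical, `reg`
    stands for a 32-bit register whose low 16 bits hold the i16 value, and
    the unmasked variant is shown purely to make the stray bits visible.

    ```cpp
    #include <cstdint>

    // Unmasked variant: bits 23:16 of the input are shifted into bits 15:8
    // of the SRL result and corrupt the low half after the OR.
    uint32_t bswap16Unmasked(uint32_t reg) {
      return (reg << 8) | (reg >> 8);
    }

    // Masked variant: keep only the low 8 bits of the SRL result before
    // ORing in the SLL. The upper 16 bits of the result may still be
    // garbage, which is fine for an i16 value.
    uint32_t bswap16Masked(uint32_t reg) {
      return (reg << 8) | ((reg >> 8) & 0xFFu);
    }
    ```

    With `reg = 0x00AB1234`, the low 16 bits of the masked result are
    `0x3412`, while the unmasked result has `0xBF12` there because `0xAB`
    leaks into bits 15:8.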

commit 6fcbfb8ebc9650a2ea184aac244d067efdbe441e
Author: Sharadh Rajaraman <r.sharadh@outlook.sg>
Date:   Mon Aug 19 12:17:58 2024 +0100

    [clang][driver] `TY_ModuleFile` should be a 'CXX' file type

commit 38a591de66a86aaf523f78f8266a2d5f01a1b106
Author: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
Date:   Tue Aug 13 15:34:41 2024 -0300

    [OpenMP][AArch64] Fix branch protection in microtasks (#102317)

    Start __kmp_invoke_microtask with PACBTI in order to identify the
    function as a valid branch target. Before returning, SP is
    authenticated.
    Also add the BTI and PAC markers to z_Linux_asm.S.

    With this patch, libomp.so can now be generated with DT_AARCH64_BTI_PLT
    when built with -mbranch-protection=standard.

    The implementation is based on the code available in compiler-rt.

    (cherry picked from commit 0aa22dcd2f6ec5f46b8ef18fee88066463734935)

commit 6e3026883d77124e32a2a7be72c3361fba3e7457
Author: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Date:   Mon Aug 12 09:08:46 2024 +0200

    [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605)

    In C++23 anything can be constexpr, including a dtor of a class whose
    members and bases don't have constexpr dtors. Avoid early triggering of
    vtable instantiation in this case.

    Fixes https://github.com/llvm/llvm-project/issues/102293

    (cherry picked from commit d469794d0cdfd2fea50a6ce0c0e33abb242d744c)

commit 8fbe69a407b2784c7e9d91a3c69daa9786b14391
Author: Hari Limaye <hari.limaye@arm.com>
Date:   Tue Aug 6 11:39:01 2024 +0100

    [AArch64] Add streaming-mode stack hazard optimization remarks (#101695)

    Emit an optimization remark when objects in the stack frame may cause
    hazards in a streaming mode function. The analysis requires either the
    `aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
    to be set by the user, with the former flag taking precedence.

    (cherry picked from commit a98a0dcf63f54c54c5601a34c9f8c10cde0162d6)

commit b45f75295e3038ef79dce4ac63fbf95b659eebe5
Author: Piotr Zegar <me@piotrzegar.pl>
Date:   Thu Jul 25 17:26:01 2024 +0200

    [clang-tidy] Fix crash in C language in readability-non-const-parameter (#100461)

    Fix a crash that happens when a redeclaration has a
    different number of parameters than the definition.

    Fixes #100340

    (cherry picked from commit a27f816fe56af9cc7f4f296ad6c577f6ea64349f)

commit 90f2d48965ca8a27f4b814ada987d169ca6a6f44
Author: Louis Dionne <ldionne.2@gmail.com>
Date:   Fri Aug 16 11:08:34 2024 -0400

    [libc++] Fix rejects-valid in std::span copy construction (#104500)

    Trying to copy-construct a std::span from another std::span holding an
    incomplete type would fail as we evaluate the SFINAE for the range-based
    constructor. The problem was that we checked for __is_std_span after
    checking for the range being a contiguous_range, which hard-errored
    because of arithmetic on a pointer to incomplete type.

    As a drive-by, refactor the whole test and format it.

    Fixes #104496

    (cherry picked from commit 99696b35bc8a0054e0b0c1a26e8dd5049fa8c41b)

commit 02cafa895c917a4b1726e64a5870877c95826be4
Author: Spencer Abson <Spencer.Abson@arm.com>
Date:   Fri Aug 16 14:39:43 2024 +0000

    [AArch64] Adopt updated B16B16 target flags

    The enablement of SVE/SME non-widening BFloat16 instructions was recently
    changed in response to an architecture update, in which:
    	- FEAT_SVE_B16B16 was weakened
    	- FEAT_SME_B16B16 was introduced
    New flags, 'sve-b16b16' and 'sme-b16b16', were introduced to replace the
    existing 'b16b16'. This was achieved in the below two patches.
    	- https://github.com/llvm/llvm-project/pull/101480
    	- https://github.com/llvm/llvm-project/pull/102501
    Ideally, the interface change introduced here will be valid in LLVM-19.
    We do not see it necessary to back-port the entire change, but just to add
    'sme-b16b16' and 'sve-b16b16' as aliases to the existing (and unchanged)
    'b16b16' and 'sme2' flags which together cover all of these features.

    The predication of Bf16 variants of svmin/svminnm and svmax/svmaxnm is also
    fixed in this change.

commit 9e90c40564e21dc5f1a12e08cfdf29305aaf9f50
Author: Gulfem Savrun Yeniceri <gulfem@google.com>
Date:   Tue Jul 23 11:06:30 2024 +0000

    Revert "[CGData] llvm-cgdata (#89884)"

    This reverts commit d3fb41dddc11b0ebc338a3b9e6a5ab7288ff7d1d
    and forward fix patches because of the issue explained in:
    https://github.com/llvm/llvm-project/pull/89884#issuecomment-2244348117.

    Revert "Fix tests for https://github.com/llvm/llvm-project/pull/89884
    (#100061)"

  …
kv-sc pushed a commit to syntacore/snippy that referenced this pull request Dec 5, 2024
* [Metadata] Try to merge the first and last ranges. (#101860)

Fixes #101859.

If we have at least 2 ranges, we have to try to merge the last and first
ones to handle the wrap range.

(cherry picked from commit 4377656f2419a8eb18c01e86929b689dcf22b5d6)
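
A minimal sketch of the idea, not the actual LLVM code: it assumes a hypothetical half-open `Range` type over 32-bit unsigned values in which `hi < lo` marks a range that wraps past the maximum value, and a hypothetical list that is sorted by `lo` with only the last entry allowed to wrap.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical half-open range [lo, hi) over 32-bit unsigned values.
// hi < lo means the range wraps around past UINT32_MAX back to 0.
struct Range {
  uint32_t lo;
  uint32_t hi;
  bool wraps() const { return hi < lo; }
};

// Assumes `ranges` is sorted by `lo` and only the last entry may wrap.
// If the wrapped tail [0, last.hi) touches or overlaps the first range,
// fold the first range into the last one.
void mergeWrappedLastIntoFirst(std::vector<Range> &ranges) {
  if (ranges.size() < 2)
    return; // nothing to merge with
  Range &last = ranges.back();
  const Range &first = ranges.front();
  if (last.wraps() && first.lo <= last.hi) {
    last.hi = std::max(last.hi, first.hi);
    ranges.erase(ranges.begin()); // references are not used after this point
  }
}
```

For example, `[3, 10)`, `[100, 120)`, `[250, 5)` becomes `[100, 120)`, `[250, 10)`, because the wrapped tail of the last range overlaps the first one.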

* InferAddressSpaces: Fix mishandling stores of pointers to themselves (#101877)

(cherry picked from commit 3c483b887e5a32a0ddc0a52a467b31f74aad25bb)

* [ARM] [Windows] Use IMAGE_SYM_CLASS_STATIC for private functions (#101828)

For functions with private linkage, pick
IMAGE_SYM_CLASS_STATIC rather than IMAGE_SYM_CLASS_EXTERNAL;
GlobalValue::isInternalLinkage() only checks for
InternalLinkage, while GlobalValue::isLocalLinkage() checks for both
InternalLinkage and PrivateLinkage.

This matches what the AArch64 target does, since commit
3406934e4db4bf95c230db072608ed062c13ad5b.

This activates a preexisting fix for the AArch64 target from
1e7f592a890aad860605cf5220530b3744e107ba, for the ARM target as well.

When a relocation points at a symbol, one usually can convey an offset
to the symbol by encoding it as an immediate in the instruction.
However, for the ARM and AArch64 branch instructions, the immediate
stored in the instruction is ignored by MS link.exe (and lld-link
matches this aspect). (It would be simple to extend lld-link to support
it - but such object files would be incompatible with MS link.exe.)

This was worked around by 1e7f592a890aad860605cf5220530b3744e107ba by
emitting symbols into the object file symbol table, for temporary
symbols that otherwise would have been omitted, if they have the class
IMAGE_SYM_CLASS_STATIC, in order to avoid needing an offset in the
relocated instruction.

This change gives the symbols generated from functions with the IR level
"private" linkage the right class, to activate that workaround.

This fixes https://github.com/llvm/llvm-project/issues/100101, fixing
code generation for coroutines for Windows on ARM. After the change in
f78688134026686288a8d310b493d9327753a022, coroutines generate a function
with private linkage, and calls to this function were previously broken
for this target.

(cherry picked from commit 8dd065d5bc81b0c8ab57f365bb169a5d92928f25)
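
The difference between the two predicates can be sketched as follows. The enums and `pickSymbolClass()` are hypothetical simplifications for illustration; only the behaviour of the two linkage checks is taken from the description above.

```cpp
// Simplified, hypothetical model of the linkage kinds involved.
enum class Linkage { External, Internal, Private };
enum class SymClass { External, Static };

// Mirrors the described behaviour: only InternalLinkage counts.
bool isInternalLinkage(Linkage l) { return l == Linkage::Internal; }

// Mirrors the described behaviour: InternalLinkage or PrivateLinkage.
bool isLocalLinkage(Linkage l) {
  return l == Linkage::Internal || l == Linkage::Private;
}

// Choosing the class with the broader predicate gives private functions
// the static class as well.
SymClass pickSymbolClass(Linkage l) {
  return isLocalLinkage(l) ? SymClass::Static : SymClass::External;
}
```

Had the narrower `isInternalLinkage()` check been used instead, a private function would fall through to the external class.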

* Forward declare OSSpinLockLock on MacOS since it's not shipped on the system. (#101392)

Fixes build errors on some SDKs.

rdar://132607572
(cherry picked from commit 3a4c7cc56c07b2db9010c2228fc7cb2a43dd9b2d)

* ReleaseNotes: lld/ELF: mention CREL

* Bump version to 19.1.0-rc2

* [sanitizer_common][test] Fix SanitizerIoctl/KVM_GET_* tests on Linux/… (#100532)

…sparc64

Two ioctl tests `FAIL` on Linux/sparc64 (both 32 and 64-bit):
```
  SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerIoctl/KVM_GET_LAPIC
  SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerIoctl/KVM_GET_MP_STATE
```
like
```
compiler-rt/lib/sanitizer_common/tests/./Sanitizer-sparc-Test --gtest_filter=SanitizerIoctl.KVM_GET_LAPIC
--
compiler-rt/lib/sanitizer_common/tests/sanitizer_ioctl_test.cpp:91: Failure
Value of: res
  Actual: false
Expected: true

compiler-rt/lib/sanitizer_common/tests/sanitizer_ioctl_test.cpp:92: Failure
Expected equality of these values:
  ioctl_desc::WRITE
    Which is: 2
  desc.type
    Which is: 1
```
The problem is that Linux/sparc64, like Linux/mips, uses a different
layout for the `ioctl` `request` arg than most other Linux targets as
can be seen in `sanitizer_platform_limits_posix.h` (`IOC_*`). Therefore,
this patch makes the tests use the correct one.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit 9eefe065bb2752b0db9ed553d2406e9a15ce349e)

* [sanitizer_common] Don't use syscall(SYS_clone) on Linux/sparc64 (#100534)

```
  SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerCommon/StartSubprocessTest
```
and every single test using the `llvm-symbolizer` `FAIL` on
Linux/sparc64 in a very weird way: when using `StartSubprocess`, there's
a call to `internal_fork`, but we never reach `internal_execve`.
`internal_fork` is implemented using `syscall(SYS_clone)`. The calling
convention of that syscall already varies considerably between targets,
but as documented in `clone(2)`, SPARC again is widely different.
Instead of trying to match `glibc` here, this patch just calls `__fork`.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit 1c53b907bd6348138a59da270836fc9b4c161a07)

* [sanitizer_common] Adjust signal_send.cpp for Linux/sparc64 (#100538)

```
  SanitizerCommon-ubsan-sparc-Linux :: Linux/signal_send.cpp
```
currently `FAIL`s on Linux/sparc64 (32 and 64-bit). Instead of the
expected values for `SIGUSR1` (`10`) and `SIGUSR2` (`12`), that target
uses `30` and `31`.

On Linux/x86_64, the signals get their values from
`x86_64-linux-gnu/bits/signum-generic.h`, to be overridden in
`x86_64-linux-gnu/bits/signum.h`. On Linux/sparc64 OTOH, the definitions
are from `sparc64-linux-gnu/bits/signum-arch.h` and remain that way.
There's no `signum.h` at all.

The patch allows for both values.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit 7cecbdfe4eac3fd7268532426fb6b13e51b8720d)

* [sanitizer_common] Fix internal_*stat on Linux/sparc64 (#101012)

```
  SanitizerCommon-Unit :: ./Sanitizer-sparcv9-Test/SanitizerCommon/FileOps
```
`FAIL`s on 64-bit Linux/sparc64:
```
projects/compiler-rt/lib/sanitizer_common/tests/./Sanitizer-sparcv9-Test --gtest_filter=SanitizerCommon.FileOps
--
compiler-rt/lib/sanitizer_common/tests/sanitizer_libc_test.cpp:144: Failure
Expected equality of these values:
  len1 + len2
    Which is: 10
  fsize
    Which is: 1721875535
```
The issue is similar to the mips64 case: the Linux/sparc64 `*stat`
syscalls take a `struct kernel_stat64 *` arg. Also the syscalls actually
used differ.

This patch handles this, adopting the mips64 code to avoid too much
duplication.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit fcd6bd5587cc376cd8f43b60d1c7d61fdfe0f535)

* [ADT] Add `<cstdint>` to SmallVector (#101761)

SmallVector uses `uint32_t`, `uint64_t` without including `<cstdint>`
which fails to build w/ GCC 15 after a change in libstdc++ [0]

[0] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a817a4a5a6d94da9127af3be9f84a74e3076ee2

(cherry picked from commit 7e44305041d96b064c197216b931ae3917a34ac1)

* [libc++][bit] Improves rotate functions. (#98032)

Investigating #96612 shows our implementation was different from the
Standard and could cause UB. Testing the codegen showed quite a bit of
assembly generated for these functions. The functions have been written
differently which allows Clang to optimize the code to use simple CPU
rotate instructions.

Fixes: https://github.com/llvm/llvm-project/issues/96612
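
As an illustration of the UB concern, here is a sketch of a left rotate that follows the wording of the C++ standard for `std::rotl` (reduce the shift modulo the bit width, then branch on its sign). It is not the libc++ implementation, and `rotl32` is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>

// Left-rotate a 32-bit value by `s` bits; negative `s` rotates right.
constexpr uint32_t rotl32(uint32_t x, int s) {
  constexpr int N = std::numeric_limits<uint32_t>::digits; // 32
  const int r = s % N; // in (-N, N), with the sign of s
  if (r == 0)
    return x; // avoids shifting by N, which would be undefined
  if (r > 0)
    return (x << r) | (x >> (N - r));
  return (x >> -r) | (x << (N + r)); // same as rotating right by -r
}
```

Every shift amount here lies strictly between 0 and 32, so no shift can reach the full bit width.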

* [AArch64] Avoid inlining if ZT0 needs preserving. (#101343)

Inlining may result in different behaviour when the callee clobbers ZT0,
because normally the call-site will have code to preserve ZT0. When
inlining the function this code to preserve ZT0 will no longer be
emitted, and so the resulting behaviour of the program is changed.

(cherry picked from commit fb470db7b3a8ce6853e8bf17d235617a2fa79434)

* [AArch64] Avoid NEON dot product in streaming[-compatible] functions (#101677)

The NEON dot product is not valid in streaming mode.
A follow-up patch will improve codegen for these operations.

(cherry picked from commit 12937b1bfb23cca4731fa274f3358f7286cc6784)

* [AArch64][SME] Rewrite __arm_sc_memset to remove invalid instruction (#101522)

The implementation of __arm_sc_memset in compiler-rt contains
a Neon dup instruction which is not valid in streaming mode. This
patch rewrites the function, using an SVE mov instruction if available.

(cherry picked from commit d6649f2d4871c4535ae0519920e36100748890c4)

* [LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679)

The command line option enable-scalable-autovec-in-streaming-mode is
used to enable scalable vectors but the same check is missing from
enableScalableVectorization, which is blocking auto-vectorisation.

(cherry picked from commit 7775a4882d7105fde7f7a81f3c72567d39afce45)

* [Driver] Restrict Ofast deprecation help message to Clang (#101682)

The discussion about this in Flang
(https://discourse.llvm.org/t/rfc-deprecate-ofast-in-flang/80243) has
not concluded hence restricting the deprecation only to Clang.

(cherry picked from commit e60ee1f2d70bdb0ac87b09ae685d669d8543b7bd)

* [Clang] SFINAE on mismatching pack length during constraint satisfaction checking (#101879)

If a fold expanded constraint would expand packs of different size, it
is not a valid pack expansion and it is not satisfied. This should not
produce an error.

Fixes #99430

(cherry picked from commit da380b26e4748ade5a8dba85b7df5e1c4eded8bc)

* [Driver] Temporarily probe aarch64-linux-gnu GCC installation

As the comment explains, `*Triples[]` lists are discouraged and not
comprehensive anyway (e.g.
aarch64-unknown-linux-gnu/aarch64-unknown-linux-musl/aarch64-amazon-linux
do not work).

Boost incorrectly specifies --target=arm64-pc-linux ("arm64" should not
be used for Linux) and expects to probe "aarch64-linux-gnu". Add this
temporary workaround for the 19.x releases.

* workflows/release-tasks: Add missing permissions for release binaries (#102023)

Now that the release binaries create artifact attestations, we need to
ensure that we call the workflow with the correct permissions.

(cherry picked from commit dc349a3f47882cdac7112c763d2964b59e77356a)

* workflows/release-binaries: Give attestation artifacts a unique name (#102041)

We need a different attestation for each supported architecture, so
their artifacts all need to have different names.

The upload step is run on a Linux runner, so no matter which
architecture's binary is being uploaded the runner.os and runner.arch
variables would always be 'Linux' and 'X64' and so we can't use them for
naming the artifact.

(cherry picked from commit 3c8dadda3aa20b89fb5ad29ae31380d9594c3430)

* [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC (#100086)

As discussed in Issue #86793, the `MachOTest.UnalignedLC` test dies with
`SIGBUS` on SPARC, a strict-alignment target. It simply cannot work
there. Besides, the test invokes undefined behaviour on big-endian
targets, so this patch disables it on all of those.

Tested on `sparcv9-sun-solaris2.11` and `amd64-pc-solaris2.11`.

(cherry picked from commit 3a226dbe27ac7c7d935bc0968e84e31798a01207)

* [LLDB] Add `<cstdint>` to AddressableBits (#102110)

(cherry picked from commit bb59f04e7e75dcbe39f1bf952304a157f0035314)

* [LAA] Refine stride checks for SCEVs during dependence analysis. (#99577)

Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.
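
The three cases above can be read as a small decision function. The following is only a sketch of that reading: `StrideInfo`, `StrideCheck` and `classifyStrides` are hypothetical names, not the types or functions used in LoopAccessAnalysis.

```cpp
#include <cstdint>

// Hypothetical summary of how one pointer evolves across loop iterations.
struct StrideInfo {
  bool Known;          // false: neither loop-invariant nor a constant-stride,
                       // non-wrapping AddRec
  std::int64_t Stride; // only meaningful when Known; 0 means loop-invariant
};

// Hypothetical outcome of the stride classification.
enum class StrideCheck {
  Unknown,           // accesses may overlap; no runtime check can be generated
  NeedsRuntimeCheck, // at least one side is invariant; disambiguate at runtime
  CompareStrides     // both sides have constant strides; proceed as before
};

StrideCheck classifyStrides(StrideInfo Src, StrideInfo Sink) {
  if (!Src.Known || !Sink.Known)
    return StrideCheck::Unknown;
  if (Src.Stride == 0 || Sink.Stride == 0)
    return StrideCheck::NeedsRuntimeCheck;
  return StrideCheck::CompareStrides;
}
```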

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577

* [CalcSpillWeights] Avoid x87 excess precision influencing weight result

Fixes #99396

The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by
x87 excess precision, which can result in slightly different register
choices when the compiler is hosted on x86_64 or i386. This leads to
different object file output when cross-compiling to i386, or native.

Similar to 7af3432e22b0, we need to add a `volatile` qualifier to the
local `Weight` variable to force it onto the stack, and avoid the excess
precision. Define `stack_float_t` in `MathExtras.h` for this purpose,
and use it.
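
A minimal sketch of the idea, assuming `stack_float_t` is an alias for `volatile float` (the exact definition in `MathExtras.h` may differ, for example it may be conditional on the host). The function `scaledWeight` and its parameters are hypothetical and only illustrate the pattern:

```cpp
// Assumed shape of the alias described above; the real definition may differ.
using stack_float_t = volatile float;

// Hypothetical helper: divides a frequency by a size, as a weight
// calculation might.
float scaledWeight(float UseDefFreq, float Size) {
  // Writing to a volatile local forces a store to a 32-bit stack slot, so the
  // value is rounded to float precision even if the division was carried out
  // in a wider x87 register.
  stack_float_t Weight = UseDefFreq / Size;
  return Weight;
}
```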

(cherry picked from commit c80c09f3e380a0a2b00b36bebf72f43271a564c1)

* [BOLT] Support map other function entry address (#101466)

Allow BOLT to map the old address to a new binary address if the old
address is the entry of the function.

(cherry picked from commit 734c0488b6e69300adaf568f880f40b113ae02ca)

* [lld][ARM] Fix assertion when mixing ARM and Thumb objects (#101985)

Previously, we selected the Thumb2 PLT sequences if any input object is
marked as not supporting the ARM ISA, which then causes assertion
failures when calls from ARM code in other objects are seen. I think the
intention here was to only use Thumb PLTs when the target does not have
the ARM ISA available, signalled by no objects being marked as having it
available. To do that we need to track which ISAs we have seen as we
parse the build attributes, and defer the decision about PLTs until all
input objects have been parsed.

This bug was triggered by real code in picolibc, which has some
versions of string.h functions built with Thumb2-only build attributes,
so that they are compatible with v7-A, v7-R and v7-M.

Fixes #99008.

(cherry picked from commit a1c6467bd90905d52cf8f6162b60907f8e98a704)

* [BOLT] Skip PLT search for zero-value weak reference symbols (#69136)

Take a common weak reference pattern for example
```
    __attribute__((weak)) void undef_weak_fun();

    if (&undef_weak_fun)
      undef_weak_fun();
```

In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and Bolt incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.

(cherry picked from commit 6c8933e1a095028d648a5a26aecee0f569304dd0)

* [AArch64] Don't replace dst of SWP instructions with (X|W)ZR (#102139)

This change updates the AArch64DeadRegisterDefinition pass to ensure it
does not replace the destination register of a SWP instruction with the
zero register when its value is unused. This is necessary to ensure that
the ordering of such instructions in relation to DMB.LD barriers adheres
to the definitions of the AArch64 Memory Model.

The memory model states the following (ARMARM version DDI 0487K.a
§B2.3.7):
```
Barrier-ordered-before

An effect E1 is Barrier-ordered-before an effect E2 if one of the following applies:
[...]
* All of the following apply:
- E1 is a Memory Read effect.
- E1 is generated by an instruction whose destination register is not WZR or XZR.
- E1 appears in program order before E3.
- E3 is either a DMB LD effect or a DSB LD effect.
- E3 appears in program order before E2.
```

Prior to this change, by replacing the destination register of such SWP
instruction with WZR/XZR, the ordering relation described above was
incorrectly removed from the generated code.

The new behaviour is ensured in this patch by adding the relevant
`SWP[L](B|H|W|X)` instructions to the list in the `atomicReadDroppedOnZero`
predicate, which already covered the `LD<Op>` instructions that are
subject to the same effect.
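
As an illustration, source code of the following shape can produce the affected pattern: an atomic exchange whose result is ignored, followed by an acquire fence. This is a sketch; `publishThenRead` is a hypothetical function, and whether the exchange is lowered to a SWP instruction depends on the target features (it assumes LSE atomics are available).

```cpp
#include <atomic>

// Hypothetical example: the old value returned by exchange() is unused, which
// is what made the destination register a candidate for WZR/XZR replacement.
int publishThenRead(std::atomic<int> &Flag, const int &Data) {
  Flag.exchange(1, std::memory_order_relaxed); // result deliberately ignored
  // On AArch64 an acquire fence is typically lowered to a DMB LD barrier,
  // which must stay ordered after the read half of the exchange.
  std::atomic_thread_fence(std::memory_order_acquire);
  return Data;
}
```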

Fixes #68428.

(cherry picked from commit beb37e2e22b549b361be7269a52a3715649e956a)

* [clang][modules] Enable built-in modules for the upcoming Apple releases (#102239)

The upcoming Apple SDK releases will support the clang built-in headers
being in the clang built-in modules: stop passing
-fbuiltin-headers-in-system-modules for those SDK versions.

(cherry picked from commit 961639962251de7428c3fe93fa17cfa6ab3c561a)

* [Driver] Fix a warning

This patch fixes:

  clang/lib/Driver/ToolChains/Darwin.cpp:2937:3: error: default label
  in switch which covers all enumeration values
  [-Werror,-Wcovered-switch-default]

(cherry picked from commit 0f1361baf650641a59aaa1710d7a0b7b02f2e56d)

* [AIX]export function descriptor symbols related to template functions. (#101920)

This fixes regressions caused by
https://github.com/llvm/llvm-project/pull/97526

After that patch, all undefined references to DS symbols are removed.
As a result, DS symbols (for template functions) have no references in
some cases, so extract_symbols.py does not export these DS symbols in
those cases.

On AIX, exporting the function descriptor depends on references to the
function descriptor itself and the function entry symbol.

Without this fix, on AIX, we get:
```
rtld: 0712-001 Symbol _ZN4llvm15SmallVectorBaseIjE13mallocForGrowEPvmmRm was referenced
      from module llvm-project/build/unittests/Passes/Plugins/TestPlugin.so(), but a runtime definition
            of the symbol was not found.
```

(cherry picked from commit 396343f17b1182ff8ed698beac3f9b93b1d9dabd)

* [clang-format] Fix a bug in annotating CastRParen (#102261)

Fixes #102102.

(cherry picked from commit 8c7a038f9029c675f2a52ff5e85f7b6005ec7b3e)

* [clang] Fix crash when #embed used in a compound literal (#102304)

Fixes https://github.com/llvm/llvm-project/issues/102248

(cherry picked from commit 3606d69d0b57dc1d23a4362e376e7ad27f650c27)

* [AMDGPU] Fix folding clamp into pseudo scalar instructions (#100568)

Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp
into a pseudo scalar instruction can cause issues due to a change in
regbank. We fix this with a copy.

(cherry picked from commit 817cd726454f01e990cd84e5e1d339b120b5ebaa)

* Revert "[LLVM] Silence compiler-rt warning in runtimes build (#99525)"

This patch broke LLVM Flang build on Windows. PR #100202
This reverts commit f6f88f4b99638821af803d1911ab6a7dac04880b.

(cherry picked from commit 73d862e478738675f5d919c6a196429acd7b5f50)

* [TBAA] Do not rewrite TBAA if exists, always null out `!tbaa.struct`

Retrieve `!tbaa` metadata via `!tbaa.struct` in `adjustForAccess`
unless it already exists, as struct-path aware `MDNodes` emitted
via `new-struct-path-tbaa` may be leveraged. As `!tbaa.struct`
carries memcpy padding semantics among struct fields and `!tbaa`
is already meant to aid to alias semantics, it should be possible
to zero out `!tbaa.struct` once the memcpy has been simplified.
`SROA/tbaa-struct.ll` test has gone out of scope, as `!tbaa` has
already replaced `!tbaa.struct` in SROA.

Fixes: https://github.com/llvm/llvm-project/issues/95661.

* [NFC][llvm][support] rename INFINITY in regcomp (#101758)

Since C23 this macro is defined by float.h, which clang implements in
its float.h since #96659 landed.

However, regcomp.c in LLVMSupport happened to define its own macro with
that name, leading to problems when bootstrapping. This change renames
the offending macro.

(cherry picked from commit 899f648866affd011baae627752ba15baabc2ef9)

* [ELF] .llvm.call-graph-profile: support CREL

https://reviews.llvm.org/D105217 added RELA support. This patch adds
CREL support.

(cherry picked from commit 0766a59be3256e83a454a089f01215d6c7f94a48)

* [ELF] scanRelocations: support .crel.eh_frame

Follow-up to #98115. For EhInputSection, RelocationScanner::scan calls
sortRels, which doesn't support the CREL iterator. We should set
supportsCrel to false to ensure that the initial_location fields in
.eh_frame FDEs are relocated.

(cherry picked from commit a821fee312d15941174827a70cb534c2f2fe1177)

* Revert "demangle function names in trace files (#87626)"

This reverts commit 0fa20c55b58deb94090985a5c5ffda4d5ceb3cd1.

Storing raw symbol names is generally preferred in profile files.
Demangling might lose information. Language frontends might use
demangling schemes not supported by LLVMDemangle
(https://github.com/llvm/llvm-project/issues/45901#issuecomment-2008686663).
In addition, calling `demangle` for each function has a significant
performance overhead (#102222).

I believe that even if we decide to provide a producer-side demangling,
it would not be on by default.

Pull Request: https://github.com/llvm/llvm-project/pull/102274

(cherry picked from commit 72b73e23b6c36537db730ebea00f92798108a6e5)

* [AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (#102105)

The code-generator is currently not able to handle scalable vectors of
<vscale x 1 x eltty>. The usual "fix" for this until it is supported is
to mark the costs of loads/stores with an invalid cost, preventing the
vectorizer from vectorizing at those factors. But on rare occasions
loops do not contain load/stores, only reductions.

So whilst this is still unsupported return an invalid cost to avoid
selecting vscale x 1 VFs. The cost of a reduction is not currently used
by the vectorizer so this adds the cost to the add/mul/and/or/xor or
min/max that should feed the reduction. It includes reduction costs
too, for completeness. This change will be removed when code-generation
for these types is sufficiently reliable.

Fixes #99760

(cherry picked from commit 0b745a10843fc85e579bbf459f78b3f43e7ab309)

* [clang] Wire -fptrauth-returns to "ptrauth-returns" fn attribute. (#102416)

We already ended up with -fptrauth-returns, the feature macro, the lang
opt, and the actual backend lowering.

The only part left is threading it all through PointerAuthOptions, to
drive the addition of the "ptrauth-returns" attribute to generated
functions.
While there, do minor cleanup on ptrauth-function-attributes.c.

This also adds ptrauth_key_return_address to ptrauth.h.

(cherry picked from commit 2eb6e30fe83ccce3cf01e596e73fa6385facd44b)

* [lldb] Move definition of SBSaveCoreOptions dtor out of header (#102539)

This class is technically not usable in its current state. When you use
it in a simple C++ project, your compiler will complain about an
incomplete definition of SaveCoreOptions. Normally this isn't a problem;
other classes in the SBAPI do this. The difference is that
SBSaveCoreOptions has a default destructor in the header, so the
compiler will attempt to generate the code for the destructor with an
incomplete definition of the impl type.

All methods for every class, including constructors and destructors,
must have a separate implementation not in a header.
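
The underlying C++ rule can be shown with a small single-file sketch. The names `Options` and `OptionsImpl` are hypothetical stand-ins, not the lldb classes; the first half plays the role of the public header and the second half the role of the implementation file.

```cpp
#include <memory>

// --- What a public header would contain (hypothetical names) ---
class OptionsImpl; // incomplete type at this point

class Options {
public:
  Options();
  // Declared only. Writing `~Options() = default;` here would make the
  // compiler generate the destructor in every including file, where
  // OptionsImpl is still incomplete and unique_ptr cannot delete it.
  ~Options();
  int getValue() const;

private:
  std::unique_ptr<OptionsImpl> Impl;
};

// --- What the implementation file would contain ---
class OptionsImpl {
public:
  int Value = 42;
};

Options::Options() : Impl(new OptionsImpl()) {}
Options::~Options() = default; // OptionsImpl is complete here
int Options::getValue() const { return Impl->Value; }
```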

(cherry picked from commit 101cf540e698529d3dd899d00111bcb654a3c12b)

* [Clang] Define __cpp_pack_indexing (#101956)

Following the discussion on #101448 this defines
`__cpp_pack_indexing`. Since pack indexing is currently
supported in all language modes, the feature test macro
is also defined in all language modes.
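
A sketch of how user code might test the macro. `FirstType` is a hypothetical alias, and the fallback branch is just one possible way to get the same result on compilers that do not define the macro; using pack indexing outside C++26 mode may produce an extension warning.

```cpp
#include <tuple>
#include <type_traits>

#if defined(__cpp_pack_indexing)
// Pack indexing is available: select the first type of the pack directly.
template <class... Ts> using FirstType = Ts...[0];
#else
// Fallback for compilers that do not define the feature test macro.
template <class... Ts>
using FirstType = typename std::tuple_element<0, std::tuple<Ts...>>::type;
#endif

static_assert(std::is_same<FirstType<int, char, double>, int>::value,
              "FirstType should select the first type in the pack");
```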

(cherry picked from commit c65afad9c58474a784633314e945c874ed06584a)

* workflows/release-binaries-all: Pass secrets on to release-binaries workflow (#101866)

A called workflow does not have access to secrets by default, so we need
to explicitly pass any secret that we want to use.

(cherry picked from commit 1fb1a5d8e2c5a0cbaeb39ead68352e5e55752a6d)

* [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761)

This PR is the first step in improving the situation for `clang-cl`
detailed in [this LLVM Discourse
thread](https://discourse.llvm.org/t/clang-cl-exe-support-for-c-modules/72257/28).
There has been some work done in #89772. I believe this is somewhat
orthogonal.

This is a work-in-progress; the functionality has only been tested with
the [basic 'Hello World'
example](https://clang.llvm.org/docs/StandardCPlusPlusModules.html#quick-start),
and proper test cases need to be written. I'd like some thoughts on
this, thanks!

Partially resolves #64118.

(cherry picked from commit bd576fe34285c4dcd04837bf07a89a9c00e3cd5e)

* workflows: Fix permissions for release-sources job (#100750)

For reusable workflows, the called workflow cannot upgrade its
permissions, and since the default permission is none, we need to
explicitly declare 'contents: read' when calling the release-sources
workflow.

Fixes the error:
The workflow is requesting 'contents: read', but is only allowed
'contents: none'.

(cherry picked from commit 82c2259aeb87f5cb418decfb6a1961287055e5d2)

* [Arm][AArch64][Clang] Respect function's branch protection attributes. (#101978)

Default attributes are assigned to all functions according to the command
line parameters. Some functions might have their own attributes and we
need to set or remove attributes accordingly.
Tests are updated to cover these scenarios too.

(cherry picked from commit 9e9fa00dcb9522db3f78d921eda6a18b9ee568bb)

* [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338)

Remove `XFAIL: LIBCXX-AIX-FIXME` from lit test `transform.pass.cpp` now
that AIX system call `wcsxfrm`/`wcsxfrm_l` is fixed in AIX 7.2.5.8 and
7.3.2.2 and buildbot machines have been upgraded.

Backported from commit cb5912a71061c6558bd4293596dcacc1ce0ca2f6

* [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755)

Three `llvm-exegesis` tests
```
  LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely
  LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions
  LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition
```
`FAIL` on Linux/sparc64 like
```
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure
Expected equality of these values:
  SharedMemoryMapping[I]
    Which is: '\0'
  ExpectedValue[I]
    Which is: '\xAA' (170)
```
It seems like this test only works on little-endian hosts: three
sub-tests are already disabled on powerpc and s390x (both big-endian),
and the fourth is additionally guarded against big-endian hosts (making
the other guards unnecessary).

However, since it has not been analyzed whether this is really an
endianness issue, this patch disables the whole test on powerpc and s390x
as before, adding sparc to the mix.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit a417083e27b155dc92b7f7271c0093aee0d7231c)

* [clang-format] Fix a serious bug in `git clang-format -f` (#102629)

With the --force (or -f) option, git-clang-format wipes out input files
excluded by a .clang-format-ignore file if they have unstaged changes.

This patch adds a hidden clang-format option --list-ignored that lists
such excluded files for git-clang-format to filter out.

Fixes #102459.

(cherry picked from commit 986bc3d0719af653fecb77e8cfc59f39bec148fd)

* [lldb] Fix crash when adding members to an "incomplete" type (#102116)

This fixes a regression caused by delayed type definition searching
(#96755 and friends): If we end up adding a member (e.g. a typedef) to a
type that we've already attempted to complete (and failed), the
resulting AST would end up inconsistent (we would start to "forcibly"
complete it, but never finish it), and importing it into an expression
AST would crash.

This patch fixes this by detecting the situation and finishing the
definition as well.

(cherry picked from commit 57cd1000c9c93fd0e64352cfbc9fbbe5b8a8fcef)

* [clang] Implement -fptrauth-auth-traps. (#102417)

This provides -fptrauth-auth-traps, which at the frontend level only
controls the addition of the "ptrauth-auth-traps" function attribute.

The attribute in turn controls various aspects of backend codegen, by
providing the guarantee that every "auth" operation generated will trap
on failure.

This can either be delegated to the hardware (if AArch64 FPAC is known
to be available), in which case this attribute doesn't change codegen.
Otherwise, if FPAC isn't available, this asks the backend to emit
additional instructions to check and trap on auth failure.

(cherry picked from commit d179acd0484bac30c5ebbbed4d29a4734d92ac93)

* Revert "[libc++][math] Fix undue overflowing of `std::hypot(x,y,z)` (#93350)"

This reverts commit 9628777479a970db5d0c2d0b456dac6633864760.

More details in https://github.com/llvm/llvm-project/pull/93350, but
this broke the PowerPC sanitizer bots.

(cherry picked from commit 1031335f2ee1879737576fde3a3425ce0046e773)

* [libc++][math] Fix undue overflowing of `std::hypot(x,y,z)` (#100820)

This is in relation to PR #93350. It was merged to main, but reverted
because of failing sanitizer builds on PowerPC.

The fix includes replacing the hard-coded threshold constants (e.g.
`__overflow_threshold`) for different floating-point sizes by a general
computation using `std::ldexp`. Thus, it should now work for all architectures.
This has the drawback of not being `constexpr` anymore as `std::ldexp`
is not implemented as `constexpr` (even though the standard mandates it
for C++23).
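
The approach can be sketched as follows. This is an illustration under stated assumptions, not the libc++ implementation: `hypot3Sketch` is a hypothetical name, the extra margin of 20 in the scale exponent is an arbitrary choice, and NaNs, infinities and extreme subnormal inputs are not handled.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

template <class Real> Real hypot3Sketch(Real X, Real Y, Real Z) {
  // Derive the threshold from the type's exponent range with std::ldexp
  // instead of hard-coding one constant per floating-point size.
  const int Exp = std::numeric_limits<Real>::max_exponent - 1;
  const Real OverflowThreshold = std::ldexp(Real(1), Exp / 2);

  const Real MaxAbs =
      std::max(std::fabs(X), std::max(std::fabs(Y), std::fabs(Z)));

  Real Scale = Real(1);
  if (MaxAbs > OverflowThreshold)
    Scale = std::ldexp(Real(1), -(Exp + 20) / 2); // squares might overflow
  else if (MaxAbs < Real(1) / OverflowThreshold)
    Scale = std::ldexp(Real(1), (Exp + 20) / 2); // squares might underflow

  // Scale by a power of two (exact), take the root, then undo the scaling.
  X *= Scale;
  Y *= Scale;
  Z *= Scale;
  return std::sqrt(X * X + Y * Y + Z * Z) / Scale;
}
```

For example, `hypot3Sketch(1e200, 1e200, 1e200)` stays finite, whereas computing `std::sqrt(x*x + y*y + z*z)` directly would overflow to infinity for those inputs.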

Closes #92782

(cherry picked from commit 72825fde03aab3ce9eba2635b872144d1fb6b6b2)

* [C++20] [Modules] Don't diagnose duplicated implicit decl in multiple named modules (#102423)

Close https://github.com/llvm/llvm-project/issues/102360
Close https://github.com/llvm/llvm-project/issues/102349

http://eel.is/c++draft/basic.def.odr#15.3 makes it clear that
duplicated definitions are not allowed to be attached to named modules.

But we need to filter out the implicit declarations, as the user can do
nothing about them and the diagnostic message is annoying.

(cherry picked from commit e72d956b99e920b0fe2a7946eb3a51b9e889c73c)

* [AIX] Revert `#pragma mc_func` check (#102919)

https://github.com/llvm/llvm-project/pull/99888 added a specific
diagnostic for `#pragma mc_func` on AIX. There are some disagreements
on:

1. If the check should be on by default. Leaving the check off by
default is dangerous, since it is difficult to be aware of such a check.
Turning it on by default at the moment causes build failures on AIX. See
https://github.com/llvm/llvm-project/pull/101336 for more details.
2. If the check can be made more general. See
https://github.com/llvm/llvm-project/pull/101336#issuecomment-2269283906.

This PR reverts this check from `main` so we can flush out these
disagreements.

(cherry picked from commit 123b6fcc70af17d81c903b839ffb55afc9a9728f)

* [Clang][Sema] Make UnresolvedLookupExprs in class scope explicit specializations instantiation dependent (#100392)

A class member named by an expression in a member function that may instantiate to a static _or_ non-static member is represented by a `UnresolvedLookupExpr` in order to defer the implicit transformation to a class member access expression until instantiation. Since `ASTContext::getDecltypeType` only creates a `DecltypeType` that has a `DependentDecltypeType` as its canonical type when the operand is instantiation dependent, and since we do not transform types unless they are instantiation dependent, we need to mark the `UnresolvedLookupExpr` as instantiation dependent in order to correctly build a `DecltypeType` using the expression as its operand with a `DependentDecltypeType` canonical type. Fixes #99873.

(cherry picked from commit 55ea36002bd364518c20b3ce282640c920697bf7)

* [libc++] Use a different smart ptr type alias (#102089)

The `_SP` type is used by some C libraries and this alias could conflict
with it.

(cherry picked from commit 7951673d408ee64744d0b924a49db78e8243d876)

* [CodeGen][ARM64EC] Define hybrid_patchable EXP thunk symbol as a function. (#102898)

This is needed for MSVC link.exe to generate redirection metadata for hybrid patchable thunks.

(cherry picked from commit d550ada5ab6cd6e49de71ac4c9aa27ced4c11de0)

* [PPC][AIX] Save/restore r31 when using base pointer (#100182)

When the base pointer r30 is used to hold the stack pointer, r30 is
spilled in the prologue. On AIX registers are saved from highest to
lowest, so r31 also needs to be saved.

Fixes https://github.com/llvm/llvm-project/issues/96411

(cherry picked from commit d07f106e512c08455b76cc1889ee48318e73c810)

* [clang-format] Fix annotation of braces enclosing stringification (#102998)

Fixes #102937.

(cherry picked from commit ee2359968fa307ef45254c816e14df33374168cd)

* [clang][AArch64] Point the nofp ABI check diagnostics at the callee (#103392)

... wherever we have the Decl for it, and even when we don't, keep the
SourceLocation of it aimed at the call site.

Fixes: #102983
(cherry picked from commit 019ef522756886caa258daf68d877f84abc1b878)

* [libc++] Fix ambiguous constructors for std::complex and std::optional (#103409)

Fixes #101960

(cherry picked from commit 4d08bb11eea5907fa9cdfe4c7bc9d5c91e79c6a7)

* [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170)

Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168

(cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43)

* [RISCV] Use experimental.vp.splat to splat specific vector length elements. (#101329)

Previously, it was hard in LLVM IR to create a scalable vector splat with
a specific vector length, so we used riscv.vmv.v.x and riscv.vmv.v.f to do
this work. But these two RVV intrinsics have strict type constraints and
cannot support fixed vector types and illegal vector types. Using
vp.splat preserves the old functionality and also generates more
optimized code for vector types and illegal vectors.
This patch also fixes a crash caused by getEVT not handling ptr types.

(cherry picked from commit 87af9ee870ad7ca93abced0b09459c3760dec891)

* [Hexagon] Do not optimize address of another function's block (#101209)

When the constant extender optimization pass encounters an instruction
that uses an extended address pointing to another function's block,
avoid adding the instruction to the extender list for the current
machine function.

Fixes https://github.com/llvm/llvm-project/issues/99714

(cherry picked from commit 68df06a0b2998765cb0a41353fcf0919bbf57ddb)

* [AArch64] Add GCS release notes

* Revert "[CGData] llvm-cgdata (#89884)"

This reverts commit d3fb41dddc11b0ebc338a3b9e6a5ab7288ff7d1d
and forward fix patches because of the issue explained in:
https://github.com/llvm/llvm-project/pull/89884#issuecomment-2244348117.

Revert "Fix tests for https://github.com/llvm/llvm-project/pull/89884
(#100061)"

This reverts commit 67937a3f969aaf97a745a45281a0d22273bff713.

Revert "Fix build break for https://github.com/llvm/llvm-project/pull/89884 (#100050)"

This reverts commit c33878c5787c128234d533ad19d672dc3eea19a8.

Revert "[CGData] Fix -Wpessimizing-move in CodeGenDataReader.cpp (NFC)"

This reverts commit 1f8b2b146141f3563085a1acb77deb50857a636d.

(cherry picked from commit 73d78973fe072438f0f73088f889c66845b2b51a)

* [AArch64] Adopt updated B16B16 target flags

The enablement of SVE/SME non-widening BFloat16 instructions was recently
changed in response to an architecture update, in which:
	- FEAT_SVE_B16B16 was weakened
	- FEAT_SME_B16B16 was introduced
New flags, 'sve-b16b16' and 'sme-b16b16', were introduced to replace the
existing 'b16b16'. This was achieved in the two patches below.
	- https://github.com/llvm/llvm-project/pull/101480
	- https://github.com/llvm/llvm-project/pull/102501
Ideally, the interface change introduced here will be valid in LLVM-19.
We do not see it necessary to back-port the entire change, but just to add
'sme-b16b16' and 'sve-b16b16' as aliases to the existing (and unchanged)
'b16b16' and 'sme2' flags which together cover all of these features.

The predication of Bf16 variants of svmin/svminnm and svmax/svmaxnm is also
fixed in this change.

* [libc++] Fix rejects-valid in std::span copy construction (#104500)

Trying to copy-construct a std::span from another std::span holding an
incomplete type would fail as we evaluate the SFINAE for the range-based
constructor. The problem was that we checked for __is_std_span after
checking for the range being a contiguous_range, which hard-errored
because of arithmetic on a pointer to incomplete type.

As a drive-by, refactor the whole test and format it.

Fixes #104496

(cherry picked from commit 99696b35bc8a0054e0b0c1a26e8dd5049fa8c41b)

* [clang-tidy] Fix crash in C language in readability-non-const-parameter (#100461)

Fix a crash that happens when a redeclaration has a
different number of parameters than the definition.

Fixes #100340

(cherry picked from commit a27f816fe56af9cc7f4f296ad6c577f6ea64349f)

* [AArch64] Add streaming-mode stack hazard optimization remarks (#101695)

Emit an optimization remark when objects in the stack frame may cause
hazards in a streaming mode function. The analysis requires either the
`aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
to be set by the user, with the former flag taking precedence.

(cherry picked from commit a98a0dcf63f54c54c5601a34c9f8c10cde0162d6)

* [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605)

In C++23 anything can be constexpr, including a dtor of a class whose
members and bases don't have constexpr dtors. Avoid early triggering of
vtable instantiation in this case.

Fixes https://github.com/llvm/llvm-project/issues/102293

(cherry picked from commit d469794d0cdfd2fea50a6ce0c0e33abb242d744c)

* [OpenMP][AArch64] Fix branch protection in microtasks (#102317)

Start __kmp_invoke_microtask with PACBTI in order to identify the
function as a valid branch target. Before returning, SP is
authenticated.
Also add the BTI and PAC markers to z_Linux_asm.S.

With this patch, libomp.so can now be generated with DT_AARCH64_BTI_PLT
when built with -mbranch-protection=standard.

The implementation is based on the code available in compiler-rt.

(cherry picked from commit 0aa22dcd2f6ec5f46b8ef18fee88066463734935)

* [clang][driver] `TY_ModuleFile` should be a 'CXX' file type

* [Mips] Fix fast isel for i16 bswap. (#103398)

We need to mask the SRL result to 8 bits before ORing in the SLL. This
is needed in case bits 23:16 of the input aren't zero. They will have
been shifted into bits 15:8.

We don't need to AND the result with 0xffff. It's ok if the upper 16
bits of the register are garbage.
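
The effect of the missing mask can be reproduced with plain integer arithmetic. This sketch models the 32-bit register holding the i16 value; the function names are hypothetical and the code is not the fast-isel lowering itself.

```cpp
#include <cstdint>

// Without the mask: bits 23:16 of the input are shifted into bits 15:8 by the
// right shift and corrupt the low 16 bits of the result.
std::uint32_t bswap16Unmasked(std::uint32_t Reg) {
  return (Reg << 8) | (Reg >> 8);
}

// With the mask: the right-shifted value is reduced to 8 bits before the OR.
// The upper 16 bits of the result may still hold garbage, which is fine for
// an i16 value.
std::uint32_t bswap16Masked(std::uint32_t Reg) {
  return (Reg << 8) | ((Reg >> 8) & 0xFFu);
}
```

With the input `0x00AB1234`, the low 16 bits should become `0x3412`. The masked version gives that result, while the unmasked version gives `0xBF12` because `0xAB` is ORed into bits 15:8.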

Fixes #103035.

(cherry picked from commit ebe7265b142f370f0a563fece5db22f57383ba2d)

* Add some brief LLVM 19 release notes for Pointer Authentication ABI support.

* release/19.x: [BOLT] Fix relocations handling

Backport https://github.com/llvm/llvm-project/commit/097ddd3565f830e6cb9d0bb8ca66844b7f3f3cbb

* [AArch64] Add a check for invalid default features (#104435)

This adds a check that all ExtensionWithMArch which are marked as
implied features for an architecture are also present in the list of
default features. It doesn't make sense to have something mandatory but
not on by default.

There were a number of existing cases that violated this rule, and some
changes to which features are mandatory (indicated by the Implies
field).

This resulted in a bug where if a feature was marked as `Implies` but
was not added to `DefaultExt`, then for `-march=base_arch+nofeat` the
Driver would consider `feat` to have never been added and therefore
would do nothing to disable it (no `-target-feature -feat` would be
added, but the backend would enable the feature by default because of
`Implies`). See
clang/test/Driver/aarch64-negative-modifiers-for-default-features.c.
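
The rule being checked amounts to a subset test: every implied feature must also appear in the default list. The sketch below uses hypothetical names and plain string sets; it does not reflect how the check is actually implemented in the AArch64 target definitions.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the features that are implied by an
// architecture but missing from its default extension list. An empty result
// means the rule holds.
std::vector<std::string>
impliedButNotDefault(const std::set<std::string> &Implied,
                     const std::set<std::string> &Defaults) {
  std::vector<std::string> Missing;
  for (const std::string &Feature : Implied)
    if (Defaults.count(Feature) == 0)
      Missing.push_back(Feature);
  return Missing;
}
```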

Note that the processor definitions do not respect the architecture
DefaultExts. These apply only when specifying `-march=<some architecture
version>`. So when a feature is moved from `Implies` to `DefaultExts` on
the Architecture definition, the feature needs to be added to all
processor definitions (that are based on that architecture) in order to
preserve the existing behaviour. I have checked the TRMs for many cases
(see specific commit messages) but in other cases I have just kept the
current behaviour and not tried to fix it.

* [GlobalISel] Bail out early for big-endian (#103310)

If we continue through the function we can currently hit crashes. We can
bail out early and fall back to SDAG.

Fixes #103032

(cherry picked from commit 05d17a1c705e1053f95b90aa37d91ce4f94a9287)

* [LLD] [MinGW] Recognize the -rpath option (#102886)

GNU ld silently accepts the -rpath option for Windows targets, as a
no-op.

This has led to some build systems (and users) passing this option
while building for Windows/MinGW, even if Windows doesn't have any
concept like rpath.

Older versions of Conan did include -rpath in the pkg-config files it
generated, see e.g.

https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/generators/pkg_config.py#L104-L114
and
https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/build/compiler_flags.py#L26-L34
- and see https://github.com/mstorsjo/llvm-mingw/issues/300 for user
reports about this issue.

Recognize the option in LLD for MinGW targets, to improve drop-in
compatibility compared to GNU ld, but produce a warning to alert users
that the option really has no effect for these targets.

(cherry picked from commit 69f76c782b554a004078af6909c19a11e3846415)

* [C++23] Fix infinite recursion (Clang 19.x regression) (#104829)

d469794d0cdfd2fea50a6ce0c0e33abb242d744c was fixing an issue with
triggering vtable instantiations, but it accidentally introduced
infinite recursion when the type to be checked is the same as the type
used in a base specifier or field declaration.

Fixes #104802

(cherry picked from commit 435cb0dc5eca08cdd8d9ed0d887fa1693cc2bf33)

* Reland [C++20] [Modules] [Itanium ABI] Generate the vtable in the mod… (#102287)

Reland https://github.com/llvm/llvm-project/pull/75912

The differences between this PR and
https://github.com/llvm/llvm-project/pull/75912 are:

- Fixed a regression in `Decl::isInAnotherModuleUnit()` in DeclBase.cpp
pointed out by @mizvekov and added the corresponding test.
- Fixed the regression on Windows
https://github.com/llvm/llvm-project/issues/97447. The changes are in
`CodeGenModule::getVTableLinkage` from
`clang/lib/CodeGen/CGVTables.cpp`. According to feedback from MSVC
devs, the linkage of vtables won't be affected by modules. So I simply
skipped the case for MSVC.

Given this is more or less fundamental to the use of modules, I hope we
can backport this to 19.x.

(cherry picked from commit 847f9cb0e868c8ec34f9aa86fdf846f8c4e0388b)

* [libunwind] Add GCS support for AArch64 (#99335)

AArch64 GCS (Guarded Control Stack) is similar enough to CET that we can
re-use the existing code that is guarded by _LIBUNWIND_USE_CET, so long
as we also add defines to locate the GCS stack and pop the entries from
it. We also need the jumpto function to exit using br instead of ret, to
prevent it from popping the GCS stack.

GCS support is enabled using the LIBUNWIND_ENABLE_GCS cmake option. This
enables -mbranch-protection=standard, which enables GCS. For the places
we need to use GCS instructions we use the target attribute, as there's
not a command-line option to enable a specific architecture extension.

(cherry picked from commit b32aac4358c1f6639de7c453656cd74fbab75d71)

* [libunwind] Be more careful about enabling GCS (#101973)

We need both GCS to be enabled by the compiler (which we do by checking
if __ARM_FEATURE_GCS_DEFAULT is defined) and for arm_acle.h to define
the GCS intrinsics. Check the latter by checking if _CHKFEAT_GCS is
defined.
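
The two-step check described above can be sketched roughly as follows. This is only an illustration: `EXAMPLE_USE_GCS` and `gcsUsable()` are hypothetical names invented for this sketch, not the macro or function libunwind uses.

```cpp
// Step 1: the compiler must enable GCS by default.
#if defined(__ARM_FEATURE_GCS_DEFAULT)
#  include <arm_acle.h>
// Step 2: arm_acle.h must actually provide the GCS intrinsics.
#  if defined(_CHKFEAT_GCS)
#    define EXAMPLE_USE_GCS 1 // hypothetical macro for this sketch
#  endif
#endif

// Fall back to "disabled" when either condition is not met.
#ifndef EXAMPLE_USE_GCS
#  define EXAMPLE_USE_GCS 0
#endif

// Hypothetical helper so the result can be inspected from ordinary code.
constexpr bool gcsUsable() { return EXAMPLE_USE_GCS != 0; }
```

On a target where the compiler does not define `__ARM_FEATURE_GCS_DEFAULT`, the header is never included and `gcsUsable()` is `false`.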

(cherry picked from commit c649194a71b47431f2eb2e041435d564e3b51072)

* [libunwind] Fix problems caused by combining BTI and GCS (#102322)

The libunwind assembly files need adjustment in order to work correctly
when BTI and GCS are both enabled (which will be the case when
using -mbranch-protection=standard):
* __libunwind_Registers_arm64_jumpto can't use br to jump to the return
location, instead we need to use gcspush then ret.
* Because we indirectly call __libunwind_Registers_arm64_jumpto it needs
to start with bti jc.
* We need to set the GCS GNU property bit when it's enabled.

---------

Co-authored-by: Daniel Kiss <daniel.kristof.kiss@gmail.com>
(cherry picked from commit 39529107b46032ef0875ac5b809ab5b60cd15a40)

* Bump version to 19.1.0-rc3

* [sanitizer_common] Make sanitizer_linux.cpp kernel_stat* handling Linux-specific

fcd6bd5587cc376cd8f43b60d1c7d61fdfe0f535 broke the Solaris/sparcv9 buildbot:
```
compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:39:14: fatal error: 'asm/unistd.h' file not found
   39 | #    include <asm/unistd.h>
      |              ^~~~~~~~~~~~~~
```
That section should have been Linux-specific in the first place, which is
what this patch does.

Tested on sparcv9-sun-solaris2.11.

(cherry picked from commit 16e9bb9cd7f50ae2ec7f29a80bc3b95f528bfdbf)

* [clang][modules] Built-in modules are not correctly enabled for Mac Catalyst (#104872)

Mac Catalyst is the iOS platform, but it builds against the macOS SDK
and so it needs to be checking the macOS SDK version instead of the iOS
one. Add tests against a greater-than SDK version just to make sure this
works beyond the initially supporting SDKs.

(cherry picked from commit b9864387d9d00e1d4888181460d05dbc92364d75)

* [SPARC] Remove assertions in printOperand for inline asm operands (#104692)

Inline asm operands could contain any kind of relocation, so remove the
checks.

Fixes https://github.com/llvm/llvm-project/issues/103493

(cherry picked from commit 576b7a781aac6b1d60a72248894b50e565e9185a)

* Add AIX/PPC Clang/LLVM release notes for LLVM 19.

* use default intrinsic attrs for BPF packet loads

The BPF packet load intrinsics lost attribute WillReturn due to 0b20c30. The attribute loss causes excessive bitshifting, resulting in previously working programs failing the BPF verifier due to instruction/complexity limits.

cherry picked only the BPF changes from 99a10f1

Signed-off-by: Bryce Kahle <bryce.kahle@datadoghq.com>

* [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395)

Prevent operand folding from inlining constants into pseudo scalar
transcendental f16 instructions. Literal constants are still allowed.

(cherry picked from commit fc6300a5f7ef430e4ec86d16be0b146de7fbd16b)

* [DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924)

In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that later was iterated over, applying a freeze to
those operands. However, C-level fuzz testing has discovered that the
recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when replacing an earlier
operand. That would then turn up as
   Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
              "Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.

So we need to make sure that the vector with MaybePoisonOperands is
mutated as well when needed. Or as the solution used in this patch, make
sure to keep track of operand numbers that should be frozen instead of
having a vector of SDValues. And then we can refetch the operands while
iterating over operand numbers.

The problem was seen after adding SELECT_CC to the set of operations
included in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
guess that this could happen for other operations as well for which we
allow multiple maybe poison operands.
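
As a rough illustration of tracking operand numbers instead of operand values, here is a minimal sketch using toy types. `ToyNode`, `replaceAllUses` and `freezeOperands` are hypothetical stand-ins invented for this example; they are not the DAGCombiner code, and the string operands only mimic the way one replacement can rewrite several operands at once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a node with operands; not LLVM's SDNode.
struct ToyNode {
  std::vector<std::string> Operands;
};

// Hypothetical helper: replaces every use of From with To, which may
// rewrite more operands than the one the caller was looking at.
void replaceAllUses(ToyNode &N, const std::string &From,
                    const std::string &To) {
  for (std::string &Op : N.Operands)
    if (Op == From)
      Op = To;
}

// Freeze the operands at the given positions. Each operand is refetched
// by index on every iteration, so earlier replacements are observed.
void freezeOperands(ToyNode &N, const std::vector<std::size_t> &Indices) {
  for (std::size_t Idx : Indices) {
    assert(Idx < N.Operands.size() && "operand index out of range");
    // Copy the current operand; a snapshot taken before the loop could
    // be stale by now.
    const std::string Current = N.Operands[Idx];
    if (Current.rfind("freeze(", 0) == 0)
      continue; // already frozen as a side effect of an earlier step
    replaceAllUses(N, Current, "freeze(" + Current + ")");
  }
}
```

With operands `{"a", "b", "a"}` and indices `{0, 2}`, the first iteration rewrites both uses of `a`, and the second iteration sees the updated operand because it looks it up by position.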

(cherry picked from commit 278fc8efdf004a1959a31bb4c208df5ee733d5c8)

* [X86] Use correct fp immediate types in _mm_set_ss/sd

Avoids implicit sint_to_fp which wasn't occurring on strict fp codegen

Fixes #104848

(cherry picked from commit 6dcce422ca06601f2b00e85cc18c745ede245ca6)

* [clang-format] Don't insert a space between :: and * (#105043)

Also, don't insert a space after ::* for method pointers.

See
https://github.com/llvm/llvm-project/pull/86253#issuecomment-2298404887.

Fixes #100841.

(cherry picked from commit 714033a6bf3a81b1350f969ddd83bcd9fbb703e8)

* [ConstraintElim] Fix miscompilation caused by PR97974 (#105790)

Fixes https://github.com/llvm/llvm-project/issues/105785.

(cherry picked from commit 85b6aac7c25f9d2a976a76045ace1e7afebb5965)

* [MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data (REAPPLIED)

This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should

Reapplied with codegen fix for scatter-schedule.ll

Fixes #105675

(cherry picked from commit cf6cd1fd67356ca0c2972992928592d2430043d2)

* [DwarfEhPrepare] Assign dummy debug location for more inserted _Unwind_Resume calls (#105513)

Similar to the fix for #57469, ensure that the other `_Unwind_Resume`
call emitted by DwarfEHPrepare has a debug location if needed.

This fixes https://github.com/nbdd0121/unwinding/issues/34.

(cherry picked from commit e76db25832d6ac2d3a36769b26f982d9dee4b346)

* [clangd] Add clangd 19 release notes

* Restrict LLVM_TARGETS_TO_BUILD in Windows release packaging (#106059)

When including all targets, some files become too large for the NSIS
installer to handle.

Fixes #101994

(cherry picked from commit 2a28df66dc3f7ff5b6233241837854acefb68d77)

* [AArch64] Make apple-m4 armv8.7-a again (from armv9.2-a).  (#106312)

This is a partial revert of c66e1d6f3429.  Even though that
allowed us to declare v9.2-a support without picking up SVE2
in both the backend and the driver, the frontend itself still
enabled SVE via the arch version's default extensions.

Avoid that by reverting back to v8.7-a while we look into
longer-term solutions.

(cherry picked from commit e5e38ddf1b8043324175868831da21e941c00aff)

* workflows/release-binaries: Remove .git/config file from artifacts (#106310)

The .git/config file contains an auth token that can be leaked if the
.git directory is included in a workflow artifact.

(cherry picked from commit ef50970204384643acca42ba4c7ca8f14865a0c2)

* [clang] Install scan-build-py into plain "lib" directory (#106612)

Install scan-build-py modules into the plain `lib` directory,
without LLVM_LIBDIR_SUFFIX appended, to match the path expected
by `intercept-build` executable.  This fixes the program being unable
to find its modules.  Using unsuffixed path makes sense here, since
Python modules are not subject to multilib.

This change effectively reverts 1334e129a39cb427e7b855e9a711a3e7604e50e5.
The commit in question changed the path without a clear justification
("does not respect the given prefix") and the Python code was never
modified to actually work with the change.

Fixes #106608

(cherry picked from commit 0c4cf79defe30d43279bf4526cdf32b6c7f8a197)

* [llvm][CodeGen] Added missing initialization failure information for window scheduler (#99449)

Added missing initialization failure information for window scheduler.

* [llvm][CodeGen] Added a new restriction for II by pragma in window scheduler (#99448)

Added a new restriction for window scheduling.
Window scheduling is disabled when llvm.loop.pipeline.initiationinterval
is set.

* [llvm][CodeGen] Fixed a bug in stall cycle calculation for window scheduler (#99451)

Fixed a bug in stall cycle calculation.
When a register defined by an instruction in the current iteration is
used by an instruction in the next iteration, we have modified the
number of stall cycles that need to be inserted.

* [llvm][CodeGen] Fixed max cycle calculation with zero-cost instructions for window scheduler (#99454)

We discovered some scheduling failures occurring when zero-cost
instructions were involved. This issue will be addressed by this patch.

* [llvm][CodeGen] Address the issue of multiple resource reservations In window scheduling (#101665)

Address the issue of multiple resource reservations in window
scheduling.

* [analyzer] Limit `isTainted()` by skipping complicated symbols (#105493)

As discussed in

https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10

Some `isTainted()` queries can blow up the analysis times, and
effectively halt the analysis under specific workloads.

We don't really have the time now to do a caching re-implementation of
`isTainted()`, so we need to work around the case.

The workaround with the smallest blast radius was to limit which symbols
`isTainted()` queries (by walking the SymExpr). So far, the
threshold 10 worked for us, but this value can be overridden using the
"max-tainted-symbol-complexity" config value.

This new option is "deprecated" from the get-go, as I expect this issue
to be fixed within the next few months and I don't want users to
override this value anyway. If they do, this message will let them know
that they are on their own, and the next release may break them (as we
no longer recognize this option if we drop it).
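
The idea of skipping overly complicated symbols can be sketched as below. Everything here is a toy: `ToySym`, `complexity`, `containsTaint` and `isTaintedLimited` are hypothetical names, complexity is assumed to be a plain node count, and the exact comparison against the threshold is an assumption rather than the analyzer's actual rule.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy symbolic expression; not the analyzer's SymExpr.
struct ToySym {
  bool Tainted = false;
  std::vector<std::unique_ptr<ToySym>> Children;
};

// Assumption: complexity is the number of nodes in the expression tree.
// A real implementation would want this value cached rather than
// recomputed on every query.
unsigned complexity(const ToySym &S) {
  unsigned N = 1;
  for (const auto &C : S.Children)
    N += complexity(*C);
  return N;
}

// Unbounded walk: true if any node in the tree is tainted.
bool containsTaint(const ToySym &S) {
  if (S.Tainted)
    return true;
  for (const auto &C : S.Children)
    if (containsTaint(*C))
      return true;
  return false;
}

// Bounded query: symbols above the threshold are reported as not tainted
// without being walked for taint.
bool isTaintedLimited(const ToySym &S, unsigned MaxComplexity = 10) {
  assert(MaxComplexity > 0 && "threshold must be positive");
  if (complexity(S) > MaxComplexity)
    return false; // too complicated: skip the query
  return containsTaint(S);
}
```

The trade-off is that a tainted but very large symbol is treated as untainted, which is why the limit is described as a mitigation rather than a fix.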

Mitigates #89720

CPP-5414

(cherry picked from commit 848658955a9d2d42ea3e319d191e2dcd5d76c837)

* [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081)

A crash was happening when both ObjC Category Merging and Relative
method lists were enabled.

ObjC Category Merging creates new data sections and adds them by calling
`addInputSection`. `addInputSection` uses the symbols within the added
section to determine which container to actually add the section to.

The issue is that ObjC Category merging is calling `addInputSection`
before actually adding the relevant symbols to the added section. This
causes `addInputSection` to add the `InputSection` to the wrong
container, eventually resulting in a crash.

To fix this, we ensure that ObjC Category Merging calls
`addInputSection` only after the symbols have been added to the
`InputSection`.
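
The ordering constraint can be illustrated with a small sketch. The types and the routing rule below are invented for the example (`ToySection`, `ToyContainers`, `addToySection` and `createMergedSection` are hypothetical, and lld's real `addInputSection` uses its own criteria); the only point is that the routing decision reads the symbols at call time.

```cpp
#include <string>
#include <vector>

// Toy section that carries the names of the symbols defined in it.
struct ToySection {
  std::string Name;
  std::vector<std::string> Symbols;
};

// Two toy destinations a section can be routed to.
struct ToyContainers {
  std::vector<ToySection> WithSymbols;
  std::vector<ToySection> WithoutSymbols;
};

// Hypothetical stand-in for addInputSection: the container is chosen
// from the symbols present at the moment of the call.
void addToySection(ToyContainers &C, const ToySection &S) {
  if (S.Symbols.empty())
    C.WithoutSymbols.push_back(S);
  else
    C.WithSymbols.push_back(S);
}

// Fixed ordering: attach the symbols first, register the section second.
void createMergedSection(ToyContainers &C, const std::string &Name,
                         const std::vector<std::string> &Symbols) {
  ToySection S{Name, {}};
  S.Symbols = Symbols;  // 1. symbols are in place
  addToySection(C, S);  // 2. routing now sees them
}
```

Swapping steps 1 and 2 in `createMergedSection` would send the section to `WithoutSymbols`, which mirrors the kind of misrouting described above.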

(cherry picked from commit 0df91893efc752a76c7bbe6b063d66c8a2fa0d55)

* [PowerPC] Respect endianness when bitcasting to fp128 (#95931)

Fixes #92246

Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when
encountering `bitcast fp128 (BUILD_PAIR %lo %hi)`
by inserting a missing swap of the arguments based on endianness.

### Current behaviour:
**fp128**
bitcast fp128 (BUILD_PAIR %lo %hi) => BUILD_FP128 %lo %hi
BUILD_FP128 %lo %hi => MTVSRDD %hi %lo

**v2i64**
bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo
BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi

(cherry picked from commit 408d82d352eb98e2d0a804c66d359cd7a49228fe)

* Add release note about ABI implementation changes for _BitInt on Arm

* [AMDGPU] Add GFX12 test coverage for vmcnt flushing in loop headers (#105548)

(cherry picked from commit 61194617ad7862f144e0f6db34175553e8c34763)

* [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (#105549)

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.

(cherry picked from commit 5506831f7bc8dc04ebe77f4d26940007bfb4ab39)

* [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550)

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

(cherry picked from commit fa2dccb377d0b712223efe5b62e5fc633580a9e6)

* [libunwind] Stop installing the mach-o module map (#105616)

libunwind shouldn't know that compact_unwind_encoding.h is part of a
MachO module that it doesn't own. Delete the mach-o module map, and let
whatever is in charge of the mach-o directory be the one to say how its
module is organized and where compact_unwind_encoding.h fits in.

(cherry picked from commit 172c4a4a147833f1c08df1555f3170aa9ccb6cbe)

* [clang-format] Fix a misannotation of redundant r_paren as CastRParen (#105921)

Fixes #105880.

(cherry picked from commit 6bc225e0630f28e83290a43c3d9b25b057fc815a)

* [clang-format] Fix a misannotation of less/greater as angle brackets (#105941)

Fixes #105877.

(cherry picked from commit 0916ae49b89db6eb9eee9f6fee4f1a65fd9cdb74)

* [PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453)

These builtins are currently returning CR0 which will have the format
[0, 0, flag_true_if_saved, XER].
We only want to return flag_true_if_saved. This patch adds a shift to
remove the XER bit before returning.
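
The bit manipulation can be sketched as follows, assuming the 4-bit field is laid out as described above with the XER bit in the least significant position. `storeConditionalSucceeded` is a hypothetical helper for illustration, not the code the patch generates.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the 4-bit CR0 field, most significant bit first:
// [0, 0, flag_true_if_saved, XER].
inline std::uint32_t storeConditionalSucceeded(std::uint32_t CR0Field) {
  assert(CR0Field <= 0xFu && "expected a 4-bit CR0 field");
  // Shift out the XER bit. The final mask is redundant when the two
  // upper bits are zero; it is kept here only as a safeguard.
  return (CR0Field >> 1) & 1u;
}
```

With this layout, a successful store yields 1 whether or not the XER bit happens to be set.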

(cherry picked from commit 327edbe07ab4370ceb20ea7c805f64950871d835)

* [clang][AArch64] Add SME2.1 feature macros (#105657)

(cherry picked from commit 2617023923175b0fd2a8cb94ad677c061c01627f)

* [Clang][Sema] Revisit the fix for the lambda within a type alias template decl (#89934)

In the last patch #82310, we used template depths to tell if such alias
decls contain lambdas, which is wrong because the lambda can also appear
as a part of the default argument, and that would make
`getTemplateInstantiationArgs` provide extra template arguments in
undesired contexts. This leads to issue #89853.

Moreover, our approach
for https://github.com/llvm/llvm-project/issues/82104 was sadly wrong.
We tried to teach `DeduceReturnType` to consider alias template
arguments; however, giving these arguments in the context where they
should have been substituted in a `TransformCallExpr` call is never
correct.

This patch addresses such problems by using a `RecursiveASTVisitor` to
check if the lambda is contained by an alias `Decl`, as well as
twiddling the lambda dependencies - we should also build a dependent
lambda expression if the surrounding alias template arguments were
dependent.

Fixes #89853
Fixes #102760
Fixes #105885

(cherry picked from commit b412ec5d3924c7570c2c96106f95a92403a4e09b)

* [libc++] Add missing include to three_way_comp_ref_type.h

We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`.

rdar://134425695
(cherry picked from commit 0df78123fdaed39d5135c2e4f4628f515e6d549d)

* [compiler-rt] Fix definition of `usize` on 32-bit Windows

32-bit Windows uses `unsigned int` for uintptr_t and size_t.
Commit 18e06e3e2f3d47433e1ed323b8725c76035fc1ac changed uptr to
unsigned long, so it no longer matches the real size_t/uintptr_t and
therefore the current definition of usize results in:
`error C2821: first formal parameter to 'operator new' must be 'size_t'`

However, the real problem is that uptr is wrong to work around the fact
that we have local SIZE_T and SSIZE_T typedefs that trample on the
basetsd.h definitions of the same name and therefore need to match
exactly. Unlike size_t/ssize_t the uppercase ones always use unsigned
long (even on 32-bit).

This commit works around the build breakage by keeping the existing
definitions of uptr/sptr and just changing usize. A follow-up change
will attempt to fix this properly.

Fixes: https://github.com/llvm/llvm-project/issues/101998

Reviewed By: mstorsjo

Pull Request: https://github.com/llvm/llvm-project/pull/106151

(cherry picked from commit bb27dd853a713866c025a94ead8f03a1e25d1b6e)

* [clang-format] Fix misalignments of pointers in angle brackets (#106013)

Fixes #105898.

(cherry picked from commit 656d5aa95825515a55ded61f19d41053c850c82d)

* [clang-format] js handle anonymous classes (#106242)

Addresses a regression in JavaScript when formatting anonymous classes.

---------

Co-authored-by: Owen Pan <owenpiano@gmail.com>
(cherry picked from commit 77d63cfd18aa6643544cf7acd5ee287689d54cca)

* Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226)

This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.

Fixes: https://github.com/llvm/llvm-project/issues/100212
(cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b)

Conflicts:
	clang/test/Driver/linker-wrapper-passes.c

* [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (#105923)

…519)"

This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a
test for lambda arrow SplitPenalty.

Fixes #105480.

* workflows/release-tasks: Pass required secrets to all called workflows (#106286)

Called workflows don't have access to secrets by default, so we need to
explicitly pass secrets that we use.

(cherry picked from commit 9d81e7e36e33aecdee05fef551c0652abafaa052)

* [C++20] [Modules] Don't insert class not in named modules to PendingEmittingVTables (#106501)

Close https://github.com/llvm/llvm-project/issues/102933

The root cause of the issue is an oversight in
https://github.com/llvm/llvm-project/pull/102287: I didn't notice
that PendingEmittingVTables should only accept classes in named modules.

(cherry picked from commit 47615ff2347a8be429404285de3b1c03b411e7af)

* Revert "[clang] fix broken canonicalization of DeducedTemplateSpecializationType (#95202)"

This reverts commit 2e1ad93961a3f444659c5d02d800e3144acccdb4.

Reverting #95202 in the 19.x branch

Fixes #106182

The change in #95202 causes code to crash and there is
no good way to backport a fix for that as there are ABI-impacting
changes at play.
Instead we revert #95202 in the 19.x branch, fixing the regression
and preserving the 18.x behavior (which is GCC's behavior).

https://github.com/llvm/llvm-project/pull/106335#discussion_r1735174841

* [analyzer] Add missing include <unordered_map> to llvm/lib/Support/Z3Solver.cpp (#106410)

Resolves #106361. Adding #include <unordered_map> to
llvm/lib/Support/Z3Solver.cpp fixes compilation errors for homebrew
build on macOS with Xcode 14.
https://github.com/Homebrew/homebrew-core/actions/runs/10604291631/job/29390993615?pr=181351
shows that this is resolved when the include is patched in (Linux CI
failure is due to unrelated timeout).

(cherry picked from commit fcb3a0485857c749d04ea234a8c3d629c62ab211)

* [RemoveDIs] Simplify spliceDebugInfo, fixing splice-to-end edge case (#105670)

Not quite NFC, fixes splitBasicBlockBefore case when we split before an
instruction with debug records (but without the headBit set, i.e., we are
splitting before the instruction but after the debug records that come before
it). splitBasicBlockBefore splices the instructions before the split point into
a new block. Prior to this patch, the debug records would get shifted up to the
front of the spliced instructions (as seen in the modified unittest - I believe
the unittest was checking erroneous behaviour). We instead want to leave those
debug records at the end of the spliced instructions.

The functionality of the deleted `else if` branch is covered by the remaining
`if` now that `DestMarker` is set to the trailing marker if `Dest` is `end()`.
Previously the "===" markers were sometimes detached, now we always detach
them and always reattach them.

Note: `deleteTrailingDbgRecords` only "unlinks" the trailing marker from the
block, it doesn't delete anything. The trailing marker is still cleaned up
properly inside the final `if` body with `DestMarker->eraseFromParent();`.

Part 1 of 2 needed for #105571

(cherry picked from commit f5815534d180c544bffd46f09c28b6fc334260fb)

* [libcxx] don't `#include <cwchar>` if wide chars aren't enabled (#99911)

Pull request #96032 unconditionally adds the `cwchar` include in the
`format` umbrella header. However support for wchar_t can be disabled in
the build system (LIBCXX_ENABLE_WIDE_CHARACTERS).

This patch guards against inclusion of `cwchar` in `format` by checking
the `_LIBCPP_HAS_NO_WIDE_CHARACTERS` define.

For clarity I've also merged the include header section that `cwchar`
was in with the one above as they were both guarded by the same `#if`
logic.
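
A guard of this kind looks roughly like the sketch below. `_LIBCPP_HAS_NO_WIDE_CHARACTERS` is the macro named above; `wideLength` and `HasWideCharacters` are hypothetical names added only so the effect of the guard is visible, and the real `format` header is not structured this way.

```cpp
#include <cstddef>

// Only pull in <cwchar> when wide character support is enabled.
// Outside libc++ the macro is normally undefined, so the include is taken.
#if !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
#  include <cwchar>

// Hypothetical helper that is only available with wide characters.
inline std::size_t wideLength(const wchar_t *S) { return std::wcslen(S); }
constexpr bool HasWideCharacters = true;
#else
constexpr bool HasWideCharacters = false;
#endif
```

Code that depends on `wideLength` would need the same guard, since the function does not exist in a build without wide characters.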

(cherry picked from commit ec56790c3b27df4fa1513594ca9a74fd8ad5bf7f)

* [clang-format] Correctly annotate braces in ObjC square brackets (#106654)

See
https://github.com/llvm/llvm-project/pull/88238#issuecomment-2316954781.

(cherry picked from commit e0f2368cdeb7312973a92fb2d22199d1de540db8)

* [Instrumentation] Fix EdgeCounts vector size in SetBranchWeights (#99064)

(cherry picked from commit 46a4132e167aa44d8ec7776262ce2a0e6d47de59)

* [builtins] Fix missing main() function in float16/bfloat16 support checks (#104478)

The CMake docs state that `check_c_source_compiles()` checks whether the
supplied code "can be compiled as a C source file and linked as an
executable (so it must contain at least a `main()` function)."

https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html

In practice, this command is a wrapper around `try_compile()`:

- https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54
- https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101

When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/,
`CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `ST…