[LAA] Refine stride checks for SCEVs during dependence analysis. #99577
NOTE: This PR contains 1debdc1, which introduces a helper to get the stride as a SCEV expression and is NFC. It can be submitted separately once we converge on the final version of the PR.
Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Instead of getting the constant strides for both source and sink of a dependence, start by getting the SCEVs for their strides if they are non-wrapping AddRecs, or nullopt otherwise. Then proceed by checking the strides.
If either source or sink is not strided (i.e. not a non-wrapping AddRec), we check whether either is neither loop invariant nor strided. In that case, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise each access is either loop invariant or strided, and we can generate a runtime check to disambiguate them.
If both are strided but either is not strided by a constant, we currently cannot analyze them further, but may be able to disambiguate them with a runtime check. Reasoning about non-constant strides can be extended as a follow-up.
If both are strided by constants, we proceed as previously.
This is an alternative to llvm#99239 and also replaces the additional checks of whether the underlying objects are loop-invariant. Fixes llvm#87189.
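The stride handling described above can be sketched as a small decision function. This is a simplified model, not the LAA code: StrideInfo, Outcome and classify are hypothetical names, an empty optional stands for "not a non-wrapping AddRec", and loop invariance is passed in as a flag instead of being queried from ScalarEvolution.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a stride: present only for non-wrapping AddRecs.
struct StrideInfo {
  bool IsConstant; // false for a symbolic (non-constant) stride
  int64_t Value;   // only meaningful when IsConstant is true
};

// Hypothetical outcome names. RuntimeCheck corresponds to LAA's
// Dependence::Unknown; ConstantStrides means "continue with the existing
// distance-based analysis".
enum class Outcome { IndirectUnsafe, RuntimeCheck, ConstantStrides };

Outcome classify(std::optional<StrideInfo> StrideA, bool AInvariant,
                 std::optional<StrideInfo> StrideB, bool BInvariant) {
  if (!StrideA || !StrideB) {
    // An access that is neither strided nor loop invariant may overlap any
    // earlier or later iteration, so no runtime check can make it safe.
    if ((!AInvariant && !StrideA) || (!BInvariant && !StrideB))
      return Outcome::IndirectUnsafe;
    // Every access is loop invariant or strided: bounds are computable.
    return Outcome::RuntimeCheck;
  }
  // Both strided, but at least one stride is symbolic.
  if (!StrideA->IsConstant || !StrideB->IsConstant)
    return Outcome::RuntimeCheck;
  return Outcome::ConstantStrides;
}
```

For example, an access whose pointer is neither strided nor loop invariant (such as one indexed by a value loaded in the loop) is classified as IndirectUnsafe, while pairing a loop-invariant access with a strided one only requires a runtime check.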
@llvm/pr-subscribers-llvm-analysis
Author: Florian Hahn (fhahn)
Patch is 22.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99577.diff
7 Files Affected:
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index afafb74bdcb0a..60d953ae18d75 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -199,9 +199,7 @@ class MemoryDepChecker {
/// Check whether the dependencies between the accesses are safe.
///
/// Only checks sets with elements in \p CheckDeps.
- bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects);
+ bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps);
/// No memory dependence was encountered that would inhibit
/// vectorization.
@@ -351,11 +349,8 @@ class MemoryDepChecker {
/// element access it records this distance in \p MinDepDistBytes (if this
/// distance is smaller than any other distance encountered so far).
/// Otherwise, this function returns true signaling a possible dependence.
- Dependence::DepType
- isDependent(const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B,
- unsigned BIdx,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects);
+ Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx,
+ const MemAccessInfo &B, unsigned BIdx);
/// Check whether the data dependence could prevent store-load
/// forwarding.
@@ -392,11 +387,9 @@ class MemoryDepChecker {
/// determined, or a struct containing (Distance, Stride, TypeSize, AIsWrite,
/// BIsWrite).
std::variant<Dependence::DepType, DepDistanceStrideAndSizeInfo>
- getDependenceDistanceStrideAndSize(
- const MemAccessInfo &A, Instruction *AInst, const MemAccessInfo &B,
- Instruction *BInst,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects);
+ getDependenceDistanceStrideAndSize(const MemAccessInfo &A, Instruction *AInst,
+ const MemAccessInfo &B,
+ Instruction *BInst);
};
class RuntimePointerChecking;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 84214c47a10e1..cee5d353db5d2 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1458,12 +1458,11 @@ static bool isNoWrapAddRec(Value *Ptr, const SCEVAddRecExpr *AR,
return false;
}
-/// Check whether the access through \p Ptr has a constant stride.
-std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
- Type *AccessTy, Value *Ptr,
- const Loop *Lp,
- const DenseMap<Value *, const SCEV *> &StridesMap,
- bool Assume, bool ShouldCheckWrap) {
+static std::optional<const SCEV *>
+getPtrStrideSCEV(PredicatedScalarEvolution &PSE, Type *AccessTy, Value *Ptr,
+ const Loop *Lp,
+ const DenseMap<Value *, const SCEV *> &StridesMap, bool Assume,
+ bool ShouldCheckWrap) {
Type *Ty = Ptr->getType();
assert(Ty->isPointerTy() && "Unexpected non-ptr");
@@ -1520,13 +1519,14 @@ std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
if (Rem)
return std::nullopt;
+ const SCEV *StrideSCEV = PSE.getSE()->getConstant(C->getType(), Stride);
if (!ShouldCheckWrap)
- return Stride;
+ return StrideSCEV;
// The address calculation must not wrap. Otherwise, a dependence could be
// inverted.
if (isNoWrapAddRec(Ptr, AR, PSE, Lp))
- return Stride;
+ return StrideSCEV;
// An inbounds getelementptr that is a AddRec with a unit stride
// cannot wrap per definition. If it did, the result would be poison
@@ -1534,7 +1534,7 @@ std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
// when executed.
if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
GEP && GEP->isInBounds() && (Stride == 1 || Stride == -1))
- return Stride;
+ return StrideSCEV;
// If the null pointer is undefined, then a access sequence which would
// otherwise access it can be assumed not to unsigned wrap. Note that this
@@ -1542,7 +1542,7 @@ std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
unsigned AddrSpace = Ty->getPointerAddressSpace();
if (!NullPointerIsDefined(Lp->getHeader()->getParent(), AddrSpace) &&
(Stride == 1 || Stride == -1))
- return Stride;
+ return StrideSCEV;
if (Assume) {
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
@@ -1550,7 +1550,7 @@ std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
<< "LAA: Pointer: " << *Ptr << "\n"
<< "LAA: SCEV: " << *AR << "\n"
<< "LAA: Added an overflow assumption\n");
- return Stride;
+ return StrideSCEV;
}
LLVM_DEBUG(
dbgs() << "LAA: Bad stride - Pointer may wrap in the address space "
@@ -1558,6 +1558,19 @@ std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
return std::nullopt;
}
+/// Check whether the access through \p Ptr has a constant stride.
+std::optional<int64_t>
+llvm::getPtrStride(PredicatedScalarEvolution &PSE, Type *AccessTy, Value *Ptr,
+ const Loop *Lp,
+ const DenseMap<Value *, const SCEV *> &StridesMap,
+ bool Assume, bool ShouldCheckWrap) {
+ std::optional<const SCEV *> StrideSCEV = getPtrStrideSCEV(
+ PSE, AccessTy, Ptr, Lp, StridesMap, Assume, ShouldCheckWrap);
+ if (StrideSCEV && isa<SCEVConstant>(*StrideSCEV))
+ return cast<SCEVConstant>(*StrideSCEV)->getAPInt().getSExtValue();
+ return std::nullopt;
+}
+
std::optional<int> llvm::getPointersDiff(Type *ElemTyA, Value *PtrA,
Type *ElemTyB, Value *PtrB,
const DataLayout &DL,
@@ -1899,23 +1912,11 @@ static bool areStridedAccessesIndependent(uint64_t Distance, uint64_t Stride,
return ScaledDist % Stride;
}
-/// Returns true if any of the underlying objects has a loop varying address,
-/// i.e. may change in \p L.
-static bool
-isLoopVariantIndirectAddress(ArrayRef<const Value *> UnderlyingObjects,
- ScalarEvolution &SE, const Loop *L) {
- return any_of(UnderlyingObjects, [&SE, L](const Value *UO) {
- return !SE.isLoopInvariant(SE.getSCEV(const_cast<Value *>(UO)), L);
- });
-}
-
std::variant<MemoryDepChecker::Dependence::DepType,
MemoryDepChecker::DepDistanceStrideAndSizeInfo>
MemoryDepChecker::getDependenceDistanceStrideAndSize(
const AccessAnalysis::MemAccessInfo &A, Instruction *AInst,
- const AccessAnalysis::MemAccessInfo &B, Instruction *BInst,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects) {
+ const AccessAnalysis::MemAccessInfo &B, Instruction *BInst) {
auto &DL = InnermostLoop->getHeader()->getDataLayout();
auto &SE = *PSE.getSE();
auto [APtr, AIsWrite] = A;
@@ -1933,12 +1934,10 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
BPtr->getType()->getPointerAddressSpace())
return MemoryDepChecker::Dependence::Unknown;
- int64_t StrideAPtr =
- getPtrStride(PSE, ATy, APtr, InnermostLoop, SymbolicStrides, true)
- .value_or(0);
- int64_t StrideBPtr =
- getPtrStride(PSE, BTy, BPtr, InnermostLoop, SymbolicStrides, true)
- .value_or(0);
+ std::optional<const SCEV *> StrideAPtr = getPtrStrideSCEV(
+ PSE, ATy, APtr, InnermostLoop, SymbolicStrides, true, true);
+ std::optional<const SCEV *> StrideBPtr = getPtrStrideSCEV(
+ PSE, BTy, BPtr, InnermostLoop, SymbolicStrides, true, true);
const SCEV *Src = PSE.getSCEV(APtr);
const SCEV *Sink = PSE.getSCEV(BPtr);
@@ -1946,26 +1945,19 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
// If the induction step is negative we have to invert source and sink of the
// dependence when measuring the distance between them. We should not swap
// AIsWrite with BIsWrite, as their uses expect them in program order.
- if (StrideAPtr < 0) {
+ if (StrideAPtr && SE.isKnownNegative(*StrideAPtr)) {
std::swap(Src, Sink);
std::swap(AInst, BInst);
+ std::swap(StrideAPtr, StrideBPtr);
}
const SCEV *Dist = SE.getMinusSCEV(Sink, Src);
LLVM_DEBUG(dbgs() << "LAA: Src Scev: " << *Src << "Sink Scev: " << *Sink
- << "(Induction step: " << StrideAPtr << ")\n");
+ << "\n");
LLVM_DEBUG(dbgs() << "LAA: Distance for " << *AInst << " to " << *BInst
<< ": " << *Dist << "\n");
- // Needs accesses where the addresses of the accessed underlying objects do
- // not change within the loop.
- if (isLoopVariantIndirectAddress(UnderlyingObjects.find(APtr)->second, SE,
- InnermostLoop) ||
- isLoopVariantIndirectAddress(UnderlyingObjects.find(BPtr)->second, SE,
- InnermostLoop))
- return MemoryDepChecker::Dependence::IndirectUnsafe;
-
// Check if we can prove that Sink only accesses memory after Src's end or
// vice versa. At the moment this is limited to cases where either source or
// sink are loop invariant to avoid compile-time increases. This is not
@@ -1987,11 +1979,47 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
}
}
- // Need accesses with constant strides and the same direction. We don't want
- // to vectorize "A[B[i]] += ..." and similar code or pointer arithmetic that
- // could wrap in the address space.
- if (!StrideAPtr || !StrideBPtr || (StrideAPtr > 0 && StrideBPtr < 0) ||
- (StrideAPtr < 0 && StrideBPtr > 0)) {
+ // Need accesses with constant strides and the same direction for further
+ // dependence analysis. We don't want to vectorize "A[B[i]] += ..." and
+ // similar code or pointer arithmetic that could wrap in the address space.
+ //
+ // If Src or Sink are non-wrapping AddRecs, StrideAPtr and StrideBPtr contain
+ // a SCEV representing the stride of the SCEV. It may not be a known constant
+ // value though.
+
+ // If either Src or Sink are not strided (i.e. not a non-wrapping AddRec), we
+ // cannot analyze the dependence further.
+ if (!StrideAPtr || !StrideBPtr) {
+ bool SrcInvariant = SE.isLoopInvariant(Src, InnermostLoop);
+ bool SinkInvariant = SE.isLoopInvariant(Sink, InnermostLoop);
+ // If either Src or Sink are not loop invariant and not strided, the
+ // expression in the current iteration may overlap with any earlier or later
+ // iteration. This is not safe and we also cannot generate runtime checks to
+ // ensure safety. This includes expressions where an index is loaded in each
+ // iteration or wrapping AddRecs.
+ if ((!SrcInvariant && !StrideAPtr) || (!SinkInvariant && !StrideBPtr))
+ return MemoryDepChecker::Dependence::IndirectUnsafe;
+
+ // Otherwise both Src or Sink are either loop invariant or strided and we
+ // can generate a runtime check to disambiguate the accesses.
+ return MemoryDepChecker::Dependence::Unknown;
+ }
+
+ LLVM_DEBUG(dbgs() << "LAA: Src induction step: " << **StrideAPtr
+ << " Sink induction step: " << **StrideBPtr << "\n");
+ // If either Src or Sink have a non-constant stride, we can generate a runtime
+ // check to disambiguate them.
+ if ((!isa<SCEVConstant>(*StrideAPtr)) || (!isa<SCEVConstant>(*StrideBPtr)))
+ return MemoryDepChecker::Dependence::Unknown;
+
+ // Both Src and Sink have a constant stride, check if they are in the same
+ // direction.
+ int64_t StrideAPtrInt =
+ cast<SCEVConstant>(*StrideAPtr)->getAPInt().getSExtValue();
+ int64_t StrideBPtrInt =
+ cast<SCEVConstant>(*StrideBPtr)->getAPInt().getSExtValue();
+ if ((StrideAPtrInt > 0 && StrideBPtrInt < 0) ||
+ (StrideAPtrInt < 0 && StrideBPtrInt > 0)) {
LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
return MemoryDepChecker::Dependence::Unknown;
}
@@ -2001,22 +2029,20 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
if (!HasSameSize)
TypeByteSize = 0;
- return DepDistanceStrideAndSizeInfo(Dist, std::abs(StrideAPtr),
- std::abs(StrideBPtr), TypeByteSize,
+ return DepDistanceStrideAndSizeInfo(Dist, std::abs(StrideAPtrInt),
+ std::abs(StrideBPtrInt), TypeByteSize,
AIsWrite, BIsWrite);
}
-MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
- const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B,
- unsigned BIdx,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects) {
+MemoryDepChecker::Dependence::DepType
+MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
+ const MemAccessInfo &B, unsigned BIdx) {
assert(AIdx < BIdx && "Must pass arguments in program order");
// Get the dependence distance, stride, type size and what access writes for
// the dependence between A and B.
- auto Res = getDependenceDistanceStrideAndSize(
- A, InstMap[AIdx], B, InstMap[BIdx], UnderlyingObjects);
+ auto Res =
+ getDependenceDistanceStrideAndSize(A, InstMap[AIdx], B, InstMap[BIdx]);
if (std::holds_alternative<Dependence::DepType>(Res))
return std::get<Dependence::DepType>(Res);
@@ -2250,10 +2276,8 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
return Dependence::BackwardVectorizable;
}
-bool MemoryDepChecker::areDepsSafe(
- DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
- const DenseMap<Value *, SmallVector<const Value *, 16>>
- &UnderlyingObjects) {
+bool MemoryDepChecker::areDepsSafe(DepCandidates &AccessSets,
+ MemAccessInfoList &CheckDeps) {
MinDepDistBytes = -1;
SmallPtrSet<MemAccessInfo, 8> Visited;
@@ -2296,8 +2320,8 @@ bool MemoryDepChecker::areDepsSafe(
if (*I1 > *I2)
std::swap(A, B);
- Dependence::DepType Type = isDependent(*A.first, A.second, *B.first,
- B.second, UnderlyingObjects);
+ Dependence::DepType Type =
+ isDependent(*A.first, A.second, *B.first, B.second);
mergeInStatus(Dependence::isSafeForVectorization(Type));
// Gather dependences unless we accumulated MaxDependences
@@ -2652,8 +2676,7 @@ bool LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
if (Accesses.isDependencyCheckNeeded()) {
LLVM_DEBUG(dbgs() << "LAA: Checking memory dependencies\n");
DepsAreSafe = DepChecker->areDepsSafe(DependentAccesses,
- Accesses.getDependenciesToCheck(),
- Accesses.getUnderlyingObjects());
+ Accesses.getDependenciesToCheck());
if (!DepsAreSafe && DepChecker->shouldRetryWithRuntimeCheck()) {
LLVM_DEBUG(dbgs() << "LAA: Retrying with memory checks\n");
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/load-store-index-loaded-in-loop.ll b/llvm/test/Analysis/LoopAccessAnalysis/load-store-index-loaded-in-loop.ll
index 2e61a28039846..6d8e296ec72fa 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/load-store-index-loaded-in-loop.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/load-store-index-loaded-in-loop.ll
@@ -9,21 +9,19 @@
define void @B_indices_loaded_in_loop_A_stored(ptr %A, ptr noalias %B, i64 %N, i64 %off) {
; CHECK-LABEL: 'B_indices_loaded_in_loop_A_stored'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Memory dependences are safe with run-time checks
+; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT: Unsafe indirect dependence.
; CHECK-NEXT: Dependences:
+; CHECK-NEXT: IndirectUnsafe:
+; CHECK-NEXT: %l = load i32, ptr %gep.B, align 4 ->
+; CHECK-NEXT: store i32 %inc, ptr %gep.B, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT: Unknown:
+; CHECK-NEXT: %indices = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT: store i32 %l, ptr %gep.C, align 4
+; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
-; CHECK-NEXT: Check 0:
-; CHECK-NEXT: Comparing group ([[GRP1:0x[0-9a-f]+]]):
-; CHECK-NEXT: %gep.C = getelementptr inbounds i32, ptr %A, i64 %iv
-; CHECK-NEXT: Against group ([[GRP2:0x[0-9a-f]+]]):
-; CHECK-NEXT: %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv.off
; CHECK-NEXT: Grouped accesses:
-; CHECK-NEXT: Group [[GRP1]]:
-; CHECK-NEXT: (Low: %A High: ((4 * %N) + %A))
-; CHECK-NEXT: Member: {%A,+,4}<nuw><%loop>
-; CHECK-NEXT: Group [[GRP2]]:
-; CHECK-NEXT: (Low: (%off + %A) High: (%N + %off + %A))
-; CHECK-NEXT: Member: {(%off + %A),+,1}<nw><%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -59,9 +57,9 @@ define void @B_indices_loaded_in_loop_A_not_stored(ptr %A, ptr noalias %B, i64 %
; CHECK-LABEL: 'B_indices_loaded_in_loop_A_not_stored'
; CHECK-NEXT: loop:
; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Unsafe indirect dependence.
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
+; CHECK-NEXT: IndirectUnsafe:
; CHECK-NEXT: %l = load i32, ptr %gep.B, align 4 ->
; CHECK-NEXT: store i32 %inc, ptr %gep.B, align 4
; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/pointer-with-unknown-bounds.ll b/llvm/test/Analysis/LoopAccessAnalysis/pointer-with-unknown-bounds.ll
index 546a75cf4efd5..28ee6c6f0a89a 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/pointer-with-unknown-bounds.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/pointer-with-unknown-bounds.ll
@@ -13,9 +13,9 @@ target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
; CHECK-NEXT: for.body:
; CHECK-NEXT: Report: unsafe dependent memory operations in loop
; CHECK-NOT: Report: cannot identify array bounds
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Unsafe indirect dependence.
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
+; CHECK-NEXT: IndirectUnsafe:
; CHECK-NEXT: %loadA = load i16, ptr %arrayidxA, align 2 ->
; CHECK-NEXT: store i16 %mul, ptr %arrayidxA, align 2
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/print-order.ll b/llvm/test/Analysis/LoopAccessAnalysis/print-order.ll
index 18e45f469b4a3..8ca30383092c6 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/print-order.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/print-order.ll
@@ -9,8 +9,9 @@
; CHECK-LABEL: 'negative_step'
; CHECK: LAA: Found an analyzable loop: loop
; CHECK: LAA: Checking memory dependencies
-; CHECK-NEXT: LAA: Src Scev: {(4092 + %A),+,-4}<nw><%loop>Sink Scev: {(4088 + %A)<nuw>,+,-4}<nw><%loop>(Induction step: -1)
+; CHECK-NEXT: LAA: Src Scev: {(4092 + %A),+,-4}<nw><%loop>Sink Scev: {(4088 + %A)<nuw>,+,-4}<nw><%loop>
; CHECK-NEXT: LAA: Distance for store i32 %add, ptr %gep.A.plus.1, align 4 to %l = load i32, ptr %gep.A, align 4: -4
+; CHECK-NEXT: LAA: Src induction step: -1 Sink induction step: -1
; CHECK-NEXT: LAA: Dependence is negative
define void @negative_step(ptr nocapture %A) {
@@ -41,8 +42,9 @@ exit:
; CHECK-LABEL: 'positive_step'
; CHECK: LAA: Found an analyzable loop: loop
; CHECK: LAA: Checking memory dependencies
-; CHECK-NEXT: LAA: Src Scev: {(4 + %A)<nuw>,+,4}<nuw><%loop>Sink Scev: {%A,+,4}<nw><%loop>(Induction step: 1)
+; CHECK-NEXT: LAA: Src Scev: {(4 + %A)<nuw>,+,4}<nuw><%loop...
[truncated]
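The negative-stride handling in the getDependenceDistanceStrideAndSize hunk (swap source and sink, then compute Dist = Sink - Src) can be illustrated on plain integers. This is a sketch under simplifying assumptions: AffineAccess and dependenceDistance are hypothetical names, offsets are bytes from a common base pointer, and both accesses are assumed to share the same step so that the distance is a constant. LAA itself works on SCEV expressions and handles far more cases.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an affine access {Start,+,Step}, in bytes
// relative to a shared base pointer.
struct AffineAccess {
  int64_t Start;
  int64_t Step;
};

// A is the earlier access in program order. If its step is negative, source
// and sink are swapped before measuring the distance, as in the diff.
// Restricted to equal steps, where Sink - Src reduces to the Start delta.
int64_t dependenceDistance(AffineAccess A, AffineAccess B) {
  assert(A.Step == B.Step && "sketch only handles equal steps");
  AffineAccess Src = A, Sink = B;
  if (Src.Step < 0)
    std::swap(Src, Sink);
  return Sink.Start - Src.Start;
}
```

Feeding in the start offsets from the negative_step check lines (4088 and 4092 bytes past %A, step -4) yields -4, matching the distance printed there.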
ping
Could you please undo the merge of 1debdc1 into this PR if it has nothing to do with solving the problem? At least consider a stacked PR. I would appreciate it if reviewing could be made as easy as possible.
<< " Sink induction step: " << **StrideBPtr << "\n");
// If either Src or Sink have a non-constant stride, we can generate a runtime
// check to disambiguate them.
if ((!isa<SCEVConstant>(*StrideAPtr)) || (!isa<SCEVConstant>(*StrideBPtr)))
Suggested change:
- if ((!isa<SCEVConstant>(*StrideAPtr)) || (!isa<SCEVConstant>(*StrideBPtr)))
+ if (!isa<SCEVConstant>(*StrideAPtr) || !isa<SCEVConstant>(*StrideBPtr))
! has a higher precedence than ||, so we can save some parens.
However, getPtrStrideSCEV only ever returns a SCEVConstant (or nullopt), so I don't see why this patch is made more complicated by including 1debdc1.
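The precedence point can be checked directly: since unary ! binds tighter than binary ||, both spellings below compute the same value for every input. The function names are made up for this illustration.

```cpp
#include <cassert>
#include <initializer_list>

// Redundant parentheses around each negation...
bool withParens(bool A, bool B) { return (!A) || (!B); }
// ...and the same expression without them; ! is applied before ||.
bool withoutParens(bool A, bool B) { return !A || !B; }

// Exhaustively compare both spellings over all four input combinations.
bool spellingsAgree() {
  for (bool A : {false, true})
    for (bool B : {false, true})
      if (withParens(A, B) != withoutParens(A, B))
        return false;
  return true;
}
```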
Should be simplified in the latest version
<< " Sink induction step: " << **StrideBPtr << "\n");
// If either Src or Sink have a non-constant stride, we can generate a runtime
Suggested change:
- << " Sink induction step: " << **StrideBPtr << "\n");
- // If either Src or Sink have a non-constant stride, we can generate a runtime
+ << " Sink induction step: " << **StrideBPtr << "\n");
+ // If either Src or Sink have a non-constant stride, we can generate a runtime
[nit] space
Should be fixed in the latest version, thanks!
int64_t StrideBPtrInt =
    cast<SCEVConstant>(*StrideBPtr)->getAPInt().getSExtValue();
if ((StrideAPtrInt > 0 && StrideBPtrInt < 0) ||
    (StrideAPtrInt < 0 && StrideBPtrInt > 0)) {
  LLVM_DEBUG(dbgs() << "Pointer access with non-constant stride\n");
Only the check for opposite stride directions remains here. Please update the comment.
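In isolation, the remaining condition in that hunk only asks whether two constant strides point in opposite directions. A minimal sketch, with haveOppositeDirections as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// True when one stride is positive and the other negative, mirroring the
// two-clause condition in the hunk. A zero stride is never reported as
// opposite to anything.
bool haveOppositeDirections(int64_t StrideA, int64_t StrideB) {
  return (StrideA > 0 && StrideB < 0) || (StrideA < 0 && StrideB > 0);
}
```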
Updated in the latest version, thanks!
if ((!SrcInvariant && !StrideAPtr) || (!SinkInvariant && !StrideBPtr))
  return MemoryDepChecker::Dependence::IndirectUnsafe;
IIUC, this is the relevant change: an access must be strided or invariant for analysis. That's a good simplification over whatever isLoopVariantIndirectAddress(UnderlyingObjects...) does.
I would invert the condition, to check the supported cases and fall back to the unsupported one (we may want to support more cases in the future):
// Can generate runtime check for strided and invariant (i.e. stride 0) accesses.
if ((SrcInvariant || StrideAPtr) && (SinkInvariant || StrideBPtr))
  return MemoryDepChecker::Dependence::Unknown;
// Some case we do not support.
return MemoryDepChecker::Dependence::IndirectUnsafe;
(NB: Pretty weird to call the more constrained case "Unknown")
Since invariant means stride 0, why not make getPtrStride return 0 in that case? It seems that conflating 0 and "cannot determine stride" is the root cause of this problem.
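The ambiguity the reviewer points out can be illustrated with a toy model of the two encodings. Everything here is hypothetical (PtrKind, strideOld, strideNew): strideOld mimics the .value_or(0) pattern on the removed lines of the diff, where a pointer that is not an AddRec, including a loop-invariant one, yields no stride; strideNew follows the suggested encoding of 0 for loop-invariant pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical classification of how a pointer behaves in the loop.
enum class PtrKind { ConstantStride, LoopInvariant, Unpredictable };

// Old style: only AddRecs get a stride; callers then apply value_or(0),
// so "invariant" and "unknown" both collapse to 0.
int64_t strideOld(PtrKind K, int64_t Step) {
  std::optional<int64_t> S;
  if (K == PtrKind::ConstantStride)
    S = Step;
  return S.value_or(0);
}

// Suggested style: 0 means loop invariant; nullopt means the stride
// genuinely cannot be determined.
std::optional<int64_t> strideNew(PtrKind K, int64_t Step) {
  switch (K) {
  case PtrKind::ConstantStride:
    return Step;
  case PtrKind::LoopInvariant:
    return 0;
  case PtrKind::Unpredictable:
    return std::nullopt;
  }
  return std::nullopt;
}
```

With strideOld, a loop-invariant pointer and an unpredictable one both map to 0 and cannot be told apart; strideNew keeps them distinct.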
I was originally worried about other users of getPtrStride, which may not all handle this properly. But it looks like all other callers use .value_or(0), so it should be fine to update it as suggested.
Did this in the latest version; it should be much simpler now, thanks!
(NB: Pretty weird to call the more constrained case "Unknown")
Agreed, naming could be improved separately
Agreed, naming could be improved separately
Also, IndirectUnsafe may have been intended to apply only to the A[B[i]] case (the access pointer depends on an index loaded from memory whose address itself is not invariant). #87189 may have demonstrated that trying to identify the cases that don't work is bound to fail (in this case, it forgot that other SCEVUnknowns, for instance the return value of a function call, can also be a source of unpredictable accesses); instead, one should identify the cases that are known to be supported (in this case: constant strides). That means IndirectUnsafe should be renamed as well, since it no longer applies only to A[B[i]].
You probably intend to extend getPtrStride to return a SCEV so that non-constant strides that are invariant in the loop can be supported. It might also be useful to consistently pass APtr/BPtr only as SCEVs (Src/Sink) instead of calling PSE.getSCEV on demand.
Will look into renaming separately, thanks!
You probably intend to extend getPtrStride to return a SCEV so one can support non-constant strides, but that are invariant to the loop. It might be useful to pass APtr/BPtr as a SCEV only (Src/Sink) consistently as well instead calling PSE.getSCEV on-demand.
Yep, will also work on that as a follow-up, thanks!
LGTM. In a rare case, this makes the code base even simpler than before.
✅ With the latest revision this PR passed the C/C++ code formatter.
…m#99577) Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Update getPtrStride to return 0 for invariant pointers. Then proceed by checking the strides. If either source or sink is neither strided by a constant (i.e. a non-wrapping AddRec) nor invariant, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise they are either loop invariant or strided. In that case, we can generate a runtime check to disambiguate them. If both are strided by constants, we proceed as previously. This is an alternative to llvm#99239 and also replaces the additional checks of whether the underlying object is loop-invariant. Fixes llvm#87189. PR: llvm#99577
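Under the encoding described in this commit message (getPtrStride returning 0 for loop-invariant pointers and nullopt when no constant stride can be determined), the decision order can be sketched as follows. This is an illustrative model rather than the committed code; FinalOutcome and classifyFinal are hypothetical names, and the ConstantStrides outcome stands for continuing with the existing distance-based analysis.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical outcome names; RuntimeCheck corresponds to LAA's
// Dependence::Unknown.
enum class FinalOutcome { IndirectUnsafe, RuntimeCheck, ConstantStrides };

// Stride encoding assumed here: nullopt = cannot be determined,
// 0 = loop invariant, anything else = constant stride.
FinalOutcome classifyFinal(std::optional<int64_t> StrideA,
                           std::optional<int64_t> StrideB) {
  // Neither strided by a constant nor invariant: no runtime check helps.
  if (!StrideA || !StrideB)
    return FinalOutcome::IndirectUnsafe;
  // At least one side is invariant and the other is strided or invariant,
  // so the accesses can be disambiguated with a runtime check.
  if (*StrideA == 0 || *StrideB == 0)
    return FinalOutcome::RuntimeCheck;
  // Both strided by constants: proceed as previously.
  return FinalOutcome::ConstantStrides;
}
```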
* [Metadata] Try to merge the first and last ranges. (#101860) Fixes #101859. If we have at least 2 ranges, we have to try to merge the last and first ones to handle the wrap range. (cherry picked from commit 4377656f2419a8eb18c01e86929b689dcf22b5d6) * InferAddressSpaces: Fix mishandling stores of pointers to themselves (#101877) (cherry picked from commit 3c483b887e5a32a0ddc0a52a467b31f74aad25bb) * [ARM] [Windows] Use IMAGE_SYM_CLASS_STATIC for private functions (#101828) For functions with private linkage, pick IMAGE_SYM_CLASS_STATIC rather than IMAGE_SYM_CLASS_EXTERNAL; GlobalValue::isInternalLinkage() only checks for InternalLinkage, while GlobalValue::isLocalLinkage() checks for both InternalLinkage and PrivateLinkage. This matches what the AArch64 target does, since commit 3406934e4db4bf95c230db072608ed062c13ad5b. This activates a preexisting fix for the AArch64 target from 1e7f592a890aad860605cf5220530b3744e107ba, for the ARM target as well. When a relocation points at a symbol, one usually can convey an offset to the symbol by encoding it as an immediate in the instruction. However, for the ARM and AArch64 branch instructions, the immediate stored in the instruction is ignored by MS link.exe (and lld-link matches this aspect). (It would be simple to extend lld-link to support it - but such object files would be incompatible with MS link.exe.) This was worked around by 1e7f592a890aad860605cf5220530b3744e107ba by emitting symbols into the object file symbol table, for temporary symbols that otherwise would have been omitted, if they have the class IMAGE_SYM_CLASS_STATIC, in order to avoid needing an offset in the relocated instruction. This change gives the symbols generated from functions with the IR level "private" linkage the right class, to activate that workaround. This fixes https://github.com/llvm/llvm-project/issues/100101, fixing code generation for coroutines for Windows on ARM. 
After the change in f78688134026686288a8d310b493d9327753a022, coroutines generate a function with private linkage, and calls to this function were previously broken for this target. (cherry picked from commit 8dd065d5bc81b0c8ab57f365bb169a5d92928f25) * Forward declare OSSpinLockLock on MacOS since it's not shipped on the system. (#101392) Fixes build errors on some SDKs. rdar://132607572 (cherry picked from commit 3a4c7cc56c07b2db9010c2228fc7cb2a43dd9b2d) * ReleaseNotes: lld/ELF: mention CREL * Bump version to 19.1.0-rc2 * [sanitizer_common][test] Fix SanitizerIoctl/KVM_GET_* tests on Linux/… (#100532) …sparc64 Two ioctl tests `FAIL` on Linux/sparc64 (both 32 and 64-bit): ``` SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerIoctl/KVM_GET_LAPIC SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerIoctl/KVM_GET_MP_STATE ``` like ``` compiler-rt/lib/sanitizer_common/tests/./Sanitizer-sparc-Test --gtest_filter=SanitizerIoctl.KVM_GET_LAPIC -- compiler-rt/lib/sanitizer_common/tests/sanitizer_ioctl_test.cpp:91: Failure Value of: res Actual: false Expected: true compiler-rt/lib/sanitizer_common/tests/sanitizer_ioctl_test.cpp:92: Failure Expected equality of these values: ioctl_desc::WRITE Which is: 2 desc.type Which is: 1 ``` The problem is that Linux/sparc64, like Linux/mips, uses a different layout for the `ioctl` `request` arg than most other Linux targets as can be seen in `sanitizer_platform_limits_posix.h` (`IOC_*`). Therefore, this patch makes the tests use the correct one. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`. 
(cherry picked from commit 9eefe065bb2752b0db9ed553d2406e9a15ce349e) * [sanitizer_common] Don't use syscall(SYS_clone) on Linux/sparc64 (#100534) ``` SanitizerCommon-Unit :: ./Sanitizer-sparc-Test/SanitizerCommon/StartSubprocessTest ``` and every single test using the `llvm-symbolizer` `FAIL` on Linux/sparc64 in a very weird way: when using `StartSubprocess`, there's a call to `internal_fork`, but we never reach `internal_execve`. `internal_fork` is implemented using `syscall(SYS_clone)`. The calling convention of that syscall already varies considerably between targets, but as documented in `clone(2)`, SPARC is again quite different. Instead of trying to match `glibc` here, this patch just calls `__fork`. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`. (cherry picked from commit 1c53b907bd6348138a59da270836fc9b4c161a07) * [sanitizer_common] Adjust signal_send.cpp for Linux/sparc64 (#100538) ``` SanitizerCommon-ubsan-sparc-Linux :: Linux/signal_send.cpp ``` currently `FAIL`s on Linux/sparc64 (32 and 64-bit). Instead of the expected values for `SIGUSR1` (`10`) and `SIGUSR2` (`12`), that target uses `30` and `31`. On Linux/x86_64, the signals get their values from `x86_64-linux-gnu/bits/signum-generic.h`, to be overridden in `x86_64-linux-gnu/bits/signum.h`. On Linux/sparc64, on the other hand, the definitions come from `sparc64-linux-gnu/bits/signum-arch.h` and remain that way; there's no `signum.h` at all. The patch allows for both values. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.
(cherry picked from commit 7cecbdfe4eac3fd7268532426fb6b13e51b8720d) * [sanitizer_common] Fix internal_*stat on Linux/sparc64 (#101012) ``` SanitizerCommon-Unit :: ./Sanitizer-sparcv9-Test/SanitizerCommon/FileOps ``` `FAIL`s on 64-bit Linux/sparc64: ``` projects/compiler-rt/lib/sanitizer_common/tests/./Sanitizer-sparcv9-Test --gtest_filter=SanitizerCommon.FileOps -- compiler-rt/lib/sanitizer_common/tests/sanitizer_libc_test.cpp:144: Failure Expected equality of these values: len1 + len2 Which is: 10 fsize Which is: 1721875535 ``` The issue is similar to the mips64 case: the Linux/sparc64 `*stat` syscalls take a `struct kernel_stat64 *` arg. Also, the syscalls actually used differ. This patch handles this, adapting the mips64 code to avoid too much duplication. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`. (cherry picked from commit fcd6bd5587cc376cd8f43b60d1c7d61fdfe0f535) * [ADT] Add `<cstdint>` to SmallVector (#101761) SmallVector uses `uint32_t` and `uint64_t` without including `<cstdint>`, which fails to build with GCC 15 after a change in libstdc++ [0] [0] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a817a4a5a6d94da9127af3be9f84a74e3076ee2 (cherry picked from commit 7e44305041d96b064c197216b931ae3917a34ac1) * [libc++][bit] Improves rotate functions. (#98032) Investigating #96612 showed that our implementation was different from the Standard and could cause UB. Testing the codegen showed quite a bit of assembly generated for these functions. The functions have been rewritten in a way that allows Clang to optimize the code to use simple CPU rotate instructions. Fixes: https://github.com/llvm/llvm-project/issues/96612 * [AArch64] Avoid inlining if ZT0 needs preserving. (#101343) Inlining may result in different behaviour when the callee clobbers ZT0, because normally the call-site will have code to preserve ZT0. When inlining the function, this code to preserve ZT0 will no longer be emitted, and so the resulting behaviour of the program is changed.
(cherry picked from commit fb470db7b3a8ce6853e8bf17d235617a2fa79434) * [AArch64] Avoid NEON dot product in streaming[-compatible] functions (#101677) The NEON dot product is not valid in streaming mode. A follow-up patch will improve codegen for these operations. (cherry picked from commit 12937b1bfb23cca4731fa274f3358f7286cc6784) * [AArch64][SME] Rewrite __arm_sc_memset to remove invalid instruction (#101522) The implementation of __arm_sc_memset in compiler-rt contains a Neon dup instruction which is not valid in streaming mode. This patch rewrites the function, using an SVE mov instruction if available. (cherry picked from commit d6649f2d4871c4535ae0519920e36100748890c4) * [LLVM][TTI][SME] Allow optional auto-vectorisation for streaming functions. (#101679) The command line option enable-scalable-autovec-in-streaming-mode is used to enable scalable vectors but the same check is missing from enableScalableVectorization, which is blocking auto-vectorisation. (cherry picked from commit 7775a4882d7105fde7f7a81f3c72567d39afce45) * [Driver] Restrict Ofast deprecation help message to Clang (#101682) The discussion about this in Flang (https://discourse.llvm.org/t/rfc-deprecate-ofast-in-flang/80243) has not concluded hence restricting the deprecation only to Clang. (cherry picked from commit e60ee1f2d70bdb0ac87b09ae685d669d8543b7bd) * [Clang] SFINAE on mismatching pack length during constraint satisfaction checking (#101879) If a fold expanded constraint would expand packs of different size, it is not a valid pack expansion and it is not satisfied. This should not produce an error. Fixes #99430 (cherry picked from commit da380b26e4748ade5a8dba85b7df5e1c4eded8bc) * [Driver] Temporarily probe aarch64-linux-gnu GCC installation As the comment explains, `*Triples[]` lists are discouraged and not comprehensive anyway (e.g. aarch64-unknown-linux-gnu/aarch64-unknown-linux-musl/aarch64-amazon-linux do not work). 
Boost incorrectly specifies --target=arm64-pc-linux ("arm64" should not be used for Linux) and expects to probe "aarch64-linux-gnu". Add this temporary workaround for the 19.x releases. * workflows/release-tasks: Add missing permissions for release binaries (#102023) Now that the release binaries create artifact attestations, we need to ensure that we call the workflow with the correct permissions. (cherry picked from commit dc349a3f47882cdac7112c763d2964b59e77356a) * workflows/release-binaries: Give attestation artifacts a unique name (#102041) We need a different attestation for each supported architecture, so their artifacts all need to have different names. The upload step is run on a Linux runner, so no matter which architecture's binary is being uploaded, the runner.os and runner.arch variables would always be 'Linux' and 'X64', and so we can't use them for naming the artifact. (cherry picked from commit 3c8dadda3aa20b89fb5ad29ae31380d9594c3430) * [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC (#100086) As discussed in Issue #86793, the `MachOTest.UnalignedLC` test dies with `SIGBUS` on SPARC, a strict-alignment target. It simply cannot work there. Besides, the test invokes undefined behaviour on big-endian targets, so this patch disables it on all of those. Tested on `sparcv9-sun-solaris2.11` and `amd64-pc-solaris2.11`. (cherry picked from commit 3a226dbe27ac7c7d935bc0968e84e31798a01207) * [LLDB] Add `<cstdint>` to AddressableBits (#102110) (cherry picked from commit bb59f04e7e75dcbe39f1bf952304a157f0035314) * [LAA] Refine stride checks for SCEVs during dependence analysis. (#99577) Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Update getPtrStride to return 0 for invariant pointers. Then proceed by checking the strides. If either source or sink is not strided by a constant (i.e.
not a non-wrapping AddRec) or invariant, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise they are either loop invariant or strided. In that case, we can generate a runtime check to disambiguate them. If both are strided by constants, we proceed as previously. This is an alternative to https://github.com/llvm/llvm-project/pull/99239 and also replaces additional checks if the underlying object is loop-invariant. Fixes https://github.com/llvm/llvm-project/issues/87189. PR: https://github.com/llvm/llvm-project/pull/99577 * [CalcSpillWeights] Avoid x87 excess precision influencing weight result Fixes #99396 The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native. Similar to 7af3432e22b0, we need to add a `volatile` qualifier to the local `Weight` variable to force it onto the stack, and avoid the excess precision. Define `stack_float_t` in `MathExtras.h` for this purpose, and use it. (cherry picked from commit c80c09f3e380a0a2b00b36bebf72f43271a564c1) * [BOLT] Support map other function entry address (#101466) Allow BOLT to map the old address to a new binary address if the old address is the entry of the function. (cherry picked from commit 734c0488b6e69300adaf568f880f40b113ae02ca) * [lld][ARM] Fix assertion when mixing ARM and Thumb objects (#101985) Previously, we selected the Thumb2 PLT sequences if any input object is marked as not supporting the ARM ISA, which then causes assertion failures when calls from ARM code in other objects are seen. I think the intention here was to only use Thumb PLTs when the target does not have the ARM ISA available, signalled by no objects being marked as having it available. 
To do that, we need to track which ISAs we have seen as we parse the build attributes, and defer the decision about PLTs until all input objects have been parsed. This bug was triggered by real code in picolibc, which has some versions of string.h functions built with Thumb2-only build attributes, so that they are compatible with v7-A, v7-R and v7-M. Fixes #99008. (cherry picked from commit a1c6467bd90905d52cf8f6162b60907f8e98a704) * [BOLT] Skip PLT search for zero-value weak reference symbols (#69136) Take, for example, a common weak reference pattern: ``` __attribute__((weak)) void undef_weak_fun(); if (&undef_weak_fun) undef_weak_fun(); ``` In this case, an undefined weak symbol `undef_weak_fun` has an address of zero, and BOLT incorrectly changes the relocation for the corresponding symbol to symbol@PLT, leading to incorrect runtime behavior. (cherry picked from commit 6c8933e1a095028d648a5a26aecee0f569304dd0) * [AArch64] Don't replace dst of SWP instructions with (X|W)ZR (#102139) This change updates the AArch64DeadRegisterDefinition pass to ensure it does not replace the destination register of a SWP instruction with the zero register when its value is unused. This is necessary to ensure that the ordering of such instructions in relation to DMB.LD barriers adheres to the definitions of the AArch64 Memory Model. The memory model states the following (ARMARM version DDI 0487K.a §B2.3.7): ``` Barrier-ordered-before An effect E1 is Barrier-ordered-before an effect E2 if one of the following applies: [...] * All of the following apply: - E1 is a Memory Read effect. - E1 is generated by an instruction whose destination register is not WZR or XZR. - E1 appears in program order before E3. - E3 is either a DMB LD effect or a DSB LD effect. - E3 appears in program order before E2. ``` Prior to this change, by replacing the destination register of such SWP instructions with WZR/XZR, the ordering relation described above was incorrectly removed from the generated code.
The new behaviour is ensured in this patch by adding the relevant `SWP[L](B|H|W|X)` instructions to the list in the `atomicReadDroppedOnZero` predicate, which already covered the `LD<Op>` instructions that are subject to the same effect. Fixes #68428. (cherry picked from commit beb37e2e22b549b361be7269a52a3715649e956a) * [clang][modules] Enable built-in modules for the upcoming Apple releases (#102239) The upcoming Apple SDK releases will support the clang built-in headers being in the clang built-in modules: stop passing -fbuiltin-headers-in-system-modules for those SDK versions. (cherry picked from commit 961639962251de7428c3fe93fa17cfa6ab3c561a) * [Driver] Fix a warning This patch fixes: clang/lib/Driver/ToolChains/Darwin.cpp:2937:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] (cherry picked from commit 0f1361baf650641a59aaa1710d7a0b7b02f2e56d) * [AIX]export function descriptor symbols related to template functions. (#101920) This fixes regressions caused by https://github.com/llvm/llvm-project/pull/97526. After that patch, all undefined references to DS symbols are removed. This makes DS symbols (for template functions) have no reference in some cases, so extract_symbols.py does not export these DS symbols in those cases. On AIX, exporting the function descriptor depends on references to the function descriptor itself and the function entry symbol. Without this fix, on AIX, we get: ``` rtld: 0712-001 Symbol _ZN4llvm15SmallVectorBaseIjE13mallocForGrowEPvmmRm was referenced from module llvm-project/build/unittests/Passes/Plugins/TestPlugin.so(), but a runtime definition of the symbol was not found. ``` (cherry picked from commit 396343f17b1182ff8ed698beac3f9b93b1d9dabd) * [clang-format] Fix a bug in annotating CastRParen (#102261) Fixes #102102.
(cherry picked from commit 8c7a038f9029c675f2a52ff5e85f7b6005ec7b3e) * [clang] Fix crash when #embed used in a compound literal (#102304) Fixes https://github.com/llvm/llvm-project/issues/102248 (cherry picked from commit 3606d69d0b57dc1d23a4362e376e7ad27f650c27) * [AMDGPU] Fix folding clamp into pseudo scalar instructions (#100568) Clamp is canonically a v_max* instruction with a VGPR dst. Folding clamp into a pseudo scalar instruction can cause issues due to a change in regbank. We fix this with a copy. (cherry picked from commit 817cd726454f01e990cd84e5e1d339b120b5ebaa) * Revert "[LLVM] Silence compiler-rt warning in runtimes build (#99525)" This patch broke the LLVM Flang build on Windows. PR #100202 This reverts commit f6f88f4b99638821af803d1911ab6a7dac04880b. (cherry picked from commit 73d862e478738675f5d919c6a196429acd7b5f50) * [TBAA] Do not rewrite TBAA if exists, always null out `!tbaa.struct` Retrieve `!tbaa` metadata via `!tbaa.struct` in `adjustForAccess` unless it already exists, as struct-path aware `MDNodes` emitted via `new-struct-path-tbaa` may be leveraged. As `!tbaa.struct` carries memcpy padding semantics among struct fields and `!tbaa` is already meant to convey alias semantics, it should be possible to zero out `!tbaa.struct` once the memcpy has been simplified. The `SROA/tbaa-struct.ll` test has gone out of scope, as `!tbaa` has already replaced `!tbaa.struct` in SROA. Fixes: https://github.com/llvm/llvm-project/issues/95661. * [NFC][llvm][support] rename INFINITY in regcomp (#101758) Since C23, this macro is defined by float.h, which clang implements in its float.h since #96659 landed. However, regcomp.c in LLVMSupport happened to define its own macro with that name, leading to problems when bootstrapping. This change renames the offending macro. (cherry picked from commit 899f648866affd011baae627752ba15baabc2ef9) * [ELF] .llvm.call-graph-profile: support CREL https://reviews.llvm.org/D105217 added RELA support. This patch adds CREL support.
(cherry picked from commit 0766a59be3256e83a454a089f01215d6c7f94a48) * [ELF] scanRelocations: support .crel.eh_frame Follow-up to #98115. For EhInputSection, RelocationScanner::scan calls sortRels, which doesn't support the CREL iterator. We should set supportsCrel to false to ensure that the initial_location fields in .eh_frame FDEs are relocated. (cherry picked from commit a821fee312d15941174827a70cb534c2f2fe1177) * Revert "demangle function names in trace files (#87626)" This reverts commit 0fa20c55b58deb94090985a5c5ffda4d5ceb3cd1. Storing raw symbol names is generally preferred in profile files. Demangling might lose information. Language frontends might use demangling schemes not supported by LLVMDemangle (https://github.com/llvm/llvm-project/issues/45901#issuecomment-2008686663). In addition, calling `demangle` for each function has a significant performance overhead (#102222). I believe that even if we decide to provide a producer-side demangling, it would not be on by default. Pull Request: https://github.com/llvm/llvm-project/pull/102274 (cherry picked from commit 72b73e23b6c36537db730ebea00f92798108a6e5) * [AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (#102105) The code-generator is currently not able to handle scalable vectors of <vscale x 1 x eltty>. The usual "fix" for this until it is supported is to mark the costs of loads/stores with an invalid cost, preventing the vectorizer from vectorizing at those factors. But on rare occasions loops do not contain load/stores, only reductions. So whilst this is still unsupported return an invalid cost to avoid selecting vscale x 1 VFs. The cost of a reduction is not currently used by the vectorizer so this adds the cost to the add/mul/and/or/xor or min/max that should feed the reduction. It includes reduction costs too, for completeness. This change will be removed when code-generation for these types is sufficiently reliable. 
Fixes #99760 (cherry picked from commit 0b745a10843fc85e579bbf459f78b3f43e7ab309) * [clang] Wire -fptrauth-returns to "ptrauth-returns" fn attribute. (#102416) We already ended up with -fptrauth-returns, the feature macro, the lang opt, and the actual backend lowering. The only part left is threading it all through PointerAuthOptions, to drive the addition of the "ptrauth-returns" attribute to generated functions. While there, do minor cleanup on ptrauth-function-attributes.c. This also adds ptrauth_key_return_address to ptrauth.h. (cherry picked from commit 2eb6e30fe83ccce3cf01e596e73fa6385facd44b) * [lldb] Move definition of SBSaveCoreOptions dtor out of header (#102539) This class is technically not usable in its current state. When you use it in a simple C++ project, your compiler will complain about an incomplete definition of SaveCoreOptions. Normally this isn't a problem, other classes in the SBAPI do this. The difference is that SBSaveCoreOptions has a default destructor in the header, so the compiler will attempt to generate the code for the destructor with an incomplete definition of the impl type. All methods for every class, including constructors and destructors, must have a separate implementation not in a header. (cherry picked from commit 101cf540e698529d3dd899d00111bcb654a3c12b) * [Clang] Define __cpp_pack_indexing (#101956) Following the discussion on #101448 this defines `__cpp_pack_indexing`. Since pack indexing is currently supported in all language modes, the feature test macro is also defined in all language modes. (cherry picked from commit c65afad9c58474a784633314e945c874ed06584a) * workflows/release-binaries-all: Pass secrets on to release-binaries workflow (#101866) A called workflow does not have access to secrets by default, so we need to explicitly pass any secret that we want to use. 
(cherry picked from commit 1fb1a5d8e2c5a0cbaeb39ead68352e5e55752a6d) * [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) This PR is the first step in improving the situation for `clang-cl` detailed in [this LLVM Discourse thread](https://discourse.llvm.org/t/clang-cl-exe-support-for-c-modules/72257/28). There has been some work done in #89772. I believe this is somewhat orthogonal. This is a work-in-progress; the functionality has only been tested with the [basic 'Hello World' example](https://clang.llvm.org/docs/StandardCPlusPlusModules.html#quick-start), and proper test cases need to be written. I'd like some thoughts on this, thanks! Partially resolves #64118. (cherry picked from commit bd576fe34285c4dcd04837bf07a89a9c00e3cd5e) * workflows: Fix permissions for release-sources job (#100750) For reusable workflows, the called workflow cannot upgrade its permissions, and since the default permission is none, we need to explicitly declare 'contents: read' when calling the release-sources workflow. Fixes the error: The workflow is requesting 'contents: read', but is only allowed 'contents: none'. (cherry picked from commit 82c2259aeb87f5cb418decfb6a1961287055e5d2) * [Arm][AArch64][Clang] Respect function's branch protection attributes. (#101978) Default attributes are assigned to all functions according to the command line parameters. Some functions might have their own attributes, and we need to set or remove attributes accordingly. Tests are updated to cover these scenarios too. (cherry picked from commit 9e9fa00dcb9522db3f78d921eda6a18b9ee568bb) * [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338) Remove `XFAIL: LIBCXX-AIX-FIXME` from lit test `transform.pass.cpp` now that AIX system call `wcsxfrm`/`wcsxfrm_l` is fixed in AIX 7.2.5.8 and 7.3.2.2 and buildbot machines have been upgraded.
Backported from commit cb5912a71061c6558bd4293596dcacc1ce0ca2f6 * [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755) Three `llvm-exegesis` tests ``` LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition ``` `FAIL` on Linux/sparc64 like ``` llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure Expected equality of these values: SharedMemoryMapping[I] Which is: '\0' ExpectedValue[I] Which is: '\xAA' (170) ``` It seems like this test only works on little-endian hosts: three sub-tests are already disabled on powerpc and s390x (both big-endian), and the fourth is additionally guarded against big-endian hosts (making the other guards unnecessary). However, since it hasn't been analyzed whether this is really an endianness issue, this patch disables the whole test on powerpc and s390x as before, adding sparc to the mix. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`. (cherry picked from commit a417083e27b155dc92b7f7271c0093aee0d7231c) * [clang-format] Fix a serious bug in `git clang-format -f` (#102629) With the --force (or -f) option, git-clang-format wipes out input files excluded by a .clang-format-ignore file if they have unstaged changes. This patch adds a hidden clang-format option --list-ignored that lists such excluded files for git-clang-format to filter out. Fixes #102459. (cherry picked from commit 986bc3d0719af653fecb77e8cfc59f39bec148fd) * [lldb] Fix crash when adding members to an "incomplete" type (#102116) This fixes a regression caused by delayed type definition searching (#96755 and friends): If we end up adding a member (e.g.
a typedef) to a type that we've already attempted to complete (and failed), the resulting AST would end up inconsistent (we would start to "forcibly" complete it, but never finish it), and importing it into an expression AST would crash. This patch fixes this by detecting the situation and finishing the definition as well. (cherry picked from commit 57cd1000c9c93fd0e64352cfbc9fbbe5b8a8fcef) * [clang] Implement -fptrauth-auth-traps. (#102417) This provides -fptrauth-auth-traps, which at the frontend level only controls the addition of the "ptrauth-auth-traps" function attribute. The attribute in turn controls various aspects of backend codegen, by providing the guarantee that every "auth" operation generated will trap on failure. This can either be delegated to the hardware (if AArch64 FPAC is known to be available), in which case this attribute doesn't change codegen. Otherwise, if FPAC isn't available, this asks the backend to emit additional instructions to check and trap on auth failure. (cherry picked from commit d179acd0484bac30c5ebbbed4d29a4734d92ac93) * Revert "[libc++][math] Fix undue overflowing of `std::hypot(x,y,z)` (#93350)" This reverts commit 9628777479a970db5d0c2d0b456dac6633864760. More details in https://github.com/llvm/llvm-project/pull/93350, but this broke the PowerPC sanitizer bots. (cherry picked from commit 1031335f2ee1879737576fde3a3425ce0046e773) * [libc++][math] Fix undue overflowing of `std::hypot(x,y,z)` (#100820) This is in relation to mr #93350. It was merged to main, but reverted because of failing sanitizer builds on PowerPC. The fix includes replacing the hard-coded threshold constants (e.g. `__overflow_threshold`) for different floating-point sizes by a general computation using `std::ldexp`. Thus, it should now work for all architectures. This has the drawback of not being `constexpr` anymore as `std::ldexp` is not implemented as `constexpr` (even though the standard mandates it for C++23). 
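The `std::hypot(x,y,z)` fix above avoids overflow by rescaling with `std::ldexp` instead of relying on hard-coded thresholds. As a rough illustration of that general idea (a sketch, not libc++'s actual code), the function below picks a power of two from the largest operand via `std::frexp`, scales all three operands into a range where squaring cannot overflow, and scales the result back. The name `scaled_hypot3` is hypothetical, and this variant uses a frexp-based exponent rather than libc++'s threshold computation.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper (not libc++'s implementation): sqrt(x^2 + y^2 + z^2)
// computed without intermediate overflow by rescaling with a power of two.
double scaled_hypot3(double x, double y, double z) {
  // IEEE hypot semantics: an infinite operand wins even over a NaN.
  if (std::isinf(x) || std::isinf(y) || std::isinf(z))
    return std::numeric_limits<double>::infinity();
  if (std::isnan(x) || std::isnan(y) || std::isnan(z))
    return std::numeric_limits<double>::quiet_NaN();

  x = std::fabs(x);
  y = std::fabs(y);
  z = std::fabs(z);
  const double m = std::max({x, y, z});
  if (m == 0.0)
    return 0.0;

  // m == f * 2^e with f in [0.5, 1). Scaling by 2^-e moves the largest
  // operand into [0.5, 1); power-of-two scaling is exact unless a value
  // drops into the subnormal range (where it is negligible anyway).
  int e = 0;
  static_cast<void>(std::frexp(m, &e));
  x = std::ldexp(x, -e);
  y = std::ldexp(y, -e);
  z = std::ldexp(z, -e);

  // Each square is now < 1, so the sum cannot overflow; undo the scaling.
  return std::ldexp(std::sqrt(x * x + y * y + z * z), e);
}
```

For example, `scaled_hypot3(1e300, 1e300, 1e300)` stays finite (about 1.732e300), whereas the naive `std::sqrt(x*x + y*y + z*z)` overflows to infinity at the first multiplication.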
Closes #92782 (cherry picked from commit 72825fde03aab3ce9eba2635b872144d1fb6b6b2) * [C++20] [Modules] Don't diagnose duplicated implicit decl in multiple named modules (#102423) Close https://github.com/llvm/llvm-project/issues/102360 Close https://github.com/llvm/llvm-project/issues/102349 http://eel.is/c++draft/basic.def.odr#15.3 makes it clear that duplicated definitions are not allowed to be attached to named modules. But we need to filter out the implicit declarations, as users can do nothing about them and the diagnostic message is annoying. (cherry picked from commit e72d956b99e920b0fe2a7946eb3a51b9e889c73c) * [AIX] Revert `#pragma mc_func` check (#102919) https://github.com/llvm/llvm-project/pull/99888 added a specific diagnostic for `#pragma mc_func` on AIX. There are some disagreements on: 1. Whether the check should be on by default. Leaving the check off by default is dangerous, since it is difficult to be aware of such a check. Turning it on by default at the moment causes build failures on AIX. See https://github.com/llvm/llvm-project/pull/101336 for more details. 2. Whether the check can be made more general. See https://github.com/llvm/llvm-project/pull/101336#issuecomment-2269283906. This PR reverts this check from `main` so we can flush out these disagreements. (cherry picked from commit 123b6fcc70af17d81c903b839ffb55afc9a9728f) * [Clang][Sema] Make UnresolvedLookupExprs in class scope explicit specializations instantiation dependent (#100392) A class member named by an expression in a member function that may instantiate to a static _or_ non-static member is represented by an `UnresolvedLookupExpr` in order to defer the implicit transformation to a class member access expression until instantiation.
Since `ASTContext::getDecltypeType` only creates a `DecltypeType` that has a `DependentDecltypeType` as its canonical type when the operand is instantiation dependent, and since we do not transform types unless they are instantiation dependent, we need to mark the `UnresolvedLookupExpr` as instantiation dependent in order to correctly build a `DecltypeType` using the expression as its operand with a `DependentDecltypeType` canonical type. Fixes #99873. (cherry picked from commit 55ea36002bd364518c20b3ce282640c920697bf7) * [libc++] Use a different smart ptr type alias (#102089) The `_SP` type is used by some C libraries and this alias could conflict with it. (cherry picked from commit 7951673d408ee64744d0b924a49db78e8243d876) * [CodeGen][ARM64EC] Define hybrid_patchable EXP thunk symbol as a function. (#102898) This is needed for MSVC link.exe to generate redirection metadata for hybrid patchable thunks. (cherry picked from commit d550ada5ab6cd6e49de71ac4c9aa27ced4c11de0) * [PPC][AIX] Save/restore r31 when using base pointer (#100182) When the base pointer r30 is used to hold the stack pointer, r30 is spilled in the prologue. On AIX, registers are saved from highest to lowest, so r31 also needs to be saved. Fixes https://github.com/llvm/llvm-project/issues/96411 (cherry picked from commit d07f106e512c08455b76cc1889ee48318e73c810) * [clang-format] Fix annotation of braces enclosing stringification (#102998) Fixes #102937. (cherry picked from commit ee2359968fa307ef45254c816e14df33374168cd) * [clang][AArch64] Point the nofp ABI check diagnostics at the callee (#103392) ... wherever we have the Decl for it, and even when we don't keep the SourceLocation of it aimed at the call site.
Fixes: #102983 (cherry picked from commit 019ef522756886caa258daf68d877f84abc1b878) * [libc++] Fix ambiguous constructors for std::complex and std::optional (#103409) Fixes #101960 (cherry picked from commit 4d08bb11eea5907fa9cdfe4c7bc9d5c91e79c6a7) * [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) Summary: This was not forwarded properly as it would try to pass it to `nvlink`. Fixes https://github.com/llvm/llvm-project/issues/100168 (cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43) * [RISCV] Use experimental.vp.splat to splat specific vector length elements. (#101329) Previously, it was hard to create a scalable vector splat with a specific vector length in LLVM IR, so we used riscv.vmv.v.x and riscv.vmv.v.f to do this work. But these two RVV intrinsics have strict type constraints and cannot support fixed vector types or illegal vector types. Using vp.splat preserves the old functionality and also generates more optimized code for vector types and illegal vectors. This patch also fixes a crash in getEVT, which did not handle pointer types. (cherry picked from commit 87af9ee870ad7ca93abced0b09459c3760dec891) * [Hexagon] Do not optimize address of another function's block (#101209) When the constant extender optimization pass encounters an instruction that uses an extended address pointing to another function's block, avoid adding the instruction to the extender list for the current machine function. Fixes https://github.com/llvm/llvm-project/issues/99714 (cherry picked from commit 68df06a0b2998765cb0a41353fcf0919bbf57ddb) * [AArch64] Add GCS release notes * Revert "[CGData] llvm-cgdata (#89884)" This reverts commit d3fb41dddc11b0ebc338a3b9e6a5ab7288ff7d1d and its forward-fix patches because of the issue explained in: https://github.com/llvm/llvm-project/pull/89884#issuecomment-2244348117. Revert "Fix tests for https://github.com/llvm/llvm-project/pull/89884 (#100061)" This reverts commit 67937a3f969aaf97a745a45281a0d22273bff713.
Revert "Fix build break for https://github.com/llvm/llvm-project/pull/89884 (#100050)" This reverts commit c33878c5787c128234d533ad19d672dc3eea19a8. Revert "[CGData] Fix -Wpessimizing-move in CodeGenDataReader.cpp (NFC)" This reverts commit 1f8b2b146141f3563085a1acb77deb50857a636d. (cherry picked from commit 73d78973fe072438f0f73088f889c66845b2b51a) * [AArch64] Adopt updated B16B16 target flags The enablement of SVE/SME non-widening BFloat16 instructions was recently changed in response to an architecture update, in which: - FEAT_SVE_B16B16 was weakened - FEAT_SME_B16B16 was introduced New flags, 'sve-b16b16' and 'sme-b16b16', were introduced to replace the existing 'b16b16'. This was achieved in the below two patches. - https://github.com/llvm/llvm-project/pull/101480 - https://github.com/llvm/llvm-project/pull/102501 Ideally, the interface change introduced here will be valid in LLVM-19. We do not see it as necessary to back-port the entire change, but just to add 'sme-b16b16' and 'sve-b16b16' as aliases to the existing (and unchanged) 'b16b16' and 'sme2' flags, which together cover all of these features. The predication of BF16 variants of svmin/svminnm and svmax/svmaxnm is also fixed in this change. * [libc++] Fix rejects-valid in std::span copy construction (#104500) Trying to copy-construct a std::span from another std::span holding an incomplete type would fail as we evaluate the SFINAE for the range-based constructor. The problem was that we checked for __is_std_span after checking for the range being a contiguous_range, which hard-errored because of arithmetic on a pointer to incomplete type. As a drive-by, refactor the whole test and format it. Fixes #104496 (cherry picked from commit 99696b35bc8a0054e0b0c1a26e8dd5049fa8c41b) * [clang-tidy] Fix crash in C language in readability-non-const-parameter (#100461) Fix a crash that happens when a redeclaration has a different number of parameters than the definition.
Fixes #100340 (cherry picked from commit a27f816fe56af9cc7f4f296ad6c577f6ea64349f) * [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) Emit an optimization remark when objects in the stack frame may cause hazards in a streaming mode function. The analysis requires either the `aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag to be set by the user, with the former flag taking precedence. (cherry picked from commit a98a0dcf63f54c54c5601a34c9f8c10cde0162d6) * [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) In C++23, anything can be constexpr, including a dtor of a class whose members and bases don't have constexpr dtors. Avoid early triggering of vtable instantiation in this case. Fixes https://github.com/llvm/llvm-project/issues/102293 (cherry picked from commit d469794d0cdfd2fea50a6ce0c0e33abb242d744c) * [OpenMP][AArch64] Fix branch protection in microtasks (#102317) Start __kmp_invoke_microtask with PACBTI in order to identify the function as a valid branch target. Before returning, SP is authenticated. Also add the BTI and PAC markers to z_Linux_asm.S. With this patch, libomp.so can now be generated with DT_AARCH64_BTI_PLT when built with -mbranch-protection=standard. The implementation is based on the code available in compiler-rt. (cherry picked from commit 0aa22dcd2f6ec5f46b8ef18fee88066463734935) * [clang][driver] `TY_ModuleFile` should be a 'CXX' file type * [Mips] Fix fast isel for i16 bswap. (#103398) We need to mask the SRL result to 8 bits before ORing in the SLL. This is needed in case bits 23:16 of the input aren't zero. They will have been shifted into bits 15:8. We don't need to AND the result with 0xffff. It's ok if the upper 16 bits of the register are garbage. Fixes #103035. (cherry picked from commit ebe7265b142f370f0a563fece5db22f57383ba2d) * Add some brief LLVM 19 release notes for Pointer Authentication ABI support.
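To make the Mips fast-isel i16 bswap fix above concrete, the sketch below models the shift-and-OR sequence on a 32-bit register value in plain C++. It is an illustration of the register arithmetic described in that commit message, not the actual fast-isel code; both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the old sequence: (reg >> 8) | (reg << 8).
// If bits 23:16 of reg are non-zero, the SRL moves them into bits 15:8,
// where they get ORed over the byte placed there by the SLL.
std::uint32_t bswap16_unmasked(std::uint32_t reg) {
  std::uint32_t srl = reg >> 8;
  std::uint32_t sll = reg << 8;
  return srl | sll;
}

// Hypothetical model of the fixed sequence: mask the SRL result to 8 bits
// first. The low 16 bits are then the byte-swapped i16; the upper 16 bits
// may hold garbage, which is acceptable, so no final AND with 0xffff.
std::uint32_t bswap16_masked(std::uint32_t reg) {
  std::uint32_t srl = (reg >> 8) & 0xffu;
  std::uint32_t sll = reg << 8;
  return srl | sll;
}
```

With `reg = 0x00AB1234`, the intended i16 result is `0x3412`: the masked version yields that in its low 16 bits, while the unmasked version yields `0xBF12` because the stray `0xAB` byte lands in bits 15:8.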
* release/19.x: [BOLT] Fix relocations handling Backport https://github.com/llvm/llvm-project/commit/097ddd3565f830e6cb9d0bb8ca66844b7f3f3cbb * [AArch64] Add a check for invalid default features (#104435) This adds a check that all ExtensionWithMArch which are marked as implied features for an architecture are also present in the list of default features. It doesn't make sense to have something mandatory but not on by default. There were a number of existing cases that violated this rule, and some changes to which features are mandatory (indicated by the Implies field). This resulted in a bug where if a feature was marked as `Implies` but was not added to `DefaultExt`, then for `-march=base_arch+nofeat` the Driver would consider `feat` to have never been added and therefore would do nothing to disable it (no `-target-feature -feat` would be added, but the backend would enable the feature by default because of `Implies`). See clang/test/Driver/aarch64-negative-modifiers-for-default-features.c. Note that the processor definitions do not respect the architecture DefaultExts. These apply only when specifying `-march=<some architecture version>`. So when a feature is moved from `Implies` to `DefaultExts` on the Architecture definition, the feature needs to be added to all processor definitions (that are based on that architecture) in order to preserve the existing behaviour. I have checked the TRMs for many cases (see specific commit messages) but in other cases I have just kept the current behaviour and not tried to fix it. * [GlobalISel] Bail out early for big-endian (#103310) If we continue through the function we can currently hit crashes. We can bail out early and fall back to SDAG. Fixes #103032 (cherry picked from commit 05d17a1c705e1053f95b90aa37d91ce4f94a9287) * [LLD] [MinGW] Recognize the -rpath option (#102886) GNU ld silently accepts the -rpath option for Windows targets, as a no-op. 
This has lead to some build systems (and users) passing this option while building for Windows/MinGW, even if Windows doesn't have any concept like rpath. Older versions of Conan did include -rpath in the pkg-config files it generated, see e.g. https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/generators/pkg_config.py#L104-L114 and https://github.com/conan-io/conan/blob/17c58f0c61931f9de218ac571cd97a8e0befa68e/conans/client/build/compiler_flags.py#L26-L34 - and see https://github.com/mstorsjo/llvm-mingw/issues/300 for user reports about this issue. Recognize the option in LLD for MinGW targets, to improve drop-in compatibility compared to GNU ld, but produce a warning to alert users that the option really has no effect for these targets. (cherry picked from commit 69f76c782b554a004078af6909c19a11e3846415) * [C++23] Fix infinite recursion (Clang 19.x regression) (#104829) d469794d0cdfd2fea50a6ce0c0e33abb242d744c was fixing an issue with triggering vtable instantiations, but it accidentally introduced infinite recursion when the type to be checked is the same as the type used in a base specifier or field declaration. Fixes #104802 (cherry picked from commit 435cb0dc5eca08cdd8d9ed0d887fa1693cc2bf33) * Reland [C++20] [Modules] [Itanium ABI] Generate the vtable in the mod… (#102287) Reland https://github.com/llvm/llvm-project/pull/75912 The differences of this PR between https://github.com/llvm/llvm-project/pull/75912 are: - Fixed a regression in `Decl::isInAnotherModuleUnit()` in DeclBase.cpp pointed by @mizvekov and add the corresponding test. - Fixed the regression in windows https://github.com/llvm/llvm-project/issues/97447. The changes are in `CodeGenModule::getVTableLinkage` from `clang/lib/CodeGen/CGVTables.cpp`. According to the feedbacks from MSVC devs, the linkage of vtables won't affected by modules. So I simply skipped the case for MSVC. Given this is more or less fundamental to the use of modules. 
I hope we can backport this to 19.x. (cherry picked from commit 847f9cb0e868c8ec34f9aa86fdf846f8c4e0388b) * [libunwind] Add GCS support for AArch64 (#99335) AArch64 GCS (Guarded Control Stack) is similar enough to CET that we can re-use the existing code that is guarded by _LIBUNWIND_USE_CET, so long as we also add defines to locate the GCS stack and pop the entries from it. We also need the jumpto function to exit using br instead of ret, to prevent it from popping the GCS stack. GCS support is enabled using the LIBUNWIND_ENABLE_GCS cmake option. This enables -mbranch-protection=standard, which enables GCS. For the places we need to use GCS instructions we use the target attribute, as there's not a command-line option to enable a specific architecture extension. (cherry picked from commit b32aac4358c1f6639de7c453656cd74fbab75d71) * [libunwind] Be more careful about enabling GCS (#101973) We need both GCS to be enabled by the compiler (which we do by checking if __ARM_FEATURE_GCS_DEFAULT is defined) and for arm_acle.h to define the GCS intrinsics. Check the latter by checking if _CHKFEAT_GCS is defined. (cherry picked from commit c649194a71b47431f2eb2e041435d564e3b51072) * [libunwind] Fix problems caused by combining BTI and GCS (#102322) The libunwind assembly files need adjustment in order to work correctly when both BTI and GCS are both enabled (which will be the case when using -mbranch-protection=standard): * __libunwind_Registers_arm64_jumpto can't use br to jump to the return location, instead we need to use gcspush then ret. * Because we indirectly call __libunwind_Registers_arm64_jumpto it needs to start with bti jc. * We need to set the GCS GNU property bit when it's enabled. 
--------- Co-authored-by: Daniel Kiss <daniel.kristof.kiss@gmail.com> (cherry picked from commit 39529107b46032ef0875ac5b809ab5b60cd15a40) * Bump version to 19.1.0-rc3 * [sanitizer_common] Make sanitizer_linux.cpp kernel_stat* handling Linux-specific fcd6bd5587cc376cd8f43b60d1c7d61fdfe0f535 broke the Solaris/sparcv9 buildbot: ``` compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:39:14: fatal error: 'asm/unistd.h' file not found 39 | # include <asm/unistd.h> | ^~~~~~~~~~~~~~ ``` That section should have been Linux-specific in the first place, which is what this patch does. Tested on sparcv9-sun-solaris2.11. (cherry picked from commit 16e9bb9cd7f50ae2ec7f29a80bc3b95f528bfdbf) * [clang][modules] Built-in modules are not correctly enabled for Mac Catalyst (#104872) Mac Catalyst is the iOS platform, but it builds against the macOS SDK and so it needs to be checking the macOS SDK version instead of the iOS one. Add tests against a greater-than SDK version just to make sure this works beyond the initially supporting SDKs. (cherry picked from commit b9864387d9d00e1d4888181460d05dbc92364d75) * [SPARC] Remove assertions in printOperand for inline asm operands (#104692) Inline asm operands could contain any kind of relocation, so remove the checks. Fixes https://github.com/llvm/llvm-project/issues/103493 (cherry picked from commit 576b7a781aac6b1d60a72248894b50e565e9185a) * Add AIX/PPC Clang/LLVM release notes for LLVM 19. * use default intrinsic attrs for BPF packet loads The BPF packet load intrinsics lost attribute WillReturn due to 0b20c30. The attribute loss causes excessive bitshifting, resulting in previously working programs failing the BPF verifier due to instruction/complexity limits. cherry picked only the BPF changes from 99a10f1 Signed-off-by: Bryce Kahle <bryce.kahle@datadoghq.com> * [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395) Prevent operand folding from inlining constants into pseudo scalar transcendental f16 instructions. 
However still allow literal constants. (cherry picked from commit fc6300a5f7ef430e4ec86d16be0b146de7fbd16b) * [DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924) In visitFREEZE we have been collecting a set/vector of MaybePoisonOperands that later was iterated over, applying a freeze to those operands. However, C-level fuzzy testing has discovered that the recursiveness of ReplaceAllUsesOfValueWith may cause later operands in the MaybePoisonOperands vector to be replaced when replacing an earlier operand. That would then turn up as Assertion `N1.getOpcode() != ISD::DELETED_NODE && "Operand is DELETED_NODE!"' failed. failures when trying to freeze those later operands. So we need to make sure that the vector with MaybePoisonOperands is mutated as well when needed. Or as the solution used in this patch, make sure to keep track of operand numbers that should be frozen instead of having a vector of SDValues. And then we can refetch the operands while iterating over operand numbers. The problem was seen after adding SELECT_CC to the set of operations including in "AllowMultipleMaybePoisonOperands". I'm not sure, but I guess that this could happen for other operations as well for which we allow multiple maybe poison operands. (cherry picked from commit 278fc8efdf004a1959a31bb4c208df5ee733d5c8) * [X86] Use correct fp immediate types in _mm_set_ss/sd Avoids implicit sint_to_fp which wasn't occurring on strict fp codegen Fixes #104848 (cherry picked from commit 6dcce422ca06601f2b00e85cc18c745ede245ca6) * [clang-format] Don't insert a space between :: and * (#105043) Also, don't insert a space after ::* for method pointers. See https://github.com/llvm/llvm-project/pull/86253#issuecomment-2298404887. Fixes #100841. (cherry picked from commit 714033a6bf3a81b1350f969ddd83bcd9fbb703e8) * [ConstraintElim] Fix miscompilation caused by PR97974 (#105790) Fixes https://github.com/llvm/llvm-project/issues/105785. 
(cherry picked from commit 85b6aac7c25f9d2a976a76045ace1e7afebb5965) * [MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data (REAPPLIED) This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should Reapplied with codegen fix for scatter-schedule.ll Fixes #105675 (cherry picked from commit cf6cd1fd67356ca0c2972992928592d2430043d2) * [DwarfEhPrepare] Assign dummy debug location for more inserted _Unwind_Resume calls (#105513) Similar to the fix for #57469, ensure that the other `_Unwind_Resume` call emitted by DwarfEHPrepare has a debug location if needed. This fixes https://github.com/nbdd0121/unwinding/issues/34. (cherry picked from commit e76db25832d6ac2d3a36769b26f982d9dee4b346) * [clangd] Add clangd 19 release notes * Restrict LLVM_TARGETS_TO_BUILD in Windows release packaging (#106059) When including all targets, some files become too large for the NSIS installer to handle. Fixes #101994 (cherry picked from commit 2a28df66dc3f7ff5b6233241837854acefb68d77) * [AArch64] Make apple-m4 armv8.7-a again (from armv9.2-a). (#106312) This is a partial revert of c66e1d6f3429. Even though that allowed us to declare v9.2-a support without picking up SVE2 in both the backend and the driver, the frontend itself still enabled SVE via the arch version's default extensions. Avoid that by reverting back to v8.7-a while we look into longer-term solutions. (cherry picked from commit e5e38ddf1b8043324175868831da21e941c00aff) * workflows/release-binaries: Remove .git/config file from artifacts (#106310) The .git/config file contains an auth token that can be leaked if the .git directory is included in a workflow artifact. 
(cherry picked from commit ef50970204384643acca42ba4c7ca8f14865a0c2) * [clang] Install scan-build-py into plain "lib" directory (#106612) Install scan-build-py modules into the plain `lib` directory, without LLVM_LIBDIR_SUFFIX appended, to match the path expected by `intercept-build` executable. This fixes the program being unable to find its modules. Using unsuffixed path makes sense here, since Python modules are not subject to multilib. This change effectively reverts 1334e129a39cb427e7b855e9a711a3e7604e50e5. The commit in question changed the path without a clear justification ("does not respect the given prefix") and the Python code was never modified to actually work with the change. Fixes #106608 (cherry picked from commit 0c4cf79defe30d43279bf4526cdf32b6c7f8a197) * [llvm][CodeGen] Added missing initialization failure information for window scheduler (#99449) Added missing initialization failure information for window scheduler. * [llvm][CodeGen] Added a new restriction for II by pragma in window scheduler (#99448) Added a new restriction for window scheduling. Window scheduling is disabled when llvm.loop.pipeline.initiationinterval is set. * [llvm][CodeGen] Fixed a bug in stall cycle calculation for window scheduler (#99451) Fixed a bug in stall cycle calculation. When a register defined by an instruction in the current iteration is used by an instruction in the next iteration, we have modified the number of stall cycle that need to be inserted. * [llvm][CodeGen] Fixed max cycle calculation with zero-cost instructions for window scheduler (#99454) We discovered some scheduling failures occurring when zero-cost instructions were involved. This issue will be addressed by this patch. * [llvm][CodeGen] Address the issue of multiple resource reservations In window scheduling (#101665) Address the issue of multiple resource reservations in window scheduling. 
* [analyzer] Limit `isTainted()` by skipping complicated symbols (#105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates #89720 CPP-5414 (cherry picked from commit 848658955a9d2d42ea3e319d191e2dcd5d76c837) * [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081) A crash was happening when both ObjC Category Merging and Relative method lists were enabled. ObjC Category Merging creates new data sections and adds them by calling `addInputSection`. `addInputSection` uses the symbols within the added section to determine which container to actually add the section to. The issue is that ObjC Category merging is calling `addInputSection` before actually adding the relevant symbols the the added section. This causes `addInputSection` to add the `InputSection` to the wrong container, eventually resulting in a crash. To fix this, we ensure that ObjC Category Merging calls `addInputSection` only after the symbols have been added to the `InputSection`. 
(cherry picked from commit 0df91893efc752a76c7bbe6b063d66c8a2fa0d55) * [PowerPC] Respect endianness when bitcasting to fp128 (#95931) Fixes #92246 Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when encountering `bitcast fp128 (BUILD_PAIR %lo $hi)`. by inserting a missing swap of the arguments based on endianness. ### Current behaviour: **fp128** bitcast fp128 (BUILD_PAIR %lo $hi) => BUILD_FP128 %lo %hi BUILD_FP128 %lo %hi => MTVSRDD %hi %lo **v2i64** bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi (cherry picked from commit 408d82d352eb98e2d0a804c66d359cd7a49228fe) * Add release note about ABI implementation changes for _BitInt on Arm * [AMDGPU] Add GFX12 test coverage for vmcnt flushing in loop headers (#105548) (cherry picked from commit 61194617ad7862f144e0f6db34175553e8c34763) * [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (#105549) Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies. (cherry picked from commit 5506831f7bc8dc04ebe77f4d26940007bfb4ab39) * [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order. (cherry picked from commit fa2dccb377d0b712223efe5b62e5fc633580a9e6) * [libunwind] Stop installing the mach-o module map (#105616) libunwind shouldn't know that compact_unwind_encoding.h is part of a MachO module that it doesn't own. Delete the mach-o module map, and let whatever is in charge of the mach-o directory be the one to say how its module is organized and where compact_unwind_encoding.h fits in. 
(cherry picked from commit 172c4a4a147833f1c08df1555f3170aa9ccb6cbe) * [clang-format] Fix a misannotation of redundant r_paren as CastRParen (#105921) Fixes #105880. (cherry picked from commit 6bc225e0630f28e83290a43c3d9b25b057fc815a) * [clang-format] Fix a misannotation of less/greater as angle brackets (#105941) Fixes #105877. (cherry picked from commit 0916ae49b89db6eb9eee9f6fee4f1a65fd9cdb74) * [PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453) These builtins are currently returning CR0 which will have the format [0, 0, flag_true_if_saved, XER]. We only want to return flag_true_if_saved. This patch adds a shift to remove the XER bit before returning. (cherry picked from commit 327edbe07ab4370ceb20ea7c805f64950871d835) * [clang][AArch64] Add SME2.1 feature macros (#105657) (cherry picked from commit 2617023923175b0fd2a8cb94ad677c061c01627f) * [Clang][Sema] Revisit the fix for the lambda within a type alias template decl (#89934) In the last patch #82310, we used template depths to tell if such alias decls contain lambdas, which is wrong because the lambda can also appear as a part of the default argument, and that would make `getTemplateInstantiationArgs` provide extra template arguments in undesired contexts. This leads to issue #89853. Moreover, our approach for https://github.com/llvm/llvm-project/issues/82104 was sadly wrong. We tried to teach `DeduceReturnType` to consider alias template arguments; however, giving these arguments in the context where they should have been substituted in a `TransformCallExpr` call is never correct. This patch addresses such problems by using a `RecursiveASTVisitor` to check if the lambda is contained by an alias `Decl`, as well as twiddling the lambda dependencies - we should also build a dependent lambda expression if the surrounding alias template arguments were dependent. 
Fixes #89853 Fixes #102760 Fixes #105885 (cherry picked from commit b412ec5d3924c7570c2c96106f95a92403a4e09b) * [libc++] Add missing include to three_way_comp_ref_type.h We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`. rdar://134425695 (cherry picked from commit 0df78123fdaed39d5135c2e4f4628f515e6d549d) * [compiler-rt] Fix definition of `usize` on 32-bit Windows 32-bit Windows uses `unsigned int` for uintptr_t and size_t. Commit 18e06e3e2f3d47433e1ed323b8725c76035fc1ac changed uptr to unsigned long, so it no longer matches the real size_t/uintptr_t and therefore the current definition of usize result in: `error C2821: first formal parameter to 'operator new' must be 'size_t'` However, the real problem is that uptr is wrong to work around the fact that we have local SIZE_T and SSIZE_T typedefs that trample on the basetsd.h definitions of the same name and therefore need to match exactly. Unlike size_t/ssize_t the uppercase ones always use unsigned long (even on 32-bit). This commit works around the build breakage by keeping the existing definitions of uptr/sptr and just changing usize. A follow-up change will attempt to fix this properly. Fixes: https://github.com/llvm/llvm-project/issues/101998 Reviewed By: mstorsjo Pull Request: https://github.com/llvm/llvm-project/pull/106151 (cherry picked from commit bb27dd853a713866c025a94ead8f03a1e25d1b6e) * [clang-format] Fix misalignments of pointers in angle brackets (#106013) Fixes #105898. (cherry picked from commit 656d5aa95825515a55ded61f19d41053c850c82d) * [clang-format] js handle anonymous classes (#106242) Addresses a regression in JavaScript when formatting anonymous classes. --------- Co-authored-by: Owen Pan <owenpiano@gmail.com> (cherry picked from commit 77d63cfd18aa6643544cf7acd5ee287689d54cca) * Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb. 
Fixes: https://github.com/llvm/llvm-project/issues/100212 (cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b) Conflicts: clang/test/Driver/linker-wrapper-passes.c * [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (#105923) …519)" This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a test for lambda arrow SplitPenalty. Fixes #105480. * workflows/release-tasks: Pass required secrets to all called workflows (#106286) Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use. (cherry picked from commit 9d81e7e36e33aecdee05fef551c0652abafaa052) * [C++20] [Modules] Don't insert class not in named modules to PendingEmittingVTables (#106501) Close https://github.com/llvm/llvm-project/issues/102933 The root cause of the issue is an oversight in https://github.com/llvm/llvm-project/pull/102287 that I didn't notice that PendingEmittingVTables should only accept classes in named modules. (cherry picked from commit 47615ff2347a8be429404285de3b1c03b411e7af) * Revert "[clang] fix broken canonicalization of DeducedTemplateSpecializationType (#95202)" This reverts commit 2e1ad93961a3f444659c5d02d800e3144acccdb4. Reverting #95202 in the 19.x branch Fixes #106182 The change in #95202 causes code to crash and there is no good way to backport a fix for that as there are ABI-impacting changes at play. Instead we revert #95202 in the 19x branch, fixing the regression and preserving the 18.x behavior (which is GCC's behavior) https://github.com/llvm/llvm-project/pull/106335#discussion_r1735174841 * [analyzer] Add missing include <unordered_map> to llvm/lib/Support/Z3Solver.cpp (#106410) Resolves #106361. Adding #include <unordered_map> to llvm/lib/Support/Z3Solver.cpp fixes compilation errors for homebrew build on macOS with Xcode 14. 
https://github.com/Homebrew/homebrew-core/actions/runs/10604291631/job/29390993615?pr=181351 shows that this is resolved when the include is patched in (Linux CI failure is due to unrelated timeout). (cherry picked from commit fcb3a0485857c749d04ea234a8c3d629c62ab211) * [RemoveDIs] Simplify spliceDebugInfo, fixing splice-to-end edge case (#105670) Not quite NFC, fixes splitBasicBlockBefore case when we split before an instruction with debug records (but without the headBit set, i.e., we are splitting before the instruction but after the debug records that come before it). splitBasicBlockBefore splices the instructions before the split point into a new block. Prior to this patch, the debug records would get shifted up to the front of the spliced instructions (as seen in the modified unittest - I believe the unittest was checking erroneous behaviour). We instead want to leave those debug records at the end of the spliced instructions. The functionality of the deleted `else if` branch is covered by the remaining `if` now that `DestMarker` is set to the trailing marker if `Dest` is `end()`. Previously the "===" markers were sometimes detached, now we always detach them and always reattach them. Note: `deleteTrailingDbgRecords` only "unlinks" the tailing marker from the block, it doesn't delete anything. The trailing marker is still cleaned up properly inside the final `if` body with `DestMarker->eraseFromParent();`. Part 1 of 2 needed for #105571 (cherry picked from commit f5815534d180c544bffd46f09c28b6fc334260fb) * [libcxx] don't `#include <cwchar>` if wide chars aren't enabled (#99911) Pull request #96032 unconditionall adds the `cwchar` include in the `format` umbrella header. However support for wchar_t can be disabled in the build system (LIBCXX_ENABLE_WIDE_CHARACTERS). This patch guards against inclusion of `cwchar` in `format` by checking the `_LIBCPP_HAS_NO_WIDE_CHARACTERS` define. 
For clarity I've also merged the include header section that `cwchar` was in with the one above as they were both guarded by the same `#if` logic. (cherry picked from commit ec56790c3b27df4fa1513594ca9a74fd8ad5bf7f) * [clang-format] Correctly annotate braces in ObjC square brackets (#106654) See https://github.com/llvm/llvm-project/pull/88238#issuecomment-2316954781. (cherry picked from commit e0f2368cdeb7312973a92fb2d22199d1de540db8) * [Instrumentation] Fix EdgeCounts vector size in SetBranchWeights (#99064) (cherry picked from commit 46a4132e167aa44d8ec7776262ce2a0e6d47de59) * [builtins] Fix missing main() function in float16/bfloat16 support checks (#104478) The CMake docs state that `check_c_source_compiles()` checks whether the supplied code "can be compiled as a C source file and linked as an executable (so it must contain at least a `main()` function)." https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html In practice, this command is a wrapper around `try_compile()`: - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54 - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101 When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/, `CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `ST…
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Update getPtrStride to return 0 for invariant pointers.
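The stride convention described above (0 for loop-invariant pointers, no value when no constant stride can be determined) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LLVM's `getPtrStride`: a pointer is modeled as `Base + StepBytes * i` over the loop induction variable `i`, and `PtrKind`, `ToyAccess` and `toyPtrStride` are hypothetical names that stand in for the real SCEV-based reasoning.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical toy model of a pointer inside a loop; this is not LLVM's SCEV.
enum class PtrKind {
  Invariant,    // same address in every iteration
  NoWrapAffine, // Base + StepBytes * i, known not to wrap
  Other         // e.g. indirect (A[B[i]]) or non-affine addressing
};

struct ToyAccess {
  PtrKind Kind;
  int64_t StepBytes = 0; // only meaningful for NoWrapAffine
  int64_t ElemSize = 1;  // access size in bytes, assumed > 0
};

// Returns the stride in elements: 0 for a loop-invariant pointer, the
// constant step for a non-wrapping affine pointer, and std::nullopt when
// no constant stride can be determined.
std::optional<int64_t> toyPtrStride(const ToyAccess &A) {
  switch (A.Kind) {
  case PtrKind::Invariant:
    return 0;
  case PtrKind::NoWrapAffine:
    // A byte step that is not a multiple of the element size is treated
    // as unanalyzable in this sketch.
    if (A.ElemSize <= 0 || A.StepBytes % A.ElemSize != 0)
      return std::nullopt;
    return A.StepBytes / A.ElemSize;
  case PtrKind::Other:
    return std::nullopt;
  }
  return std::nullopt;
}
```

For example, a 4-byte access whose address advances by 16 bytes per iteration has a stride of 4 elements, while an indirect access yields no stride at all.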
Then proceed by checking the strides.
If either the source or the sink is neither strided by a constant (i.e. a
non-wrapping AddRec with a constant step) nor loop-invariant, the accesses may
overlap with earlier or later iterations and we cannot generate runtime checks
to disambiguate them.
Otherwise each access is either loop-invariant or strided by a constant. In
that case, we can generate a runtime check to disambiguate them.
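A runtime check is possible here because, once each access has a constant stride (0 for an invariant pointer) and the trip count is known at loop entry, the bytes each access touches form a range that can be computed before the loop. The loop can then be guarded by a disjointness test on the two ranges. The sketch below assumes a simple `Start + StrideBytes * i` address model; `ToyStridedAccess`, `byteRange` and `rangesMayOverlap` are hypothetical names, and LLVM's actual runtime-check generation is more involved.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical toy access: address = Start + StrideBytes * i for
// i in [0, TripCount), each access covering ElemSize bytes.
// StrideBytes == 0 models a loop-invariant pointer.
struct ToyStridedAccess {
  int64_t Start;
  int64_t StrideBytes;
  int64_t ElemSize;
};

struct ByteRange {
  int64_t Lo; // inclusive
  int64_t Hi; // exclusive
};

// Byte range touched over TripCount (> 0) iterations; works for
// positive, negative and zero strides.
ByteRange byteRange(const ToyStridedAccess &A, int64_t TripCount) {
  int64_t First = A.Start;
  int64_t Last = A.Start + A.StrideBytes * (TripCount - 1);
  return {std::min(First, Last), std::max(First, Last) + A.ElemSize};
}

// The check a vectorized loop would be guarded by: if the ranges are
// disjoint the accesses cannot alias; otherwise take the scalar path.
bool rangesMayOverlap(const ToyStridedAccess &A, const ToyStridedAccess &B,
                      int64_t TripCount) {
  ByteRange RA = byteRange(A, TripCount);
  ByteRange RB = byteRange(B, TripCount);
  return RA.Lo < RB.Hi && RB.Lo < RA.Hi;
}
```

No such pre-loop range exists for an indirect access like `A[B[i]]` without inspecting every element of `B`, which is why accesses that are neither constant-strided nor invariant cannot be disambiguated this way.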
If both are strided by constants, we proceed as before.
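The three cases in the stride check can be summarized as a small decision function. This is a sketch of the logic as described in this PR, not the code in `getDependenceDistanceStrideAndSize`: `StrideVerdict` and `classifyStrides` are hypothetical names, and strides follow the convention that no value means neither constant-strided nor invariant, 0 means loop-invariant, and any other value is a constant stride.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical outcome of comparing source and sink strides; the names are
// illustrative and do not correspond to LLVM's Dependence kinds.
enum class StrideVerdict {
  NoRuntimeCheck,        // an access is neither constant-strided nor invariant
  RuntimeCheckOnly,      // invariant/strided: disambiguate with a runtime check
  ConstantStrideAnalysis // both strided by constants: proceed as before
};

StrideVerdict classifyStrides(std::optional<int64_t> SrcStride,
                              std::optional<int64_t> SinkStride) {
  // Accesses may overlap with arbitrary earlier or later iterations.
  if (!SrcStride || !SinkStride)
    return StrideVerdict::NoRuntimeCheck;
  // At least one access is loop-invariant (stride 0).
  if (*SrcStride == 0 || *SinkStride == 0)
    return StrideVerdict::RuntimeCheckOnly;
  return StrideVerdict::ConstantStrideAnalysis;
}
```

Checking the "no stride" case first matters: a single unanalyzable access rules out runtime checks regardless of what the other access looks like.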
This is an alternative to #99239 and also replaces the additional checks
performed when the underlying object is loop-invariant.
Fixes #87189.