Merge upstream/main into amd-trunk-dev (17.12.2024) #232

ergawy · 2024-12-17T08:47:46Z

No description provided.

) This change affects non-relocation mode only. Before the CheckLargeFunctions pass existed, we could emit code for functions that was later discarded due to size limitations. Since we didn't know at emission time whether the code would be discarded, we had to emit jump tables in separate sections and handle them separately. Now, however, we always run CheckLargeFunctions and make sure all emitted code is used, so we can get rid of the special jump table handling.

…lvm#119776) Mask-only instructions such as vmand and vmsbf should always have 0 for their Log2SEW operand. Non-mask instructions should only have 3, 4, 5, or 6 for their Log2SEW operand. Split the operand type so we can verify these cases separately. I also had to fix the SEW for the whole-register-move-to-vmv.v.v copy optimization and update an MIR test. The vmv.v.v change isn't functional, since vsetvli insertion has already been done at that point and nothing else uses the field after copy expansion. I can split these changes off if desired.
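To make the split concrete, here is a minimal C++ sketch of the two validity rules described above. It is not the RISC-V backend's verifier: `OperandKind` and `isValidLog2SEW` are hypothetical names, and the operand is assumed to arrive as a plain unsigned value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand kinds mirroring the split described above.
enum class OperandKind { MaskLog2SEW, VecLog2SEW };

// Mask-only instructions (e.g. vmand, vmsbf) must carry Log2SEW == 0;
// all other vector instructions must carry 3, 4, 5 or 6 (SEW = 8..64).
bool isValidLog2SEW(OperandKind Kind, std::uint64_t Log2SEW) {
  switch (Kind) {
  case OperandKind::MaskLog2SEW:
    return Log2SEW == 0;
  case OperandKind::VecLog2SEW:
    return Log2SEW >= 3 && Log2SEW <= 6;
  }
  return false;
}
```

With separate kinds, a mask-only instruction carrying 3 and a non-mask instruction carrying 0 are both rejected, which a single shared operand type could not express.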

…97158) Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to `AllInstsEqOnly` and always allow hoisting if all instructions match. In that case, all instructions can be hoisted, the original branch is replaced, and selects are added for the PHIs. This allows preserving metadata in more cases by reusing the existing hoisting logic, whereas previously FoldTwoEntryPHINode would drop the metadata. https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u PR: llvm#97158

In these tests, we just want to add one instance of IndexedMemProfRecord to MemProfData.Records and retrieve it from MemProfReader. There is no particular reason to associate F1.hash() with the IndexedMemProfRecord instance. A fake value suffices. While I am at it, I'm switching to try_emplace so that I can move FakeRecord.
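A small sketch of why `try_emplace` fits here, using a hypothetical `FakeRecord` type and a plain `std::unordered_map` rather than the real MemProf containers: the record is moved into the map, and if the key were already present the argument would be left untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profile record; only its movability matters here.
struct FakeRecord {
  std::vector<std::string> Frames;
};

// Inserts Rec under a fake key, moving it into the map.
// try_emplace constructs the value in place from std::move(Rec) only if the
// key is absent; if the key already exists, Rec is not moved from.
bool insertRecord(std::unordered_map<std::uint64_t, FakeRecord> &Records,
                  std::uint64_t FakeKey, FakeRecord &&Rec) {
  return Records.try_emplace(FakeKey, std::move(Rec)).second;
}
```

Unlike `insert(std::make_pair(...))`, no temporary pair is built, and unlike `emplace`, the standard guarantees the argument is untouched when insertion does not happen.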

…ate (NFC) (llvm#119831) This patch makes the following functions private:

- InstrProfWriter::addMemProfRecord
- InstrProfWriter::addMemProfFrame
- InstrProfWriter::addMemProfCallStack

These days, we add the MemProf profile to the writer context via addMemProfData; we no longer add individual items.

Strip hash_value() for CmpPredicate, as different callers have different hashing use-cases. In this case, there is just one caller, namely EarlyCSE, which calls hash_combine() on a CmpPredicate; prior to 4a0d53a (PatternMatch: migrate to CmpPredicate), it called hash_combine() on a CmpInst::Predicate. This uncovered a bug where two icmp instructions differing only in the samesign flag are hashed differently, leading to divergent hashing and a crash. Fix this crash by dropping samesign information on icmp instructions before hashing them, preserving the former behavior. Fixes llvm#119893.
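The invariant behind this fix is that a hash must agree with the equality it is paired with: keys that compare equal must hash equal. A minimal model, assuming (as the fix implies) that the equality check treats `icmp ult` and `icmp samesign ult` as equivalent; `CmpKey`, `hashCmp` and `isEqualCmp` are hypothetical names, not EarlyCSE's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical model: a comparison predicate plus an optional samesign flag.
enum class Pred { EQ, NE, ULT, SLT };

struct CmpKey {
  Pred P;
  bool SameSign;
};

// Equality ignores SameSign...
bool isEqualCmp(const CmpKey &A, const CmpKey &B) { return A.P == B.P; }

// ...so the hash must ignore it too; otherwise two "equal" keys can land in
// different buckets.
std::size_t hashCmp(const CmpKey &K) {
  return std::hash<int>{}(static_cast<int>(K.P));
}
```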

…m#119642) Update integer range narrowing to handle negative values. The previous restriction to narrowing only known-non-negative values wasn't needed, as both the signed and unsigned ranges represent bounds on the values of each variable in the program, except that one might be more accurate than the other. So, if either the signed or the unsigned interpretation of the inputs and outputs allows for integer narrowing, the narrowing is permitted. This commit also updates the integer optimization rewrites to preserve the state of constant-like operations and of those that are narrowed, so that rewrites of other operations don't lose that range information.
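A simplified model of the relaxed rule, assuming the analysis supplies both signed and unsigned bounds for a value. `RangeInfo`, `fitsSigned`, `fitsUnsigned` and `canNarrow` are hypothetical and do not mirror the MLIR pass's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what integer range analysis knows about a value:
// bounds under both the signed and the unsigned interpretation of its bits.
struct RangeInfo {
  std::int64_t SMin, SMax;
  std::uint64_t UMin, UMax;
};

// Does the signed interpretation fit in a signed integer of `Bits` bits?
bool fitsSigned(const RangeInfo &R, unsigned Bits) {
  const std::int64_t Lo = -(std::int64_t{1} << (Bits - 1));
  const std::int64_t Hi = (std::int64_t{1} << (Bits - 1)) - 1;
  return R.SMin >= Lo && R.SMax <= Hi;
}

// Does the unsigned interpretation fit in an unsigned integer of `Bits` bits?
bool fitsUnsigned(const RangeInfo &R, unsigned Bits) {
  return R.UMax <= (std::uint64_t{1} << Bits) - 1;
}

// Narrowing is allowed if either interpretation fits.
// Assumes 1 <= Bits <= 63 so the shifts above are well defined.
bool canNarrow(const RangeInfo &R, unsigned Bits) {
  return fitsSigned(R, Bits) || fitsUnsigned(R, Bits);
}
```

For a 32-bit value known to lie in [-5, 3], the unsigned bounds span nearly the whole 32-bit range, but the signed bounds fit in 8 bits, so narrowing is allowed. Under the old non-negative-only restriction, such a value would have been rejected.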

…119759) Currently, when dumping the contents of a GSYM there are three issues:

- Callsite information is not displayed for merged functions. This is because of a bug in `CallSiteInfoLoader::buildFunctionMap`: when enumerating through `Func.MergedFunctions`, we enumerate by value instead of by reference.
- There is no variable indent for printing callsite info, so when printing callsites for merged functions, the indent differs from the rest of the merged function's info. To address this, we add a configurable indent for printing callsite info.
- Callsite info is printed right after the merged function info. This means that if the merged function also has call site information, the parent's callsite info appears right after the merged function's callsite info, which is confusing. To address this, we print the callsite info first, then the merged functions' info.

This change addresses all three issues above. Example of old vs. new output (screenshot): https://github.com/user-attachments/assets/d039ad69-fa79-4abb-9816-eda9cc2eda53
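The first issue is an instance of a common C++ pitfall: a range-based `for` loop that declares its variable by value operates on copies. Here is a minimal sketch with a hypothetical, heavily simplified `FunctionInfo`. It is neither the real GSYM type nor the code of `CallSiteInfoLoader::buildFunctionMap`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a GSYM function record.
struct FunctionInfo {
  std::string Name;
  std::vector<std::string> CallSites;
  std::vector<FunctionInfo> MergedFunctions;
};

// Buggy: `auto MF` copies each merged function, so the call site is
// attached to a temporary copy and lost when the iteration ends.
void attachCallSitesByValue(FunctionInfo &Func, const std::string &CS) {
  for (auto MF : Func.MergedFunctions)
    MF.CallSites.push_back(CS);
}

// Fixed: `auto &MF` binds to the element stored in the vector.
void attachCallSitesByRef(FunctionInfo &Func, const std::string &CS) {
  for (auto &MF : Func.MergedFunctions)
    MF.CallSites.push_back(CS);
}
```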

Reverts llvm#118734. Some specific versions of MSVC appear to be miscompiling this code. We don't know why, as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.

This patch essentially replaces std::pair<const std::vector<Frame> *, unsigned> with ArrayRef<Frame>. This way, we can store and pass ArrayRef<Frame>, conceptually one item, instead of the pointer and index. The only problem is that we don't have an existing hash function for ArrayRef<Frame>, so we provide a custom one, namely CallStackHash.
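Here is a sketch of the idea in standard C++. It assumes a hypothetical `FrameArrayRef` as a stand-in for `llvm::ArrayRef<Frame>` and uses a boost-style hash-combine step; the real `CallStackHash` may combine values differently. The key property is that the hash depends on the frames the view points at, not on the pointer, so two views over equal contents hash and compare equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified frame with only the fields that take part in hashing.
struct Frame {
  std::uint64_t Function;
  std::uint32_t LineOffset;
  bool operator==(const Frame &O) const {
    return Function == O.Function && LineOffset == O.LineOffset;
  }
};

// Minimal stand-in for llvm::ArrayRef<Frame>: a non-owning pointer + length.
struct FrameArrayRef {
  const Frame *Data = nullptr;
  std::size_t Size = 0;
  FrameArrayRef(const std::vector<Frame> &V) : Data(V.data()), Size(V.size()) {}
  const Frame *begin() const { return Data; }
  const Frame *end() const { return Data + Size; }
  // Element-wise comparison, so equal contents in different buffers match.
  bool operator==(const FrameArrayRef &O) const {
    return std::equal(begin(), end(), O.begin(), O.end());
  }
};

// Mixes V into Seed (boost::hash_combine-style constant and shifts).
inline void hashCombine(std::size_t &Seed, std::size_t V) {
  Seed ^= V + 0x9e3779b9 + (Seed << 6) + (Seed >> 2);
}

// Hashes the frames the view points at, not the pointer itself.
struct CallStackHash {
  std::size_t operator()(const FrameArrayRef &CS) const {
    std::size_t Seed = CS.Size;
    for (const Frame &F : CS) {
      hashCombine(Seed, std::hash<std::uint64_t>{}(F.Function));
      hashCombine(Seed, std::hash<std::uint32_t>{}(F.LineOffset));
    }
    return Seed;
  }
};

using CallStackMap = std::unordered_map<FrameArrayRef, unsigned, CallStackHash>;
```

Because the keys are non-owning views, the vectors they point into must outlive the map.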

- Put the element size field in the same place for all non-pointer types.
- Put the element size and address space fields in the same place for all pointer types.
- Put the number of elements and scalable fields in the same place for all vector types.

This simplifies initialization and the accessor methods isScalable, getElementCount, getScalarSizeInBits and getAddressSpace.

…vm#118549) This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.

```c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za");
```

In accordance with: ARM-software/acle#323

…base API (llvm#115752) This patch reimplements the locale base support for Windows flavors in a way that is more modules-friendly and without defining non-internal names. Since this changes the name of some types and entry points in the built library, this is effectively an ABI break on Windows (which is acceptable after checking with the Windows/libc++ maintainers).

Instead of storing an auxiliary structure that duplicates the information from the DXIL resource target extension types, access the information we can via the type itself. This also means we need to handle some of the target extension types we haven't fully defined yet, like Texture and CBuffer. For now we make an educated guess as to what those should look like based on llvm/wg-hlsl#76, and we can update them fairly easily once we've defined them more thoroughly. First part of llvm#118400

Creates a new toctree "Support" under which we have distinct links to arch, platform, and compiler support.

* Moved "Platform Support" from the index landing page to the new doc.
* Created explicit "Architecture Support". Requested in llvm#118964 (comment).
* Moved "Compiler Support" from the Status toctree to the new Support toctree.

---------

Co-authored-by: Carlo Cabrera <github@carlo.cab>

Update VPReductionPHIRecipe::execute to use the start value from the start value operand of the recipe. This is needed to make sure we resume from the correct value during epilogue vectorization. At the moment, the start value is set to the sentinel value in adjustRecipesForReductions, as the original start value needs to be used when creating ResumePhi recipes. Fixes a mis-compile introduced by b3cba9b in SPEC2017 on AArch64.

Resolves llvm#99161

- [x] Implement `WaveActiveAllTrue` clang builtin
- [x] Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114` in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and `WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in `IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue` lowering and map it to `int_spv_WaveActiveAllTrue` in `SPIRVInstructionSelector::selectIntrinsic`
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`

…nt and use it as linkLibs for ModuleToObject (llvm#120116) This change makes it possible to expose, through an interface, attributes that wrap content as external resources, and their use inside ModuleToObject shows how we will be able to provide runtime libraries without relying on the filesystem.

…lvm#117487) Essentially, this makes the following ill-formed:

```c++
using mat4 = _BitInt(12) [[clang::matrix_type(3, 3)]];
```

This matches preexisting behaviour for vector types (e.g. `ext_vector_type`), and given that LLVM IR intrinsics for matrices also take vector types, it seems like a sensible thing to do. Allowing this is currently especially problematic since we sometimes lower matrix types to LLVM array types instead, and while e.g. `[4 x i32]` and `<4 x i32>` *probably* have the same memory layout (though I don’t think it’s sound to rely on that either, see llvm#117486), `[4 x i12]` and `<4 x i12>` definitely don’t.

According to https://docs.github.com/en/rest/using-the-rest-api/github-event-types?apiVersion=2022-11-28, in the push event payload github.event.push.head is a string containing the SHA. This is currently causing new commits on main to cancel the premerge pipeline of older commits.

…m#120039) VPInstruction has a definition of mayWriteToMemory, which seems to be used only by VPlanSLP. However, VPInstructions are already handled in VPRecipeBase::mayWriteToMemory, and everywhere else seems to use that definition. I think these should be the same for all intents and purposes. The VPRecipeBase definition is more conservative but returns true for stores/calls/invokes/SLPStores.

) CodeGen will allocate memory for a new descriptor on descriptor loads. CUDA Fortran local descriptors are allocated in managed memory by the runtime, so the newly allocated storage for CUDA descriptors must also be allocated through the runtime.

vzakhari and others added 30 commits December 13, 2024 13:08

[flang] Simplify hlfir.sum total reductions. (llvm#119482)

a00946f

I am trying to switch to keeping the reduction value in a temporary scalar location so that I can use hlfir::genLoopNest easily. This also allows using omp.loop_nest with worksharing for OpenMP.

[AMDGPU][true16] [MC] Remove duplication in VOP1 test (llvm#119905)

2daadbd

This is an NFC change. Remove duplication in the gfx11/gfx12 VOP1 test file using the latest update_mc_test_script.py --unique option. This also prepares for the upcoming true16 change.

AMDGPU: Remove large, negative AddedComplexity from minimum/maximum patterns (llvm#119795)

5f72f2c

[nfc][ubsan-minimal] Refactor error reporting to use a single function (llvm#119920)

e5e0f23

This refactoring will allow making this function weak later on so that it can be overridden by a client. See llvm#119242.

[flang][cuda] Apply implicit data attribute only in device context (llvm#119919)

3273d0b

Fix the condition so the implicit device data attribute is not applied when the routine has `attribute(host)`.

[OpenACC] implement 'detach' clause sema

3351b3b

This is another new clause specific to 'exit data' that takes a pointer argument. This patch implements it the same way as a few other clauses with the same restrictions (such as 'attach').

[RISCV][GISel] Added GISelPredicateCodes to LeadingOnes*Mask (llvm#119886)

537e0e1

[flang][cuda] Do not apply implicit data attribute on dummy arg with VALUE (llvm#119927)

1345ee4

Dummy arguments with the VALUE attribute do not need the implicit data attribute.

[ubsan-minimal] Switch to weak symbols for callbacks to allow overriding in client code (llvm#119242)

71d2fa7
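Here is a minimal sketch of the weak-symbol pattern these two ubsan-minimal commits rely on. It uses the GCC/Clang `weak` attribute and a hypothetical callback name (`report_error_handler`), not the actual ubsan-minimal entry points.

```cpp
#include <cassert>
#include <cstring>

// Default handler, emitted as a weak definition (GCC/Clang extension).
// If a client links in a strong (non-weak) definition with the same name,
// the linker uses the client's version instead of this one.
extern "C" __attribute__((weak)) const char *report_error_handler() {
  return "default handler";
}
```

A client overrides the default by providing its own non-weak `extern "C" const char *report_error_handler()` in an object file it links in. Without an override, callers get the default.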

[DAG] SDPatternMatch - Add m_ExtractElt and m_InsertElt matchers (llvm#119430)

ecdf0da

Resolves llvm#118844

[lldb] Support zero-padding in formatter sections (llvm#119934)

f22cff7

[lld/COFF] Demangle symbol name in discarded section relocation error message (llvm#119726)

d73ef97

workflows/build-ci-container: Fix typos in variables (llvm#119943)

22266bc

This was preventing the containers from being pushed to the registry.

[Github] Fix windows container push (llvm#119916)

af20aff

The windows container push was not tested in the pull request and had a couple of typos that prevented it from functioning. This patch fixes that so we can actually push the container to GHCR.

[lld][WebAssembly] Introduce Ctx::arg

a222d00

Introduce Ctx::arg and forward it to LinkerDriver's constructor so that some uses of the global `config` can be dropped. This is similar to how the ELF port is migrating away from the global `config`. Pull Request: llvm#119829

Revert "[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (llvm#119687)"

e821f64

Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio

This reverts commit 7a64855.

[clang][bytecode] Fix some shift edge cases (llvm#119895)

49c2207

This fixes edge cases around shifting negative values.

Revert "[clang][bytecode] Fix some shift edge cases (llvm#119895)"

a6636ce

This reverts commit 49c2207. This breaks on big-endian, again: https://lab.llvm.org/buildbot/#/builders/154/builds/9018

[clang-tidy][doc] align the title style in clang-tidy/index.rst (llvm#119938)

2291e5a

Uppercase each word in title and toctree.

_Originally posted by @nicovank in llvm#119842 (comment)._

---------

Co-authored-by: Nicolas van Kempen <nvankemp@gmail.com>

[llvm-gsymutil] Disable test macho-gsym-merged-callsites-dsym (llvm#119957)

74fb992

The macho-gsym-merged-callsites-dsym test is failing on some hosts. Disabling it for now while we come up with a fix.

boomanaiden154 and others added 29 commits December 16, 2024 13:22

[Github] Default to non-root user in linux CI container (llvm#119987)

b86a22a

This patch sets the default user in the linux CI container to a non-root user, which enables properly testing a couple of features, particularly in libcxx.

[libc] Exclude FreeListHeap test and fuzzer on GPU (llvm#120137)

51a0919

FreeListHeap uses the `_end` symbol, which conflicts with the `_end` symbol defined by the GPU start.cpp files, so for now we exclude the test and the fuzzer on GPU.

[CI][Github] Add linux premerge workflow (llvm#119635)

484a281

This patch adds a Github Actions workflow for Linux premerge. This currently just calls into the existing CI scripts as a starting point.

[flang][cuda] Check for use of host array in device context (llvm#119756)

67ae944

Now that variables have the implicit attribute, we can check for illegal use of a module host variable in device context.

[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (llvm#118624)

8402a0f

Summary: This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan: ninja check-llvm-unit check-llvm

[OpenACC/NFC] Make 'trailing objects' use private inheritance.

8c16323

I noticed this while working on something else: these are supposed to be privately inherited.
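Here is a minimal illustration of what private inheritance provides. A hypothetical helper base stands in for LLVM's trailing-objects machinery, whose real `llvm::TrailingObjects` API is not reproduced here. The derived class can still use the base's members internally, but its users cannot convert a `Clause *` to a pointer to the helper or call the helper's API through a `Clause`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper base exposing a low-level storage computation.
template <typename T> struct TrailingStorageHelper {
  static std::size_t bytesFor(std::size_t N) { return sizeof(T) * N; }
};

// Private inheritance: the helper is an implementation detail of Clause.
class Clause : private TrailingStorageHelper<int> {
public:
  explicit Clause(std::size_t NumVars) : NumVars(NumVars) {}
  // Inside the class, the private base's members remain usable.
  std::size_t trailingBytes() const { return bytesFor(NumVars); }

private:
  std::size_t NumVars;
};
```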

[gn build] Port 084309a

46bbd2c

[Github] Enable new premerge workflow postcommit

a8456c9

This patch enables the new premerge workflow postcommit so that we can start testing it at a reasonable scale with minimal disruption.

[LV] Add test showing bug in epilogue vectorization of selects.

0f6d93f

This is causing mis-compiles in SPEC2017 on AArch64 after b3cba9b.

Update BUILD.bazel

dda1d16

Fix issue introduced by llvm#118839.

[clang] Fix -Wunused-variable in CGBuiltin.cpp (NFC)

a176669

/llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:19441:17: error: unused variable 'Ty' [-Werror,-Wunused-variable]
    llvm::Type *Ty = Op->getType();
                ^
1 error generated.

[MemProf] Remove dead code (NFC) (llvm#120156)

bf700c3

Remove unused collection of context size information that was likely leftover from debugging / testing.

[SPARC][IAS] Add support for call dest, imm form (llvm#119078)

ad64946

This follows GCC behavior of allowing a trailing immediate, which is ignored by the assembler.

[MLIR][NVVM] Enable import of nvvm.barrier0 (llvm#119965)

2806705

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

Update BUILD.bazel

e2a94a9

Fix bazel build after llvm#120116

Merge remote-tracking branch 'upstream/main' into merge_17.12.2024

c5cda76

Post-merge fixes

1c396f0

ergawy merged commit 07c236d into ROCm:amd-trunk-dev Dec 17, 2024
49 of 51 checks passed
