Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge upstream/main into amd-trunk-dev (17.12.2024) #232

Merged
merged 1,255 commits into from
Dec 17, 2024

Conversation

ergawy
Copy link

@ergawy ergawy commented Dec 17, 2024

No description provided.

vzakhari and others added 30 commits December 13, 2024 13:08
I am trying to switch to keeping the reduction value in a temporary
scalar location so that I can use hlfir::genLoopNest easily.
This also allows using omp.loop_nest with worksharing for OpenMP.
This is a NFC change. Remove duplicated test line in gfx11/gfx12 vop1
test file with the latest update_mc_test_script.py --unique option

This is also preparing for the up-coming true16 change
)

This change affects non-relocation mode only. Prior to having
CheckLargeFunctions pass, we could have emitted code for functions that
was discarded at the end due to size limitations. Since we didn't know
at the time of emission if the code would be discarded or not, we had to
emit jump tables in separate sections and handle them separately.
However, now we always run CheckLargeFunctions and make sure all emitted
code is used. Thus, we can get rid of the special jump table handling.
…lvm#119776)

Mask only instructions like vmand and vmsbf should always have 0 for
their Log2SEW operand.  Non-mask instructions should only have
3, 4, 5, or 6 for their Log2SEW operand.

Split the operand type so we can verify these cases separately.

I had to fix the SEW for whole register move to vmv.v.v copy
optimization and update an mir test. The vmv.v.v change isn't
functional since we have already done vsetvli insertion before and
nothing else uses the field after copy expansion. I can split these
changes off if desired.
…97158)

Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to
`AllInstsEqOnly` and always allow hoisting if all instructions match.

In that case, all instructions can be hoisted and the
original branch will be replaced and selects for PHIs are added. This
allows preserving metadata in more cases, using the existing hoisting
logic, whereas previously FoldTwoEntryPHINode would drop the metadata.


https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u

PR: llvm#97158
llvm#119920)

This refactoring will allow to make this function weak later on so that
it could be overloaded by a client. See llvm#119242.
…lvm#119919)

Fix the condition so the implicit device data attribute is not applied
when the routine has `attribute(host)`
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
In these tests, we just want to add one instance of
IndexedMemProfRecord to MemProfData.Records and retrieve it from
MemProfReader.  There is no particular reason to associate F1.hash()
with the IndexedMemProfRecord instance.  A fake value suffices.

While I am at it, I'm switching to try_emplace so that I can move
FakeRecord.
…ate (NFC) (llvm#119831)

This patch makes the following functions private:

- InstrProfWriter::addMemProfRecord
- InstrProfWriter::addMemProfFrame
- InstrProfWriter::addMemProfCallStack

These days, we add MemProf profile to the writer context via
addMemProfData.  We no longer add individual items.
Strip hash_value() for CmpPredicate, as different callers have different
hashing use-cases. In this case, there is just one caller, namely
EarlyCSE, which calls hash_combine() on a CmpPredicate, which used to
call hash_combine() on a CmpInst::Predicate prior to 4a0d53a
(PatternMatch: migrate to CmpPredicate). This has uncovered a bug where
two icmp instructions differing in just the fact that one of them has
the samesign flag on it are hashed differently, leading to divergent
hashing, and a crash. Fix this crash by dropping samesign information on
icmp instructions before hashing them, preserving the former behavior.

Fixes llvm#119893.
…VALUE (llvm#119927)

Dummy arguments with the VALUE attribute do not need the implicit data
attribute.
…m#119642)

Update integer range narrowing to handle negative values.

The previous restriction to only narrowing known-non-negative values
wasn't needed, as both the signed and unsigned ranges represent bounds
on the values of each variable in the program ... except that one might
be more accurate than the other. So, if either the signed or unsigned
interpretetation of the inputs and outputs allows for integer narrowing,
the narrowing is permitted.

This commit also updates the integer optimization rewrites to preserve
the stae of constant-like operations and those that are narrowed so that
rewrites of other operations don't lose that range information.
…119759)

Currently, when dumping the contents of a GSYM there are three issues:
- Callsite information is not displayed for merged functions - this is
because of a bug in `CallSiteInfoLoader::buildFunctionMap` where when
enumerating through `Func.MergedFunctions` - we enumerate by value
instead of by reference.
- There is no variable indent for printing callsite info - meaning that
when printing callsites for merged functions, the indent will be
different than the other info of the merged function. To address this we
add configurable indent for printing callsite info
- Callsite info is printed right after merged function info. Meaning
that if the merged function also has call site information, the parent's
callsite info will appear right after the merged function's callsite
info - leading to confusion. To address this we print the callsite info
first, then the merged functions info.

This change addresses all the above 3 issues. 
Example of old vs new:

<img width="1074" alt="image"
src="https://github.com/user-attachments/assets/d039ad69-fa79-4abb-9816-eda9cc2eda53"
/>
This was preventing the containers from being pushed to the registry.
The windows container push was not tested in the pull request and had a
couple of typos that prevented it from functioning. This patch fixes
that so we can actually push the container to GHCR.
and forward it to LinkerDriver's ctor so that some uses of the global
`config` can be dropped. This is similar to how the ELF port
migrates away from the global `config`.

Pull Request: llvm#119829
…#119938)

Uppercase each word in title and toctree

_Originally posted by @nicovank in
llvm#119842 (comment).

---------

Co-authored-by: Nicolas van Kempen <nvankemp@gmail.com>
Reverts llvm#118734

There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.

This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
This patch essentially replaces:

  std::pair<const std::vector<Frame> *, unsigned>

with:

  ArrayRef<Frame>

This way, we can store and pass ArrayRef<Frame>, conceptually one
item, instead of the pointer and index.

The only problem is that we don't have an existing hash function for
ArrayRef<Frame>>, so we provide a custom one, namely
CallStackHash.
…19957)

The macho-gsym-merged-callsites-dsym is failing on some hosts.

Disabling for now while we come up with a fix.
boomanaiden154 and others added 29 commits December 16, 2024 13:22
This patch sets the default user in the linux CI container to a non-root
user, which enables properly testing a couple of features, particularly
in libcxx.
- Put the element size field in the same place for all non-pointer
  types.
- Put the element size and address space fields in the same place for
  all pointer types.
- Put the number of elements and scalable fields in the same place for
  all vector types.

This simplifies initialization and accessor methods isScalable,
getElementCount, getScalarSizeInBits and getAddressSpace.
FreeListHeap uses the _end symbol which conflicts with the _end symbol
defined by GPU start.cpp files so for now we exclude the test and the
fuzzer on GPU.
This patch adds a Github Actions workflow for Linux premerge. This
currently just calls into the existing CI scripts as a starting point.
)

Now that variables have implicit attribute, we can check for illegal use
of module host variable in device context.
…vm#118549)

This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
  void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
  void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");
```
In accordance with: ARM-software/acle#323
…vm#118624)

Summary:
This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan:
ninja check-llvm-unit check-llvm
I noticed this while working on something else, these are supposed to be
privately inherited.
…base API (llvm#115752)

This patch reimplements the locale base support for Windows flavors in a
way that is more modules-friendly and without defining non-internal
names.

Since this changes the name of some types and entry points in the built
library, this is effectively an ABI break on Windows (which is
acceptable after checking with the Windows/libc++ maintainers).
Instead of storing an auxilliary structure with the information from the
DXIL resource target extension types duplicated, access the information
that we can via the type itself.

This also means we need to handle some of the target extension types we
haven't fully defined yet, like Texture and CBuffer. For now we make an
educated guess to what those should look like based on llvm/wg-hlsl#76,
and we can update them fairly easily when we've defined them more
thoroughly.

First part of llvm#118400
This patch enables the new premerge workflow postcommit so that we can start
testing it at a reasonable scale with minimal disruption.
Creates a new toctree "Support" under which we have distinct links to arch,
platform, and compiler support.

* Moved "Platform Support" from index landing page to new doc.
* Created explicit "Architecture Support". Requested in llvm#118964 (comment).
* Moved "Compiler Support" from Status toctree to new Support toctree.

---------

Co-authored-by: Carlo Cabrera <github@carlo.cab>
This is causing mis-compiles when in SPEC2017 on AArch64 after
b3cba9b.
Update VPReductionPHIRecipe::execute to use the start value from the
start value operand of the recipe. This is needed to make sure we resume
from the correct value during epilogue vectorization.

At the moment, the start value is set to the sentinel value in
adjustRecipesForReductions, as the original start value needs to be used
when creating ResumePhi recipes.

Fixes a mis-compile introduced by b3cba9b in SPEC2017 on AArch64.
Fix issue introduced by llvm#118839.
Resolves llvm#99161

- [x]  Implement `WaveActiveAllTrue` clang builtin,
- [x]  Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114`
in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and
`WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue`
lowering and map it to `int_spv_WaveActiveAllTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
…nt and use it as linkLibs for ModuleToObject (llvm#120116)

This change allows to expose through an interface attributes wrapping
content as external resources, and the usage inside the ModuleToObject
show how we will be able to provide runtime libraries without relying on
the filesystem.
…lvm#117487)

Essentially, this makes this ill-formed:
```c++
using mat4 = _BitInt(12) [[clang::matrix_type(3, 3)]];
```

This matches preexisting behaviour for vector types (e.g.
`ext_vector_type`), and given that LLVM IR intrinsics for matrices also
take vector types, it seems like a sensible thing to do.

This is currently especially problematic since we sometimes lower matrix
types to LLVM array types instead, and while e.g. `[4 x i32]` and `<4 x
i32>` *probably* have the same similar memory layout (though I don’t
think it’s sound to rely on that either, see llvm#117486), `[4 x i12]` and
`<4 x i12>` definitely don’t.
/llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:19441:17:
 error: unused variable 'Ty' [-Werror,-Wunused-variable]
    llvm::Type *Ty = Op->getType();
                ^
1 error generated.
According to https://docs.github.com/en/rest/using-the-rest-api/github-event-types?apiVersion=2022-11-28,
When we look at the push event payload, github.event.push.head is a string
containing the SHA. This is currently causing new commits on main to cancel
the premerge pipeline of older commits.
Remove unused collection of context size information that was likely
leftover from debugging / testing.
…m#120039)

VPInstruction has a definition of mayWriteToMemory, which seems to only
be used by VPlanSLP. However VPInstructions are already handled in
VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this
definition. I think these should be the same for all intents and
purposes. The VPRecipeBase definition is more conservative but returns
true for stores/calls/invokes/SLPStores.
)

CodeGen will allocate memory for a new descriptor on descriptor loads.
CUDA Fortran local descriptor are allocated in managed memory by the
runtime. The newly allocated storage for cuda descriptor must also be
allocated through the runtime.
This follows GCC behavior of allowing a trailing immediate, that is
ignored by the assembler.
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Fix bazel build after llvm#120116
@ergawy ergawy merged commit 07c236d into ROCm:amd-trunk-dev Dec 17, 2024
49 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.