[LLVM] Add llvm.experimental.vector.compress intrinsic #92289

Merged 56 commits on Jul 17, 2024
Changes from 54 commits

Commits
3a7b064
Add initial code for @llvm.masked.compress intrinsics
lawben May 15, 2024
75abf0b
Remove requirements for legal types
lawben May 15, 2024
0329bc9
Add tests for AArch64
lawben May 15, 2024
73bfebb
Add floating point test
lawben May 15, 2024
e4423a1
Add documentation
lawben May 15, 2024
3e99678
Fix formatting
lawben May 15, 2024
b686f83
Fix references in docs
lawben May 16, 2024
73cc28f
Add widen for vector type legalization
lawben May 16, 2024
8a613f3
Put expand logic in TargetLowering to avoid code duplication.
lawben May 16, 2024
a4df959
Fix formatting
lawben May 16, 2024
17004b9
Add basic lowering of MCOMPRESS in GlobalISel
lawben May 17, 2024
984cad1
Add basic AArch64 MIR test
lawben May 17, 2024
0ea2415
Address PR comments
lawben May 17, 2024
1dc79b4
Update docs according to PR comments
lawben May 17, 2024
c8515ca
Match result and input types of MCOMPRESS
lawben May 21, 2024
8353b2d
Add constant folding to SelectionDAG::getNode()
lawben May 21, 2024
a9aba29
Address PR comments for type legalization
lawben May 21, 2024
b0af320
Move masked.compress in docs
lawben May 21, 2024
d9587c7
Fix bug for x86 result widening
lawben May 21, 2024
c04da9b
Remove zero-fill when widening vector
lawben May 21, 2024
a60523c
Use [[maybe_unused]] for asserts
lawben May 22, 2024
9279e5e
Move constant folding to DAGCombiner
lawben May 24, 2024
b48dada
Change TODO for AArch64 GlobalISel
lawben May 24, 2024
d443671
Remove wrong ISA from docs
lawben May 24, 2024
d45f61b
Rename MCOMPRESS to MASKED_COMPRESS
lawben Jun 7, 2024
4366a43
Get stack alignment for load in GlobalISel
lawben Jun 7, 2024
808709f
Fix formatting
lawben Jun 7, 2024
d5fca0f
Replace use of TLI inside of TargetLowering
lawben Jun 7, 2024
00b64d2
Use vector alignment for stack load
lawben Jun 7, 2024
60f9c61
Add llvm.masked.compress to release notes
lawben Jun 7, 2024
3faed13
Merge branch 'main' into masked_compress
lawben Jun 7, 2024
6aa480c
Merge branch 'main' into masked_compress
lawben Jun 7, 2024
6549386
Merge branch 'main' into masked_compress
lawben Jun 12, 2024
5589519
Merge branch 'main' into masked_compress
lawben Jun 17, 2024
f7e4b48
Add passthru vector to @llvm.masked.compress
lawben Jun 17, 2024
9f291c8
Add passthru to DAGCombiner
lawben Jun 17, 2024
c733d6b
Update LangRef with passthru
lawben Jun 17, 2024
9222b54
Add passthru to GlobalISel
lawben Jun 17, 2024
616c142
Merge branch 'main' into masked_compress
lawben Jun 17, 2024
daa784f
Fix formatting
lawben Jun 17, 2024
3e4673c
Fix GlobalISel test
lawben Jun 17, 2024
4b12b32
Update comment on MASKED_COMPRESS opcode
lawben Jun 17, 2024
6c9c969
Fix docs
lawben Jun 17, 2024
069eb24
Address PR comments
lawben Jun 18, 2024
457cbdf
Merge branch 'main' into masked_compress
lawben Jun 19, 2024
5e3189b
Use isConstTrueVal
lawben Jun 19, 2024
6b3a3b8
Remove redundant undef
lawben Jun 19, 2024
50cce23
Address PR comments
lawben Jun 21, 2024
89c3e9f
Return passthru for undef mask or vec
lawben Jun 21, 2024
995c863
Address PR comments
lawben Jun 25, 2024
adb9d9c
Rename masked.compress to experimental.vector.compress
lawben Jul 1, 2024
ba2939e
Add freeze to mask extract for poison/undef entries
lawben Jul 1, 2024
11e1742
Fix docs
lawben Jul 1, 2024
32cc27f
Fix docs
lawben Jul 2, 2024
99610a8
Address PR comments
lawben Jul 3, 2024
efa2e92
Merge branch 'main' into masked_compress
lawben Jul 16, 2024
7 changes: 7 additions & 0 deletions llvm/docs/GlobalISel/GenericOpcode.rst
@@ -710,6 +710,13 @@ The type of the operand must be equal to or larger than the vector element
type. If the operand is larger than the vector element type, the scalar is
implicitly truncated to the vector element type.

G_VECTOR_COMPRESS
^^^^^^^^^^^^^^^^^

Given an input vector, a mask vector, and a passthru vector, contiguously place
all selected (i.e., where mask[i] = true) input lanes in the output vector. All
remaining lanes in the output are taken from passthru, which may be undef.
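
A minimal generic MIR sketch (the register names and ``<4 x s32>`` types are
illustrative only, not taken from this patch):

.. code-block:: none

  %res:_(<4 x s32>) = G_VECTOR_COMPRESS %vec(<4 x s32>), %mask(<4 x s1>), %passthru(<4 x s32>)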

Vector Reduction Operations
---------------------------

87 changes: 87 additions & 0 deletions llvm/docs/LangRef.rst
@@ -19234,6 +19234,93 @@ the following sequence of operations:

The ``mask`` operand will apply to at least the gather and scatter operations.


.. _int_vector_compress:

'``llvm.experimental.vector.compress.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM provides an intrinsic for compressing data within a vector based on a selection mask.
Semantically, this is similar to :ref:`llvm.masked.compressstore <int_compressstore>` but with weaker assumptions
and without storing the results to memory, i.e., the data remains in the vector.

Syntax:
"""""""
This is an overloaded intrinsic. A number of scalar values of integer,
floating-point, or pointer data type are collected from an input vector and
placed adjacently within the result vector. A mask defines which elements to
collect from the vector. The remaining lanes are filled with values from
``passthru``.

.. code-block:: llvm

    declare <8 x i32> @llvm.experimental.vector.compress.v8i32(<8 x i32> <value>, <8 x i1> <mask>, <8 x i32> <passthru>)
    declare <16 x float> @llvm.experimental.vector.compress.v16f32(<16 x float> <value>, <16 x i1> <mask>, <16 x float> undef)

Overview:
"""""""""

Selects elements from the input vector ``value`` according to the ``mask``.
All selected elements are written into adjacent lanes in the result vector,
from lower to higher.
The mask holds one entry for each vector lane and selects the elements to
be kept.
If a ``passthru`` vector is given, all remaining lanes are filled with the
corresponding lane's value from ``passthru``.
The main difference to :ref:`llvm.masked.compressstore <int_compressstore>` is
that no guard against memory accesses for unselected lanes is needed.
This allows branchless code and better optimization on targets that do not
support, or have inefficient instructions for, the exact semantics of
:ref:`llvm.masked.compressstore <int_compressstore>` but still provide some
form of compress operation.
The result vector can then be stored with a similar effect, as all selected
values occupy the lower lanes of the vector, without requiring branches to
avoid writes where the mask is ``false``.

Arguments:
""""""""""

The first operand is the input vector, from which elements are selected.
The second operand is the mask, a vector of boolean values.
The third operand is the passthru vector, which provides the values for the
remaining lanes.
The mask and the input vector must have the same number of vector elements.
The input and passthru vectors must have the same type.

Semantics:
""""""""""

The ``llvm.experimental.vector.compress`` intrinsic compresses data within a vector.
It collects elements from possibly non-adjacent lanes of a vector and places
them contiguously in the result vector based on a selection mask, filling the
remaining lanes with values from ``passthru``.
This intrinsic performs the logic of the C++ example below.
If ``passthru`` is undefined, all values in ``out`` after the last selected
one are undefined, i.e., the number of valid lanes equals the number of
``true`` entries in the mask, and all lanes >= number-of-selected-values are
undefined.
If all entries in the ``mask`` are 0, the ``out`` vector is ``passthru``.
If any element of the mask is poison, all elements of the result are poison.
Otherwise, if any element of the mask is undef, all elements of the result
are undef.

.. code-block:: cpp

    // Consecutively place selected values in a vector. N is the vector width
    // in bytes.
    using VecT __attribute__((vector_size(N))) = int;
    VecT compress(VecT vec, VecT mask, VecT passthru) {
      VecT out;
      int idx = 0;
      // Store each element at the next free position; writes for unselected
      // lanes are overwritten by later selected elements.
      for (int i = 0; i < N / sizeof(int); ++i) {
        out[idx] = vec[i];
        idx += static_cast<bool>(mask[i]);
      }
      // idx now equals the number of selected elements; fill the remaining
      // lanes from passthru.
      for (; idx < N / sizeof(int); ++idx) {
        out[idx] = passthru[idx];
      }
      return out;
    }
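
As a usage sketch, a call might look as follows (the value names and the
``v4i32`` type here are illustrative, not mandated by the intrinsic):

.. code-block:: llvm

    ; Keep the lanes of %vec whose mask bit is set; fill the tail from %passthru.
    ; For %vec = <1, 2, 3, 4>, %mask = <1, 0, 1, 0>, %passthru = <9, 9, 9, 9>,
    ; %res is <1, 3, 9, 9>.
    %res = call <4 x i32> @llvm.experimental.vector.compress.v4i32(
               <4 x i32> %vec, <4 x i1> %mask, <4 x i32> %passthru)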


Matrix Intrinsics
-----------------

1 change: 1 addition & 0 deletions llvm/docs/ReleaseNotes.rst
@@ -78,6 +78,7 @@ Changes to the LLVM IR
* ``llvm.instprof.mcdc.tvbitmap.update``: 3rd argument has been
removed. The next argument has been changed from byte index to bit
index.
* Added ``llvm.experimental.vector.compress`` intrinsic.

Changes to LLVM infrastructure
------------------------------
1 change: 1 addition & 0 deletions llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -410,6 +410,7 @@ class LegalizerHelper {
LegalizeResult lowerUnmergeValues(MachineInstr &MI);
LegalizeResult lowerExtractInsertVectorElt(MachineInstr &MI);
LegalizeResult lowerShuffleVector(MachineInstr &MI);
LegalizeResult lowerVECTOR_COMPRESS(MachineInstr &MI);
Register getDynStackAllocTargetPtr(Register SPReg, Register AllocSize,
Align Alignment, LLT PtrTy);
LegalizeResult lowerDynStackAlloc(MachineInstr &MI);
8 changes: 8 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -647,6 +647,14 @@ enum NodeType {
/// non-constant operands.
STEP_VECTOR,

/// VECTOR_COMPRESS(Vec, Mask, Passthru)
/// consecutively place vector elements based on mask
/// e.g., vec = {A, B, C, D} and mask = {1, 0, 1, 0}
/// --> {A, C, ?, ?} where ? is undefined
/// If passthru is defined, ?s are replaced with elements from passthru.
/// If passthru is undef, ?s remain undefined.
VECTOR_COMPRESS,

/// MULHU/MULHS - Multiply high - Multiply two integers of type iN,
/// producing an unsigned/signed value of type i[2*N], then return the top
/// part.
4 changes: 4 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5495,6 +5495,10 @@ class TargetLowering : public TargetLoweringBase {
/// method accepts vectors as its arguments.
SDValue expandVectorSplice(SDNode *Node, SelectionDAG &DAG) const;

/// Expand a vector VECTOR_COMPRESS into a sequence that extracts each element,
/// stores it to a stack temporary while advancing the store position based on
/// the mask, and then re-loads the final vector.
SDValue expandVECTOR_COMPRESS(SDNode *Node, SelectionDAG &DAG) const;

/// Legalize a SETCC or VP_SETCC with given LHS and RHS and condition code CC
/// on the current target. A VP_SETCC will additionally be given a Mask
/// and/or EVL not equal to SDValue().
5 changes: 5 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2362,6 +2362,11 @@ def int_masked_compressstore:
[IntrWriteMem, IntrArgMemOnly, IntrWillReturn,
NoCapture<ArgIndex<1>>]>;

def int_experimental_vector_compress:
    DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                          [LLVMMatchType<0>, LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
                          [IntrNoMem, IntrWillReturn]>;

// Test whether a pointer is associated with a type metadata identifier.
def int_type_test : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
[IntrNoMem, IntrWillReturn, IntrSpeculatable]>;
3 changes: 3 additions & 0 deletions llvm/include/llvm/Support/TargetOpcodes.def
@@ -751,6 +751,9 @@ HANDLE_TARGET_OPCODE(G_SHUFFLE_VECTOR)
/// Generic splatvector.
HANDLE_TARGET_OPCODE(G_SPLAT_VECTOR)

/// Generic vector compress.
HANDLE_TARGET_OPCODE(G_VECTOR_COMPRESS)

/// Generic count trailing zeroes.
HANDLE_TARGET_OPCODE(G_CTTZ)

7 changes: 7 additions & 0 deletions llvm/include/llvm/Target/GenericOpcodes.td
@@ -1500,6 +1500,13 @@ def G_SPLAT_VECTOR: GenericInstruction {
let hasSideEffects = false;
}

// Generic vector compress.
def G_VECTOR_COMPRESS: GenericInstruction {
  let OutOperandList = (outs type0:$dst);
  let InOperandList = (ins type0:$vec, type1:$mask, type0:$passthru);
  let hasSideEffects = false;
}

//------------------------------------------------------------------------------
// Vector reductions
//------------------------------------------------------------------------------
1 change: 1 addition & 0 deletions llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
@@ -187,6 +187,7 @@ def : GINodeEquiv<G_VECREDUCE_UMAX, vecreduce_umax>;
def : GINodeEquiv<G_VECREDUCE_SMIN, vecreduce_smin>;
def : GINodeEquiv<G_VECREDUCE_SMAX, vecreduce_smax>;
def : GINodeEquiv<G_VECREDUCE_ADD, vecreduce_add>;
def : GINodeEquiv<G_VECTOR_COMPRESS, vector_compress>;

def : GINodeEquiv<G_STRICT_FADD, strict_fadd>;
def : GINodeEquiv<G_STRICT_FSUB, strict_fsub>;
8 changes: 8 additions & 0 deletions llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -266,6 +266,12 @@ def SDTMaskedScatter : SDTypeProfile<0, 4, [
SDTCisSameNumEltsAs<0, 1>, SDTCisSameNumEltsAs<0, 3>
]>;

def SDTMaskedCompress : SDTypeProfile<1, 3, [
  SDTCisVec<0>, SDTCisSameAs<0, 1>,
  SDTCisVec<2>, SDTCisSameNumEltsAs<1, 2>,
  SDTCisSameAs<1, 3>
]>;

def SDTVecShuffle : SDTypeProfile<1, 2, [
SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>
]>;
@@ -739,6 +745,8 @@ def masked_gather : SDNode<"ISD::MGATHER", SDTMaskedGather,
def masked_scatter : SDNode<"ISD::MSCATTER", SDTMaskedScatter,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

def vector_compress : SDNode<"ISD::VECTOR_COMPRESS", SDTMaskedCompress>;

// Do not use ld, st directly. Use load, extload, sextload, zextload, store,
// and truncst (see below).
def ld : SDNode<"ISD::LOAD" , SDTLoad,
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1982,6 +1982,8 @@ unsigned IRTranslator::getSimpleIntrinsicOpcode(Intrinsic::ID ID) {
return TargetOpcode::G_VECREDUCE_UMAX;
case Intrinsic::vector_reduce_umin:
return TargetOpcode::G_VECREDUCE_UMIN;
case Intrinsic::experimental_vector_compress:
return TargetOpcode::G_VECTOR_COMPRESS;
case Intrinsic::lround:
return TargetOpcode::G_LROUND;
case Intrinsic::llround:
89 changes: 89 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -3953,6 +3953,8 @@ LegalizerHelper::lower(MachineInstr &MI, unsigned TypeIdx, LLT LowerHintTy) {
return lowerExtractInsertVectorElt(MI);
case G_SHUFFLE_VECTOR:
return lowerShuffleVector(MI);
case G_VECTOR_COMPRESS:
return lowerVECTOR_COMPRESS(MI);
case G_DYN_STACKALLOC:
return lowerDynStackAlloc(MI);
case G_STACKSAVE:
@@ -7510,6 +7512,93 @@
return Legalized;
}

LegalizerHelper::LegalizeResult
LegalizerHelper::lowerVECTOR_COMPRESS(llvm::MachineInstr &MI) {
  auto [Dst, DstTy, Vec, VecTy, Mask, MaskTy, Passthru, PassthruTy] =
      MI.getFirst4RegLLTs();

  if (VecTy.isScalableVector())
    report_fatal_error("Cannot expand masked_compress for scalable vectors.");

  Align VecAlign = getStackTemporaryAlignment(VecTy);
  MachinePointerInfo PtrInfo;
  Register StackPtr =
      createStackTemporary(TypeSize::getFixed(VecTy.getSizeInBytes()), VecAlign,
                           PtrInfo)
          .getReg(0);
  MachinePointerInfo ValPtrInfo =
      MachinePointerInfo::getUnknownStack(*MI.getMF());

  LLT IdxTy = LLT::scalar(32);
  LLT ValTy = VecTy.getElementType();
  Align ValAlign = getStackTemporaryAlignment(ValTy);

  auto OutPos = MIRBuilder.buildConstant(IdxTy, 0);

  bool HasPassthru =
      MRI.getVRegDef(Passthru)->getOpcode() != TargetOpcode::G_IMPLICIT_DEF;

  if (HasPassthru)
    MIRBuilder.buildStore(Passthru, StackPtr, PtrInfo, VecAlign);

  Register LastWriteVal;
  std::optional<APInt> PassthruSplatVal =
      isConstantOrConstantSplatVector(*MRI.getVRegDef(Passthru), MRI);

  if (PassthruSplatVal.has_value()) {
    LastWriteVal =
        MIRBuilder.buildConstant(ValTy, PassthruSplatVal.value()).getReg(0);
  } else if (HasPassthru) {
    auto Popcount = MIRBuilder.buildZExt(MaskTy.changeElementSize(32), Mask);
    Popcount = MIRBuilder.buildInstr(TargetOpcode::G_VECREDUCE_ADD,
                                     {LLT::scalar(32)}, {Popcount});

    Register LastElmtPtr =
        getVectorElementPointer(StackPtr, VecTy, Popcount.getReg(0));
    LastWriteVal =
        MIRBuilder.buildLoad(ValTy, LastElmtPtr, ValPtrInfo, ValAlign)
            .getReg(0);
  }

  unsigned NumElmts = VecTy.getNumElements();
  for (unsigned I = 0; I < NumElmts; ++I) {
    auto Idx = MIRBuilder.buildConstant(IdxTy, I);
    auto Val = MIRBuilder.buildExtractVectorElement(ValTy, Vec, Idx);
    Register ElmtPtr =
        getVectorElementPointer(StackPtr, VecTy, OutPos.getReg(0));
    MIRBuilder.buildStore(Val, ElmtPtr, ValPtrInfo, ValAlign);

    LLT MaskITy = MaskTy.getElementType();
    auto MaskI = MIRBuilder.buildExtractVectorElement(MaskITy, Mask, Idx);
    if (MaskITy.getSizeInBits() > 1)
      MaskI = MIRBuilder.buildTrunc(LLT::scalar(1), MaskI);

    MaskI = MIRBuilder.buildZExt(IdxTy, MaskI);
    OutPos = MIRBuilder.buildAdd(IdxTy, OutPos, MaskI);

    if (HasPassthru && I == NumElmts - 1) {
      auto EndOfVector =
          MIRBuilder.buildConstant(IdxTy, VecTy.getNumElements() - 1);
      auto AllLanesSelected = MIRBuilder.buildICmp(
          CmpInst::ICMP_UGT, LLT::scalar(1), OutPos, EndOfVector);
      OutPos = MIRBuilder.buildInstr(TargetOpcode::G_UMIN, {IdxTy},
                                     {OutPos, EndOfVector});
      ElmtPtr = getVectorElementPointer(StackPtr, VecTy, OutPos.getReg(0));

      LastWriteVal =
          MIRBuilder.buildSelect(ValTy, AllLanesSelected, Val, LastWriteVal)
              .getReg(0);
      MIRBuilder.buildStore(LastWriteVal, ElmtPtr, ValPtrInfo, ValAlign);
    }
  }

  // TODO: Use StackPtr's FrameIndex alignment.
  MIRBuilder.buildLoad(Dst, StackPtr, PtrInfo, VecAlign);

  MI.eraseFromParent();
  return Legalized;
}

Register LegalizerHelper::getDynStackAllocTargetPtr(Register SPReg,
Register AllocSize,
Align Alignment,