[JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend #108796

Open
wants to merge 80 commits into
base: main
1820567
Ruihan: POC with REX2
Ruihan-Yin Mar 25, 2024
d1afc68
resolve comments
Ruihan-Yin May 17, 2024
2335aa3
refactor register encoding for REX2
Ruihan-Yin May 20, 2024
6578c58
merge REX2 path to legacy path
Ruihan-Yin May 21, 2024
01eeb80
Enable REX2 in more instructions.
Ruihan-Yin May 30, 2024
690aee3
Avoid repeatedly estimate the size of REX2 prefix
Ruihan-Yin Jun 3, 2024
31d7fb4
Enable REX2 encoding on RI and SV path
Ruihan-Yin Jun 5, 2024
a995878
Add rex2 support to rotate and shift.
Ruihan-Yin Jun 6, 2024
74aacf6
CR session.
Ruihan-Yin Jun 7, 2024
c330927
Testing infra updates: assert REX2 is enabled.
Ruihan-Yin Jun 11, 2024
fbf20d1
revert rcl_N and rcr_N, tp and latency data for these instructions is…
Ruihan-Yin Jun 11, 2024
ea02e70
partially enable REX2 on emitOutputAM, case covered: R_AR and AR_R.
Ruihan-Yin Jun 12, 2024
c74b801
Adding unit tests.
Ruihan-Yin Jun 13, 2024
34980b4
push, pop, inc, dec, neg, not, xadd, shld, shrd, cmpxchg, setcc, bswap.
Ruihan-Yin Jun 26, 2024
2ffdbeb
bug fix for bswap
Ruihan-Yin Jun 27, 2024
3a729bb
bt
Ruihan-Yin Jun 28, 2024
d943b03
xchg, idiv
Ruihan-Yin Jul 1, 2024
c8fee9c
Make sure add REX2 prefix if register encoding for EGPRs are being ca…
Ruihan-Yin Jul 2, 2024
6ec0e97
Ensure code size is correctly computed in R_R_I path.
Ruihan-Yin Jul 8, 2024
1d01003
clean up
Ruihan-Yin Jul 9, 2024
1acc219
Change all AddSimdPrefix to AddX86Prefix
Ruihan-Yin Jul 15, 2024
87ad443
div, mulEAX
Ruihan-Yin Jul 16, 2024
bb9905a
filter out test from REX2 encoding when using ACC form.
Ruihan-Yin Jul 19, 2024
86083b2
Make sure REX prefix will not be added when emitting with REX2.
Ruihan-Yin Jul 24, 2024
dfe8760
resolve comments.
Ruihan-Yin Aug 5, 2024
64761cd
make sure the APX debug knob is only available under debug build.
Ruihan-Yin Oct 24, 2024
f1aba62
clean up some out-dated code.
Ruihan-Yin Nov 12, 2024
f5cc5a8
enable movsxd
Ruihan-Yin Nov 12, 2024
7ca8433
Enable "Call"
Ruihan-Yin Nov 13, 2024
bc4d225
Enable "JMP"
Ruihan-Yin Nov 15, 2024
deb3814
resolve merge errors
Ruihan-Yin Nov 18, 2024
0d63230
formatting
Ruihan-Yin Nov 18, 2024
13b8076
remote coredistools.dll for internal tests only
Ruihan-Yin Nov 18, 2024
42c6cfc
bug fix
Ruihan-Yin Nov 19, 2024
b1a9617
SUB reg, reg, reg
Ruihan-Yin Aug 8, 2024
ec5d5ca
enable NDD on genCodeForBinary
Ruihan-Yin Aug 28, 2024
ebeaf04
consolidate TakesLegacyPromotedEvexPrefix logics.
Ruihan-Yin Aug 30, 2024
547f01d
ensure register encoding is correct under legacy-promoted-evex encoding.
Ruihan-Yin Aug 30, 2024
3566464
Make sure the overflow check is correctly emitted.
Ruihan-Yin Sep 4, 2024
f8e9c4d
simplify the compiler setup logics.
Ruihan-Yin Sep 4, 2024
6bfd050
emitInsNddBinary
Ruihan-Yin Sep 6, 2024
4b0085d
make sure REX will not be added when EVEX presents.
Ruihan-Yin Sep 7, 2024
5701b1c
resolve comment and clean up.
Ruihan-Yin Sep 11, 2024
6d30388
enable more NDD instructions.
Ruihan-Yin Sep 13, 2024
5d3768c
bug fixes
Ruihan-Yin Sep 13, 2024
a5619e4
enable imul
Ruihan-Yin Sep 13, 2024
c71ace6
add emitter unit tests, and fix encoding error for CMOVcc
Ruihan-Yin Sep 16, 2024
ca92da9
bug fixes:
Ruihan-Yin Sep 18, 2024
5d10aef
refactor emitInsBinary
Ruihan-Yin Sep 19, 2024
5f288a6
clean up
Ruihan-Yin Sep 19, 2024
f4e96b0
clean up and refactor some code
Ruihan-Yin Sep 20, 2024
637c413
make sure the code size estimation is correct for some apx promoted i…
Ruihan-Yin Sep 25, 2024
a203a4d
add tuning knob to EVEX.ND feature.
Ruihan-Yin Sep 30, 2024
a99705a
flip the Evex.nd knob.
Ruihan-Yin Oct 1, 2024
b5fa5bf
put NDD control knob to the correct place.
Ruihan-Yin Oct 3, 2024
b69d01e
resolve merge errors
Ruihan-Yin Nov 20, 2024
52539c3
Make sure APX related knobs are defined properly across platforms
Ruihan-Yin Nov 20, 2024
25d66bf
Add Evex.nf to instrDesc
Ruihan-Yin Oct 2, 2024
a19da9e
{nf} add reg, reg
Ruihan-Yin Oct 8, 2024
2e8d714
Enable EVEX.NF in more instructions
Ruihan-Yin Oct 9, 2024
df59342
more instructions
Ruihan-Yin Oct 10, 2024
226fabb
comments.
Ruihan-Yin Oct 10, 2024
36c6631
lzcnt, tzcnt, popcnt
Ruihan-Yin Oct 10, 2024
5f8a01d
Exclude ACC form from EVEX promotion.
Ruihan-Yin Oct 15, 2024
0453630
BMI instructions.
Ruihan-Yin Oct 15, 2024
07868bc
bug fixes
Ruihan-Yin Oct 16, 2024
69f7e8b
Tweak the code size calculation to make sure REX2 and APX-EVEX are pr…
Ruihan-Yin Oct 18, 2024
1c1a894
bug fixes for stress mode
Ruihan-Yin Oct 29, 2024
1be4b12
Add idEvexNoPromotion to emitter to exclude the APX-EVEX promotion fr…
Ruihan-Yin Nov 4, 2024
bfb06c7
resolve merge error
Ruihan-Yin Nov 20, 2024
9541a99
fix merge error
Ruihan-Yin Nov 21, 2024
543d949
Revert "Add idEvexNoPromotion to emitter to exclude the APX-EVEX prom…
Ruihan-Yin Nov 21, 2024
a879019
bug fix
Ruihan-Yin Nov 22, 2024
55cbda6
introduce _no_evex suffix for some instructions for cases when LOCK w…
Ruihan-Yin Nov 22, 2024
a9a3d5c
Merge remote-tracking branch 'origin/main' into apx-evex-nf-nov
Ruihan-Yin Dec 17, 2024
0eef560
resolve merge conflict
Ruihan-Yin Dec 17, 2024
0480c02
fix merge error.
Ruihan-Yin Dec 17, 2024
48cec5f
fix comments and some checks.
Ruihan-Yin Dec 19, 2024
7171e0e
formatting
Ruihan-Yin Dec 19, 2024
5f7606c
remove unneeded env var.
Ruihan-Yin Dec 19, 2024
212 changes: 194 additions & 18 deletions src/coreclr/jit/codegenxarch.cpp
@@ -433,12 +433,13 @@ void CodeGen::instGen_Set_Reg_To_Imm(emitAttr size,
else
{
// For section constant, the immediate will be relocatable
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm DEBUGARG(targetHandle) DEBUGARG(gtFlags));
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm,
INS_OPTS_NONE DEBUGARG(targetHandle) DEBUGARG(gtFlags));
}
}
else
{
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm DEBUGARG(targetHandle) DEBUGARG(gtFlags));
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm, INS_OPTS_NONE DEBUGARG(targetHandle) DEBUGARG(gtFlags));
}
}
regSet.verifyRegUsed(reg);
@@ -769,12 +770,20 @@ void CodeGen::genCodeForNegNot(GenTree* tree)
{
GenTree* operand = tree->gtGetOp1();
assert(operand->isUsedFromReg());
regNumber operandReg = genConsumeReg(operand);
regNumber operandReg = genConsumeReg(operand);
instruction ins = genGetInsForOper(tree->OperGet(), targetType);

inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);
if (JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) && (targetReg != operandReg))
{
GetEmitter()->emitIns_R_R(ins, emitTypeSize(operand), targetReg, operandReg, INS_OPTS_EVEX_nd);
}
else
{
inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);

instruction ins = genGetInsForOper(tree->OperGet(), targetType);
inst_RV(ins, targetReg, targetType);
instruction ins = genGetInsForOper(tree->OperGet(), targetType);
inst_RV(ins, targetReg, targetType);
}
}

genProduceReg(tree);
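
The NDD branch in this hunk lets `neg`/`not` write a destination register that differs from the source, so the preliminary `mov` can be dropped. A minimal sketch of that selection logic, assuming plain strings for registers; `planUnary` and its parameters are hypothetical names for illustration and are not part of the JIT:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model for illustration only; none of these names exist in the JIT.
// Plans the instructions for `dst = op(src)` where op is a unary instruction (neg, not).
std::vector<std::string> planUnary(const std::string& op,
                                   const std::string& dst,
                                   const std::string& src,
                                   bool nddEnabled,   // stands in for JitConfig.JitEnableApxNDD()
                                   bool nddEncodable) // stands in for IsApxNDDEncodableInstruction(ins)
{
    assert(!op.empty());

    std::vector<std::string> plan;
    if (nddEnabled && nddEncodable && dst != src)
    {
        // NDD form: the destination is encoded separately from the source, so no mov is needed.
        plan.push_back(op + " " + dst + ", " + src);
    }
    else
    {
        // Legacy form: copy first (skipped when dst == src), then operate in place.
        if (dst != src)
        {
            plan.push_back("mov " + dst + ", " + src);
        }
        plan.push_back(op + " " + dst);
    }
    return plan;
}
```

With the knob on, `planUnary("neg", "r10", "rcx", true, true)` yields a single instruction; with it off, the same call yields the `mov` plus the in-place `neg`.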
@@ -1189,12 +1198,49 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode)
// reg3 = reg3 op reg2
else
{
var_types op1Type = op1->TypeGet();
inst_Mov(op1Type, targetReg, op1reg, /* canSkip */ false);
regSet.verifyRegUsed(targetReg);
gcInfo.gcMarkRegPtrVal(targetReg, op1Type);
dst = treeNode;
src = op2;
if (JitConfig.JitEnableApxNDD() && emit->IsApxNDDEncodableInstruction(ins) && !varTypeIsFloating(treeNode))
{
// TODO-xarch-apx:
// APX can provide optimal code gen in this case using NDD feature:
// reg3 = op1 op op2 without extra mov

// see if it can be optimized by inc/dec
if (oper == GT_ADD && op2->isContainedIntOrIImmed() && !treeNode->gtOverflowEx())
{
if (op2->IsIntegralConst(1))
{
emit->emitIns_R_R(INS_inc, emitTypeSize(treeNode), targetReg, op1reg, INS_OPTS_EVEX_nd);
genProduceReg(treeNode);
return;
}
else if (op2->IsIntegralConst(-1))
{
emit->emitIns_R_R(INS_dec, emitTypeSize(treeNode), targetReg, op1reg, INS_OPTS_EVEX_nd);
genProduceReg(treeNode);
return;
}
}

assert(op1reg != targetReg);
assert(op2reg != targetReg);
emit->emitInsBinary(ins, emitTypeSize(treeNode), op1, op2, targetReg);
if (treeNode->gtOverflowEx())
{
assert(oper == GT_ADD || oper == GT_SUB);
genCheckOverflow(treeNode);
}
genProduceReg(treeNode);
return;
}
else
{
var_types op1Type = op1->TypeGet();
inst_Mov(op1Type, targetReg, op1reg, /* canSkip */ false);
regSet.verifyRegUsed(targetReg);
gcInfo.gcMarkRegPtrVal(targetReg, op1Type);
dst = treeNode;
src = op2;
}
}

// try to use an inc or dec
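
On the NDD path in `genCodeForBinary`, an add of a contained constant `1` or `-1` is emitted as `inc`/`dec` unless the node needs an overflow check. A rough sketch of that decision, assuming a hypothetical `AddForm` enum and `pickAddForm` helper (neither exists in the JIT). The overflow exclusion is presumably tied to `inc`/`dec` leaving the carry flag untouched:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; the JIT works on GenTree nodes and `instruction` values.
enum class AddForm
{
    Inc,
    Dec,
    Add
};

// Chooses the instruction form for `dst = src + imm`.
// `checksOverflow` stands in for treeNode->gtOverflowEx().
inline AddForm pickAddForm(std::int64_t imm, bool checksOverflow)
{
    if (!checksOverflow)
    {
        if (imm == 1)
        {
            return AddForm::Inc;
        }
        if (imm == -1)
        {
            return AddForm::Dec;
        }
    }
    // Every other immediate, and every overflow-checked add, keeps the general add form.
    return AddForm::Add;
}
```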
@@ -1213,6 +1259,7 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode)
return;
}
}

regNumber r = emit->emitInsBinary(ins, emitTypeSize(treeNode), dst, src);
noway_assert(r == targetReg);

@@ -1326,6 +1373,25 @@ void CodeGen::genCodeForMul(GenTreeOp* treeNode)
}
assert(regOp->isUsedFromReg());

if (JitConfig.JitEnableApxNDD() && emit->IsApxNDDEncodableInstruction(ins) &&
regOp->GetRegNum() != mulTargetReg)
{
// use NDD form to optimize this form:
// mov targetReg, regOp
// imul targetReg, rmOp
// to imul targetReg, regOp rmOp.
emit->emitInsBinary(ins, size, regOp, rmOp, mulTargetReg);
if (requiresOverflowCheck)
{
// Overflow checking is only used for non-floating point types
noway_assert(!varTypeIsFloating(treeNode));

genCheckOverflow(treeNode);
}
genProduceReg(treeNode);
return;
}

// Setup targetReg when neither of the source operands was a matching register
inst_Mov(targetType, mulTargetReg, regOp->GetRegNum(), /* canSkip */ true);

@@ -4438,23 +4504,23 @@ void CodeGen::genCodeForLockAdd(GenTreeOp* node)
if (imm == 1)
{
// inc [addr]
GetEmitter()->emitIns_AR(INS_inc, size, addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR(INS_inc_no_evex, size, addr->GetRegNum(), 0);
}
else if (imm == -1)
{
// dec [addr]
GetEmitter()->emitIns_AR(INS_dec, size, addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR(INS_dec_no_evex, size, addr->GetRegNum(), 0);
}
else
{
// add [addr], imm
GetEmitter()->emitIns_I_AR(INS_add, size, imm, addr->GetRegNum(), 0);
GetEmitter()->emitIns_I_AR(INS_add_no_evex, size, imm, addr->GetRegNum(), 0);
}
}
else
{
// add [addr], data
GetEmitter()->emitIns_AR_R(INS_add, size, data->GetRegNum(), addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR_R(INS_add_no_evex, size, data->GetRegNum(), addr->GetRegNum(), 0);
}
}

@@ -4481,7 +4547,7 @@ void CodeGen::genLockedInstructions(GenTreeOp* node)

if (node->OperIs(GT_XORR, GT_XAND))
{
const instruction ins = node->OperIs(GT_XORR) ? INS_or : INS_and;
const instruction ins = node->OperIs(GT_XORR) ? INS_or_no_evex : INS_and_no_evex;

if (node->IsUnusedValue())
{
@@ -4873,6 +4939,25 @@ void CodeGen::genCodeForShift(GenTree* tree)
genProduceReg(tree);
return;
}

if (JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) &&
(tree->GetRegNum() != operandReg))
{
ins = genMapShiftInsToShiftByConstantIns(ins, shiftByValue);
// If APX is available, we can use NDD to optimize the case when LSRA failed to avoid explicit mov.
// this case might be rarely hit.
if (shiftByValue == 1)
{
GetEmitter()->emitIns_R_R(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, INS_OPTS_EVEX_nd);
}
else
{
GetEmitter()->emitIns_R_R_I(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, shiftByValue,
INS_OPTS_EVEX_nd);
}
genProduceReg(tree);
return;
}
#endif
// First, move the operand to the destination register and
// later on perform the shift in-place.
@@ -4919,6 +5004,16 @@ void CodeGen::genCodeForShift(GenTree* tree)
// The operand to be shifted must not be in ECX
noway_assert(operandReg != REG_RCX);

if (JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) &&
(tree->GetRegNum() != operandReg))
{
// If APX is available, we can use NDD to optimize the case when LSRA failed to avoid explicit mov.
// this case might be rarely hit.
GetEmitter()->emitIns_R_R(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, INS_OPTS_EVEX_nd);
genProduceReg(tree);
return;
}

inst_Mov(targetType, tree->GetRegNum(), operandReg, /* canSkip */ true);
inst_RV(ins, tree->GetRegNum(), targetType);
}
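
For constant shifts, the NDD path in `genCodeForShift` uses the two-register form when the count is 1 and the register-register-immediate form otherwise. A small sketch of that operand layout, assuming `shl` and string register names; `formatNddShift` is a hypothetical helper, not a JIT function:

```cpp
#include <cassert>
#include <string>

// Hypothetical model for illustration; not part of the JIT.
// Formats an NDD shift `dst = src << count` for a constant, non-zero count.
std::string formatNddShift(const std::string& dst, const std::string& src, unsigned count)
{
    assert(count != 0);

    if (count == 1)
    {
        // Shift-by-one form: no immediate operand is emitted (mirrors the emitIns_R_R call).
        return "shl " + dst + ", " + src;
    }
    // General form: the count travels as an immediate (mirrors the emitIns_R_R_I call).
    return "shl " + dst + ", " + src + ", " + std::to_string(count);
}
```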
@@ -9270,6 +9365,87 @@ void CodeGen::genAmd64EmitterUnitTestsApx()

theEmitter->emitIns_S(INS_neg, EA_2BYTE, 0, 0);
theEmitter->emitIns_S(INS_not, EA_2BYTE, 0, 0);

// APX-EVEX

theEmitter->emitIns_R_R_R(INS_add, EA_8BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_sub, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_or, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_and, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_xor, EA_1BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_I(INS_or, EA_2BYTE, REG_R10, REG_EAX, 10565, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_or, EA_8BYTE, REG_R10, REG_EAX, 10, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_S(INS_or, EA_8BYTE, REG_R10, REG_EAX, 0, 1, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_neg, EA_2BYTE, REG_R10, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_shl, EA_2BYTE, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R(INS_shl_1, EA_2BYTE, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_shl_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_shl_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_inc, EA_2BYTE, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R(INS_dec, EA_2BYTE, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_R(INS_cmovo, EA_4BYTE, REG_R12, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_R(INS_imul, EA_4BYTE, REG_R12, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_S(INS_imul, EA_4BYTE, REG_R12, REG_R11, 0, 1, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_add, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_sub, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_and, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_or, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_xor, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_inc, EA_4BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_dec, EA_4BYTE, REG_R12, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_I(INS_add, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_sub, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_and, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_or, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_xor, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_S(INS_add, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_sub, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_and, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_or, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_xor, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R(INS_neg, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_shl, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_shl_1, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_shl_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_shl_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_imul, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_imul, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_I(INS_imul_15, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R(INS_imulEAX, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_mulEAX, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_div, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_idiv, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_tzcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_lzcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_popcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_S(INS_tzcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_lzcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_popcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R_R(INS_add, EA_2BYTE, REG_R12, REG_R13, REG_R11,
(insOpts)(INS_OPTS_EVEX_nf | INS_OPTS_EVEX_nd));

theEmitter->emitIns_R_R_R(INS_andn, EA_8BYTE, REG_R11, REG_R13, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R_R(INS_bextr, EA_8BYTE, REG_R11, REG_R13, REG_R11, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_blsi, EA_8BYTE, REG_R11, REG_R13, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_blsmsk, EA_8BYTE, REG_R11, REG_R13, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_blsr, EA_8BYTE, REG_R11, 0, 1);
}

#endif // defined(DEBUG) && defined(TARGET_AMD64)
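
One of the unit tests in this hunk passes `(insOpts)(INS_OPTS_EVEX_nf | INS_OPTS_EVEX_nd)`. The cast is needed because OR-ing two unscoped enumerators produces an integer rather than the enum type. A self-contained sketch of the same pattern; the enum name and bit values here are assumptions for illustration and do not reflect the real `insOpts` layout:

```cpp
#include <cassert>

// Hypothetical bit values for illustration; the real insOpts values live in the JIT headers.
enum insOptsSketch : unsigned
{
    OPTS_NONE    = 0,
    OPTS_EVEX_nd = 1u << 0,
    OPTS_EVEX_nf = 1u << 1
};

// Combines two option flags; the result must be cast back to the enum type explicitly.
inline insOptsSketch combine(insOptsSketch a, insOptsSketch b)
{
    return static_cast<insOptsSketch>(static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

// Reports whether `flag` is set in `opts`.
inline bool hasOpt(insOptsSketch opts, insOptsSketch flag)
{
    return (static_cast<unsigned>(opts) & static_cast<unsigned>(flag)) != 0;
}
```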
@@ -11314,7 +11490,7 @@ void CodeGen::instGen_MemoryBarrier(BarrierKind barrierKind)
if (barrierKind == BARRIER_FULL)
{
instGen(INS_lock);
GetEmitter()->emitIns_I_AR(INS_or, EA_4BYTE, 0, REG_SPBASE, 0);
GetEmitter()->emitIns_I_AR(INS_or_no_evex, EA_4BYTE, 0, REG_SPBASE, 0);
}
}

1 change: 1 addition & 0 deletions src/coreclr/jit/compiler.cpp
@@ -2299,6 +2299,7 @@ void Compiler::compSetProcessor()
if (canUseApxEncoding())
{
codeGen->GetEmitter()->SetUseRex2Encoding(true);
codeGen->GetEmitter()->SetUsePromotedEVEXEncoding(true);
}
}
#endif // TARGET_XARCH
25 changes: 21 additions & 4 deletions src/coreclr/jit/compiler.h
@@ -3942,7 +3942,7 @@ class Compiler

// false: we can add new tracked variables.
// true: We cannot add new 'tracked' variable
bool lvaTrackedFixed = false;
bool lvaTrackedFixed = false;

unsigned lvaCount; // total number of locals, which includes function arguments,
// special arguments, IL local variables, and JIT temporary variables
@@ -6849,15 +6849,15 @@ class Compiler
unsigned acdCount = 0;

// Get the index to use as part of the AddCodeDsc key for sharing throw blocks
unsigned bbThrowIndex(BasicBlock* blk, AcdKeyDesignator* dsg);
unsigned bbThrowIndex(BasicBlock* blk, AcdKeyDesignator* dsg);

struct AddCodeDscKey
{
public:
AddCodeDscKey(): acdKind(SCK_NONE), acdData(0) {}
AddCodeDscKey(SpecialCodeKind kind, BasicBlock* block, Compiler* comp);
AddCodeDscKey(AddCodeDsc* add);

static bool Equals(const AddCodeDscKey& x, const AddCodeDscKey& y)
{
return (x.acdData == y.acdData) && (x.acdKind == y.acdKind);
@@ -9992,13 +9992,30 @@ class Compiler
// JitStressEvexEncoding- Answer the question: Is Evex stress knob set
//
// Returns:
// `true` if user requests REX2 encoding.
// `true` if user requests EVEX encoding.
//
bool JitStressEvexEncoding() const
{
#ifdef DEBUG
return JitConfig.JitStressEvexEncoding() || JitConfig.JitStressRex2Encoding();
#endif // DEBUG
return false;
}

//------------------------------------------------------------------------
// DoJitStressPromotedEvexEncoding- Answer the question: Do we force promoted EVEX encoding.
//
// Returns:
// `true` if user requests promoted EVEX encoding.
//
bool DoJitStressPromotedEvexEncoding() const
{
#ifdef DEBUG
if (JitConfig.JitStressPromotedEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_APX))
{
return true;
}
#endif // DEBUG

return false;
}
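
`DoJitStressPromotedEvexEncoding` follows the usual debug-only knob shape: the check is compiled in only under `DEBUG`, and release builds fall through to `return false`. A standalone sketch of that shape, assuming a hypothetical `SKETCH_DEBUG` macro in place of `DEBUG` and plain booleans in place of `JitConfig` and the ISA query:

```cpp
#include <cassert>

// Hypothetical sketch; SKETCH_DEBUG stands in for the JIT's DEBUG macro.
#ifdef SKETCH_DEBUG
constexpr bool kDebugBuild = true;
#else
constexpr bool kDebugBuild = false;
#endif

// Returns true only in debug builds, and only when the knob is set and the ISA is usable.
// `knobSet` stands in for JitConfig.JitStressPromotedEvexEncoding();
// `apxSupported` stands in for compOpportunisticallyDependsOn(InstructionSet_APX).
inline bool doStressPromotedEvex(bool knobSet, bool apxSupported)
{
#ifdef SKETCH_DEBUG
    if (knobSet && apxSupported)
    {
        return true;
    }
#else
    // Release builds ignore both inputs.
    (void)knobSet;
    (void)apxSupported;
#endif
    return false;
}
```

Keeping the final `return false` outside the `#ifdef` means both build flavors share one fall-through exit, which matches the layout used in the hunk.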