Scalar/Packed conversions for floating point to integer #97529

Merged
merged 82 commits into from
Apr 5, 2024
Commits
447a45e
merging with main
khushal1996 Oct 24, 2023
42332de
Basic working version of double -> ulong saturation
khushal1996 Oct 25, 2023
77876cd
Moving the code in a do-while with proper checks to make sure we are …
khushal1996 Oct 27, 2023
e04bd5d
adjusting comments
khushal1996 Nov 3, 2023
fabed24
Merging with main
khushal1996 Nov 29, 2023
274018e
removing conflicts from gentree.h flags
khushal1996 Dec 6, 2023
2fc6d75
float to uint conversion verified. removing commented code
khushal1996 Dec 6, 2023
7a38ba9
merging with main. Making changes to simdashwintrinsic.cpp and
khushal1996 Dec 7, 2023
4464448
progress on double to long morphing
khushal1996 Dec 13, 2023
6854e53
another attempt at double to long conversion
khushal1996 Dec 14, 2023
915bb3d
Merge with main
khushal1996 Dec 20, 2023
e63313f
adding handling for scalar conversion cases for SSE2. Remaining float…
khushal1996 Jan 3, 2024
cfb66ed
partial changes for float to int conversion using double to int for a…
khushal1996 Jan 5, 2024
2417cd2
adding float to int working scalar conversion case. Working on vectro…
khushal1996 Jan 8, 2024
4bb2f01
partial work on float to int packed conversion
khushal1996 Jan 11, 2024
0ad99c9
partial version of float to int conversion
khushal1996 Jan 17, 2024
8f9e225
working version of float to int scalar/packed for avx512
khushal1996 Jan 24, 2024
a92c010
complete conversions code for floating point to integral conversions …
khushal1996 Jan 24, 2024
1ce320a
Merging with main.
khushal1996 Jan 25, 2024
37460f7
fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR …
khushal1996 Jan 30, 2024
665e79b
adding JIT_Dbl2Int for target_x86 and other architectures.
khushal1996 Jan 31, 2024
921b833
Supporting x86 for saturating conversions as well
khushal1996 Feb 5, 2024
e3eaa88
fixing errors in packed conversion
khushal1996 Feb 6, 2024
bdd0fc3
accommodate unsigned in IR
khushal1996 Feb 8, 2024
9510547
adding evex support for cvttss2si
khushal1996 Feb 13, 2024
46e2c88
Merge with main
khushal1996 Feb 13, 2024
3204bbc
Catch divide by zero exception
khushal1996 Feb 14, 2024
813c72e
Handle overflow cases
khushal1996 Feb 15, 2024
7795754
Fix tests to check saturating behavior
khushal1996 Feb 15, 2024
8392b31
Correct mapping of instructions
khushal1996 Feb 15, 2024
ea6acb6
Convert float -> ulong / long as float -> double -> ulong / long
khushal1996 Feb 22, 2024
c89c4b9
Merging with main
khushal1996 Oct 24, 2023
1c23e73
Merging with main
khushal1996 Nov 3, 2023
0aaac78
removing conflicts from gentree.h flags
khushal1996 Dec 6, 2023
c564b37
merging with main. Making changes to simdashwintrinsic.cpp and
khushal1996 Dec 7, 2023
172c967
adding a new helper function for float to uint scalar conversion for …
khushal1996 Dec 20, 2023
31b899a
Merging with main
khushal1996 Jan 3, 2024
7730d46
partial changes for float to int conversion using double to int for a…
khushal1996 Jan 5, 2024
facb8b4
partial version of float to int conversion
khushal1996 Jan 17, 2024
970db62
working version of float to int scalar/packed for avx512
khushal1996 Jan 24, 2024
cfc52bf
Merging with main.
khushal1996 Jan 25, 2024
9c4edd5
Changing the way helper functions are handled in morph
khushal1996 Jan 30, 2024
e9ac9a0
adding JIT_Dbl2Int for target_x86 and other architectures.
khushal1996 Jan 31, 2024
6c7be45
Supporting x86 for saturating conversions as well
khushal1996 Feb 5, 2024
8ebe57d
fixing errors in packed conversion
khushal1996 Feb 6, 2024
5f8bbbc
Correct mapping of instructions
khushal1996 Feb 15, 2024
6553069
delete extra files
khushal1996 Feb 22, 2024
597f6f3
Merging main
khushal1996 Feb 24, 2024
4de1da4
Merge with main and adding new helpers in nativeaot
khushal1996 Feb 24, 2024
9670355
changing type of cast node as signed when making cast nodes
khushal1996 Feb 26, 2024
f2c6487
Avoiding removing extra element from the stack
khushal1996 Feb 27, 2024
6197b20
Fix formatting, Change comp->IsaSupportedDebugOnly to IsBaselineVecto…
khushal1996 Feb 27, 2024
feb4be0
Reverting some changes to maintain uniformity in code
khushal1996 Feb 27, 2024
aa9e127
Handling cases where AVX512 is not supported in simdashwintrinsic.cpp
khushal1996 Feb 28, 2024
34341cd
fixing exit conditions for ConvertVectorT_ToDouble
khushal1996 Feb 28, 2024
5ff9d1a
Check for AVX512 support for TARGET_XARCH
khushal1996 Feb 28, 2024
d93dc5b
Avoid avx512 path for x86
khushal1996 Feb 29, 2024
2a1b6f8
Enable AVX512F codepath for conversions in x86 arch. Move x86 to usin…
khushal1996 Mar 12, 2024
48e0acf
Add SSE41 path for scalar conversions and 128 bit float to int packed…
khushal1996 Mar 13, 2024
8506ece
Adding SSE41 path for floating point to UINT scalar conversions
khushal1996 Mar 14, 2024
408c716
Add AVX path for ConvertToInt32
khushal1996 Mar 14, 2024
b1f4f67
Adding comments and cleaning the code
khushal1996 Mar 18, 2024
ab7dfb7
Fix errors in double to ulong
khushal1996 Mar 19, 2024
f3e4bf5
Addressing review comments
khushal1996 Mar 21, 2024
b620c2f
Fix tests
khushal1996 Mar 22, 2024
487c9e2
Reverse val < 0 check in dbltoUint and dbltoUlng helpers
khushal1996 Mar 22, 2024
f145e1a
Add overflow conversions for x86/x64, remove FastDbl2Lng and inline it
khushal1996 Mar 22, 2024
4cb90fb
Apply suggestions from code review
khushal1996 Mar 23, 2024
98c23de
Correct Dbl2UlngOvf
khushal1996 Mar 23, 2024
782c8d4
Apply suggestions from code review
jkotas Mar 23, 2024
ab7b4de
Apply suggestions from code review
jkotas Mar 23, 2024
b4b8411
Update src/coreclr/vm/jithelpers.cpp
jkotas Mar 23, 2024
e474ed1
Disable failing mono tests
khushal1996 Mar 29, 2024
9e6ddd0
Merge branch 'main' into kcm-scalar-convert-rebased
khushal1996 Mar 29, 2024
d27c4f7
Merge branch 'main' into kcm-scalar-convert-rebased
khushal1996 Apr 1, 2024
70f2170
Working version of saturating logic moved to lowering for x86/x64
khushal1996 Apr 2, 2024
0f1dc05
Making changes for pre SSE41
khushal1996 Apr 2, 2024
36e0655
Apply suggestions from code review
khushal1996 Apr 2, 2024
c4f28c7
Merge branch 'main' into kcm-scalar-convert-rebased
khushal1996 Apr 3, 2024
9aac0f4
Removing dead code
khushal1996 Apr 4, 2024
080ec88
Fix formatting
khushal1996 Apr 4, 2024
523f1cc
Address review comments, add proper docstrings
khushal1996 Apr 5, 2024
Changing the way helper functions are handled in morph
fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
khushal1996 committed Mar 12, 2024
commit 9c4edd5f9dc3985ea8434ce5a5119e206a2b4fdd
18 changes: 18 additions & 0 deletions difs/doubleToLong-avx512-base.asm
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
; Assembly listing for method Program:DoubleToLong(double):long (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
C4E1FB2CC0 vcvttsd2si rax, xmm0

G_M000_IG03: ;; offset=0x0008
C3 ret

; Total bytes of code 9
18 changes: 18 additions & 0 deletions difs/doubleToLong-avx512-base.txt
@@ -0,0 +1,18 @@
; Assembly listing for method Program:DoubleToLong(double):long (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
C4E1FB2CC0 vcvttsd2si rax, xmm0

G_M000_IG03: ;; offset=0x0008
C3 ret

; Total bytes of code 9
28 changes: 28 additions & 0 deletions difs/doubleToLong-avx512-diff.asm
@@ -0,0 +1,28 @@
; Assembly listing for method Program:DoubleToLong(double):long (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
62F3FD0855053200000000 vfixupimmsd xmm0, xmm0, xmmword ptr [reloc @RWD00], 0
C5F9C20D390000000D vcmppd xmm1, xmm0, xmmword ptr [reloc @RWD16], 13
C5F8101541000000 vmovups xmm2, xmmword ptr [reloc @RWD32]
C4E1FB2CC0 vcvttsd2si rax, xmm0
62F2FD087CC0 vpbroadcastq xmm0, rax
62F3ED0825C8CA vpternlogq xmm1, xmm2, xmm0, -54
C4E1F97EC8 vmovd rax, xmm1

G_M000_IG03: ;; offset=0x0036
C3 ret

RWD00 dq 0000000000000088h, 0000000000000000h
RWD16 dq 43E0000000000000h, 43E0000000000000h
RWD32 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh

; Total bytes of code 55
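
Reading the AVX512 diff listing above: `vfixupimmsd` with table `88h` appears to replace NaN inputs with +0, `vcmppd` with predicate 13 flags inputs greater than or equal to `43E0000000000000h` (the double 2^63), and `vpternlogq` with immediate -54 picks `7FFFFFFFFFFFFFFFh` for flagged inputs and the `vcvttsd2si` result otherwise. A minimal scalar sketch of that behavior follows. `SaturatingDoubleToInt64` and `DoubleFromBits` are hypothetical names used for illustration only; they are not functions from this PR.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: reinterpret a 64-bit pattern as an IEEE-754 double.
double DoubleFromBits(uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Hypothetical reference for saturating double -> long semantics
// (an assumption about the intent, not the JIT's actual code).
int64_t SaturatingDoubleToInt64(double value)
{
    // Same bit pattern as RWD16 in the listing; this is 2^63.
    const double twoPow63 = DoubleFromBits(0x43E0000000000000ULL);

    if (std::isnan(value))
    {
        return 0; // NaN saturates to zero
    }
    if (value >= twoPow63)
    {
        return std::numeric_limits<int64_t>::max(); // positive overflow
    }
    if (value <= -twoPow63)
    {
        return std::numeric_limits<int64_t>::min(); // negative overflow
    }
    return static_cast<int64_t>(value); // in range: truncate toward zero
}
```

The listing needs no separate fix-up for negative overflow because `vcvttsd2si` already returns `8000000000000000h` for out-of-range inputs, which is the same bit pattern as the minimum signed 64-bit value.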
17 changes: 17 additions & 0 deletions difs/doubleToLong-non-avx512-base.asm
@@ -0,0 +1,17 @@
; Assembly listing for method Program:DoubleToLong(double):long (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000

G_M000_IG02: ;; offset=0x0000
F2480F2CC0 cvttsd2si rax, xmm0

G_M000_IG03: ;; offset=0x0005
C3 ret

; Total bytes of code 6
20 changes: 20 additions & 0 deletions difs/doubleToLong-non-avx512-diff.asm
@@ -0,0 +1,20 @@
; Assembly listing for method Program:DoubleToLong(double):long (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
4883EC28 sub rsp, 40

G_M000_IG02: ;; offset=0x0004
E8E771C95F call CORINFO_HELP_DBL2LNG
90 nop

G_M000_IG03: ;; offset=0x000A
4883C428 add rsp, 40
C3 ret

; Total bytes of code 15
35 changes: 35 additions & 0 deletions difs/doubleToLong128-avx512-base.asm
@@ -0,0 +1,35 @@
; Assembly listing for method Program:DoubleToLong128(System.Runtime.Intrinsics.Vector128`1[double]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 8 single block inlinees; 4 inlinees without PGO data

G_M000_IG01: ;; offset=0x0000
4883EC38 sub rsp, 56
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0007
488B02 mov rax, qword ptr [rdx]
4889442428 mov qword ptr [rsp+0x28], rax
C4E1FB2C442428 vcvttsd2si rax, qword ptr [rsp+0x28]
4889442430 mov qword ptr [rsp+0x30], rax
488B442430 mov rax, qword ptr [rsp+0x30]
488B5208 mov rdx, qword ptr [rdx+0x08]
4889542418 mov qword ptr [rsp+0x18], rdx
C4E1FB2C542418 vcvttsd2si rdx, qword ptr [rsp+0x18]
4889542420 mov qword ptr [rsp+0x20], rdx
488B542420 mov rdx, qword ptr [rsp+0x20]
48890424 mov qword ptr [rsp], rax
4889542408 mov qword ptr [rsp+0x08], rdx
C5F8280424 vmovaps xmm0, xmmword ptr [rsp]
C5F81101 vmovups xmmword ptr [rcx], xmm0
488BC1 mov rax, rcx

G_M000_IG03: ;; offset=0x004F
4883C438 add rsp, 56
C3 ret

; Total bytes of code 84
29 changes: 29 additions & 0 deletions difs/doubleToLong128-avx512-diff.asm
@@ -0,0 +1,29 @@
; Assembly listing for method Program:DoubleToLong128(System.Runtime.Intrinsics.Vector128`1[double]):System.Runtime.Intrinsics.Vector128`1[long] (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
C5F81002 vmovups xmm0, xmmword ptr [rdx]
62F3FD0854052E00000000 vfixupimmpd xmm0, xmm0, xmmword ptr [reloc @RWD00], 0
C5F9C20D350000000D vcmppd xmm1, xmm0, xmmword ptr [reloc @RWD16], 13
C5F810153D000000 vmovups xmm2, xmmword ptr [reloc @RWD32]
62F1FD087AC0 vcvttpd2qq xmm0, xmm0
62F3ED0825C8CA vpternlogq xmm1, xmm2, xmm0, -54
C5F81109 vmovups xmmword ptr [rcx], xmm1
488BC1 mov rax, rcx

G_M000_IG03: ;; offset=0x0037
C3 ret

RWD00 dq 0000000000000088h, 0000000000000088h
RWD16 dq 43E0000000000000h, 43E0000000000000h
RWD32 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh

; Total bytes of code 56
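
The packed listing above ends with `vpternlogq xmm1, xmm2, xmm0, -54`. The immediate -54 is `CAh` as an unsigned byte, and each result bit is looked up in that byte using the three input bits as an index. The sketch below emulates that lookup for one 64-bit lane to show that `CAh` behaves as a bitwise select (first operand chooses between the second and third). `TernaryLogic64` is a hypothetical name for illustration, not an API from the JIT.

```cpp
#include <cstdint>

// Hypothetical emulation of one 64-bit lane of a three-input ternary-logic
// operation: for every bit position, the bits of a, b and c form a 3-bit
// index (a is the most significant), and the result bit is imm8[index].
uint64_t TernaryLogic64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (int bit = 0; bit < 64; bit++)
    {
        unsigned index = static_cast<unsigned>((((a >> bit) & 1) << 2) |
                                               (((b >> bit) & 1) << 1) |
                                               ((c >> bit) & 1));
        result |= static_cast<uint64_t>((imm8 >> index) & 1) << bit;
    }
    return result;
}
```

With `a` as the all-ones or all-zeros comparison mask, `b` as the broadcast maximum and `c` as the truncated conversion, `TernaryLogic64(a, b, c, 0xCA)` returns the maximum for lanes that overflowed and the converted value for the rest.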
21 changes: 21 additions & 0 deletions difs/floatToUint-avx512-diff.asm
@@ -0,0 +1,21 @@
; Assembly listing for method Program:FloatToUint(float):uint (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
62F37D0855051200000000 vfixupimmss xmm0, xmm0, xmmword ptr [reloc @RWD00], 0
62F17E0878C0 vcvttss2usi eax, xmm0

G_M000_IG03: ;; offset=0x0014
C3 ret

RWD00 dq 0000000008000088h, 0000000000000000h

; Total bytes of code 21
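
The `RWD00` constant in the listing above is the lookup table consumed by `vfixupimmss`. As I understand the instruction, the low 32 bits hold eight 4-bit response codes, one per input class, and a response of 8 means "replace with +0" while 0 means "leave the value unchanged". The sketch below only decodes the nibbles. The class ordering in the comment (QNaN, SNaN, zero, +1, -infinity, +infinity, negative, positive) is my reading of the Intel manual and should be treated as an assumption; `FixupResponse` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical decoder: extract the 4-bit response code that the fix-up
// table assigns to a given input class (token 0..7).
// Assumed token order: 0 QNaN, 1 SNaN, 2 zero, 3 +1, 4 -inf, 5 +inf,
// 6 negative value, 7 positive value.
unsigned FixupResponse(uint32_t table, unsigned token)
{
    return (table >> (4 * token)) & 0xFu;
}
```

Under that assumed ordering, the float-to-uint table `08000088h` sends NaN and negative inputs to +0, whereas the double-to-long table `88h` only rewrites NaN.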
16 changes: 16 additions & 0 deletions difs/floatToUint-avx512_base.asm
@@ -0,0 +1,16 @@
; Assembly listing for method Program:FloatToUint(float):uint (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
C5F877 vzeroupper

G_M000_IG02: ;; offset=0x0003
C4E1FA2CC0 vcvttss2si rax, xmm0

G_M000_IG03: ;; offset=0x0008
C3 ret
17 changes: 17 additions & 0 deletions difs/floatToUint-non-avx512-base.asm
@@ -0,0 +1,17 @@
; Assembly listing for method Program:FloatToUint(float):uint (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000

G_M000_IG02: ;; offset=0x0000
F3480F2CC0 cvttss2si rax, xmm0

G_M000_IG03: ;; offset=0x0005
C3 ret

; Total bytes of code 6
20 changes: 20 additions & 0 deletions difs/floatToUint-non-avx512-diff.asm
@@ -0,0 +1,20 @@
; Assembly listing for method Program:FloatToUint(float):uint (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
4883EC28 sub rsp, 40

G_M000_IG02: ;; offset=0x0004
E867CCB65F call CORINFO_HELP_FLT2UINT
90 nop

G_M000_IG03: ;; offset=0x000A
4883C428 add rsp, 40
C3 ret

; Total bytes of code 15
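
On generic x64 the diff listing above calls `CORINFO_HELP_FLT2UINT` instead of emitting an inline `cvttss2si`. The helper body is not part of this excerpt. Assuming it follows the saturating behavior the commit titles describe, a reference version might look like the sketch below; `SaturatingFloatToUInt32` is a hypothetical name and this is not the runtime's implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference for saturating float -> uint semantics
// (an assumption about the intended behavior, not the runtime helper).
uint32_t SaturatingFloatToUInt32(float value)
{
    if (std::isnan(value) || value < 0.0f)
    {
        return 0; // NaN and negative inputs clamp to zero
    }
    if (value >= 4294967296.0f) // 2^32, exactly representable as a float
    {
        return std::numeric_limits<uint32_t>::max(); // positive overflow
    }
    return static_cast<uint32_t>(value); // in range: truncate toward zero
}
```

The explicit range checks matter in C++ because casting an out-of-range floating-point value to an integer type is undefined behavior, so the guards must run before the cast.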
20 changes: 20 additions & 0 deletions difs/floatToUint-non-avx512-diff.txt
@@ -0,0 +1,20 @@
; Assembly listing for method Program:FloatToUint(float):uint (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01: ;; offset=0x0000
4883EC28 sub rsp, 40

G_M000_IG02: ;; offset=0x0004
E867CCB65F call CORINFO_HELP_FLT2UINT
90 nop

G_M000_IG03: ;; offset=0x000A
4883C428 add rsp, 40
C3 ret

; Total bytes of code 15
20 changes: 1 addition & 19 deletions src/coreclr/jit/morph.cpp
@@ -508,27 +508,12 @@ GenTree* Compiler::fgMorphExpandCast(GenTreeCast* tree)
switch (dstType)
{
case TYP_INT:
<<<<<<< HEAD
#ifdef TARGET_XARCH
if (!tree->IsSaturatedConversion())
{
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2INT, oper);
}
#endif //TARGET_XARCH
=======
#ifdef TARGET_AMD64
<<<<<<< HEAD
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2INT, oper);
#else //TARGET_AMD64
>>>>>>> 3b121bdc382 (adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.)
return nullptr;
=======
if (!tree->IsSaturatedConversion())
{
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2INT, oper);
}
>>>>>>> 59d881e8d6a (partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working)
#endif //TARGET_AMD64
return nullptr;

case TYP_UINT:
@@ -543,17 +528,14 @@ GenTree* Compiler::fgMorphExpandCast(GenTreeCast* tree)
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2UINT, oper);

case TYP_LONG:
<<<<<<< HEAD
#ifdef TARGET_XARCH
if (!tree->IsSaturatedConversion())
{
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2LNG, oper);
}
#endif //TARGET_XARCH
return nullptr;
=======
#endif //TARGET_XARCH
return fgMorphCastIntoHelper(tree, CORINFO_HELP_DBL2LNG, oper);
>>>>>>> 3b121bdc382 (adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.)

case TYP_ULONG:
#ifdef TARGET_AMD64
6 changes: 3 additions & 3 deletions src/coreclr/jit/simdashwintrinsic.cpp
@@ -1248,15 +1248,15 @@ GenTree* Compiler::impSimdAsHWIntrinsicSpecial(NamedIntrinsic intrinsic,

//run vfixupimmsd base on table and no flags reporting
GenTree* saturate_val = gtNewSimdHWIntrinsicNode(simdType, op1, op2Clone, tbl, gtNewIconNode(0),
NI_AVX512F_Fixup, fieldType, simdSize);
NI_AVX512F_Fixup, simdBaseJitType, simdSize);

GenTree* max_val = gtNewSimdCreateBroadcastNode(simdType, gtNewDconNodeF(static_cast<float>(INT64_MAX)), fieldType, simdSize);
GenTree* max_val = gtNewSimdCreateBroadcastNode(simdType, gtNewDconNodeF(static_cast<float>(INT64_MAX)), simdBaseJitType, simdSize);
GenTree* max_valDup = gtNewSimdCreateBroadcastNode(simdType, gtNewIconNode(INT64_MAX, TYP_LONG), CORINFO_TYPE_LONG, simdSize);
//we will be using the input value twice
GenTree* saturate_valDup = fgMakeMultiUse(&saturate_val);

//usage 1 --> compare with max value of integer
saturate_val = gtNewSimdCmpOpNode(GT_GE, simdType, saturate_val, max_val, fieldType, simdSize);
saturate_val = gtNewSimdCmpOpNode(GT_GE, simdType, saturate_val, max_val, simdBaseJitType, simdSize);
//cast it

NamedIntrinsic intrinsic = (simdSize == 16) ? NI_AVX512DQ_VL_ConvertToVector128Int64WithTruncation