-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Cast UInt64 to Single directly during const folding #106419
Conversation
@dotnet/jit-contrib PTAL. The new test is failing on mono:
So this might need to be fixed there, as well? I can make this test CoreCLR-only for now. SPMI is failing because it doesn't have any collections to run. |
This is the right thing for .NET 9; Mono has a tracking issue to make the behavior changes and standardize to the same implementation as RyuJIT: #100368 |
Got it. I'll let this CI run finish, and then push a change to disable it on Mono. |
Looks like the test is running on non-ARM64 legs, too -- probably because I forgot |
@dotnet/jit-contrib PTAL, the new test is passing. |
src/coreclr/jit/utils.cpp
Outdated
#ifdef TARGET_ARM64 | ||
// ARM64 supports casting directly to float | ||
return (float)u64; | ||
#else // !TARGET_ARM64 | ||
double d = convertUInt64ToDouble(u64); | ||
return (float)d; | ||
#endif // !TARGET_ARM64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this bug exists on all platforms, not just Arm64.
The code in the test is
ulong vr10 = 16105307123914158031UL;
float vr11 = 4294967295U | vr10;
This boils down to 4294967295U | 16105307123914158031UL
which is 16105307124325679103
The correct float
result is then 16105306574569865216.0f
, or rather the raw bits 1600094603
(the DEBUG output).
The result produced by the two step conversion, however, is 16105307674081492992.0f
, or rather the raw bits 1600094604
(the RELEASE output).
I would expect that all our current compilers (MSVC, Clang, and GCC) are producing correct results for return (float)u64
and we no longer need to do a two step conversion. If any are still producing incorrect results, we should log bugs against them and we'd need to hand roll an implementation that does the conversion instead (I can give reference to one if we need it, but I don't think we do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604
. So it looks like the MSVC/Clang/GCC codegen on the const-folding path is matching what RyuJIT currently emits in Debug for x64. For this test case, is the definition of "correct" architecture-dependent? I'm guessing there's some subtle difference in rounding behavior between ARM64 and x64?
Regardless of the answer to that, I think you're right that we can update the cast logic for all platforms. I can do that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604
Are you sure? I don't see the same behavior in godbolt: https://godbolt.org/z/s5MrTWaYM
-- note that m1
and m2
return the same result (raw bits 0x5f5f818b
) while m3
returns one higher (raw bits 0x5f5f818c
)
For this test case, is the definition of "correct" architecture-dependent?
No, we should be deterministic across all platforms for such a case; and correspondingly the added test should be enabled for all platforms as well, not just be Arm64 specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the ulong -> float
cast on x64 is represented in IR as ulong -> double -> float
, while on ARM64, we combine it into one cast during morph. I'm guessing we have to enable that path for all platforms, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the importer is creating ulong -> double -> float
for ulong -> float
conversions in IL? That sounds wrong given that those are not equivalent (as the example shows). If morph is making that transformation in the opposite direction it likewise sounds wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roslyn is likely relying on the JIT implementation for the constant folding here, not realizing it's incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree the helper has downsides too. But Roslyn could avoid using it when not targeting something with it available. For example, I imagine it could emit the equivalent of
ulong x = ...;
float f = x > long.MaxValue ? (float)(-x) + (float)0x8000000000000000 : (float)(long)x;
in those cases (or whatever the right way is to do the conversion manually). Signed long -> float conversion is representable with just conv.r4
, I believe.
I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.
I've opened #106646 to track this. I think ARM64 was the only platform where Debug/Release behavior could diverge, due to const-folding always doing a two-step cast and ucvtf/scvtf
being able to encode ulong/long -> float
casts. Should we take this PR as-is for .NET 9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect this also repros on AVX512 capable hardware for x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I can repro it on my AVX512 machine, though this PR's changes seem to fix it. Here's the Debug codegen:
G_M27646_IG01: ;; offset=0x0000
push rbp
sub rsp, 48
lea rbp, [rsp+0x30]
xor eax, eax
mov qword ptr [rbp-0x08], rax
mov dword ptr [rbp-0x0C], eax
;; size=19 bbWeight=1 PerfScore 4.00
G_M27646_IG02: ;; offset=0x0013
cmp dword ptr [(reloc 0x7ffeb59917c0)], 0
je SHORT G_M27646_IG04
;; size=9 bbWeight=1 PerfScore 4.00
G_M27646_IG03: ;; offset=0x001C
call CORINFO_HELP_DBG_IS_JUST_MY_CODE
;; size=5 bbWeight=0.50 PerfScore 0.50
G_M27646_IG04: ;; offset=0x0021
nop
mov rax, 0xDF818B7FE778AFCF
mov qword ptr [rbp-0x08], rax
mov eax, -1
mov eax, eax
or rax, qword ptr [rbp-0x08]
vcvtusi2ss xmm0, rax
vmovss dword ptr [rbp-0x0C], xmm0
vmovss xmm0, dword ptr [rbp-0x0C]
call [System.BitConverter:SingleToUInt32Bits(float):uint]
mov dword ptr [rbp-0x10], eax
mov ecx, dword ptr [rbp-0x10]
call [System.Console:WriteLine(uint)]
nop
nop
;; size=62 bbWeight=1 PerfScore 22.50
G_M27646_IG05: ;; offset=0x005F
add rsp, 48
pop rbp
ret
;; size=6 bbWeight=1 PerfScore 1.75
And here's Release:
G_M27646_IG01: ;; offset=0x0000
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M27646_IG02: ;; offset=0x0004
mov ecx, 0x5F5F818B
call [System.Console:WriteLine(uint)]
nop
;; size=12 bbWeight=1 PerfScore 3.50
G_M27646_IG03: ;; offset=0x0010
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
Both now output 1600094603
. I can try tweaking the test to only run on ARM64, or on x64 if Avx512.IsSupported
is true.
bool runTest = (RuntimeInformation.ProcessArchitecture == Architecture.Arm64) || Avx512F.IsSupported; | ||
|
||
if (runTest) | ||
{ | ||
ulong vr10 = 16105307123914158031UL; | ||
float vr11 = 4294967295U | vr10; | ||
Assert.Equal(1600094603U, BitConverter.SingleToUInt32Bits(vr11)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the test rather assert that its producing 1600094604u
if (RuntimeInformation.ProcessArchitecture == Architecture.Arm64) || Avx512F.IsSupported
is false
?
That way we can detect any changes for other platforms or scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea; I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered that we don't build this test if the target arch isn't arm64/x64 (or if the runtime isn't CoreCLR). @tannergooding are you ok with only testing those platforms for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that’s fine, but ideally we’d ensure this runs everywhere long term
<CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64' AND '$(TargetArchitecture)' != 'x64'">true</CLRTestTargetUnsupported> | ||
<CLRTestTargetUnsupported Condition="'$(RuntimeFlavor)' != 'coreclr'">true</CLRTestTargetUnsupported> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only need is we skip this on Mono today, right?
Is that because they're always doing the ulong -> double -> float
2-step behavior?
Can we not instead use [SkipOnMono("https://github.com/dotnet/runtime/issues/#######", TestPlatforms.Any)]
and ensure that the test is compiled for all platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Mono seems to also do the two-step cast, so the test was failing across all Mono legs.
Can we not instead use [SkipOnMono("https://github.com/dotnet/runtime/issues/#######", TestPlatforms.Any)] and ensure that the test is compiled for all platforms?
Sure, I'll update it.
On x86, we're pretty explicit about keeping the cast two steps in morph: We represent the cast with a helper call to convert the |
Test is now passing on all CoreCLR Pri0 legs. |
/backport to release/9.0 |
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10477490137 |
Fixes #106338 (comment).