JIT: Faster Math.Max/Min for x64 #65625

EgorBo · 2022-02-19T22:27:55Z

Optimize Math.Max/Min to a single instruction on x64 when one of the arguments is a constant (the constant must not be NaN, and whoever implements this has to be careful around -0.0/+0.0)

float Test(float a) => Math.Max(a, 10);

Currently emits:

; Method Tests4:Test(float):float:this
G_M55200_IG01:
       vzeroupper 
G_M55200_IG02:
       vmovss   xmm0, dword ptr [reloc @RWD00]
       vucomiss xmm1, xmm0
       jp       SHORT G_M55200_IG03
       je       SHORT G_M55200_IG06
G_M55200_IG03:
       vucomiss xmm1, xmm1
       jp       SHORT G_M55200_IG05
       vucomiss xmm1, xmm0
       ja       SHORT G_M55200_IG04
       jmp      SHORT G_M55200_IG08
G_M55200_IG04:
       vmovaps  xmm0, xmm1
       jmp      SHORT G_M55200_IG08
G_M55200_IG05:
       vmovaps  xmm0, xmm1
       jmp      SHORT G_M55200_IG08
G_M55200_IG06:
       vmovaps  xmm1, xmm0
       vmovd    eax, xmm1
       test     eax, eax
       jl       SHORT G_M55200_IG07
       jmp      SHORT G_M55200_IG08
G_M55200_IG07:
       vmovss   xmm0, dword ptr [reloc @RWD00]
G_M55200_IG08:
       ret      
RWD00  	dd	41200000h		;        10
; Total bytes of code: 68

Expected codegen:

; Method Tests4:Test(float):float:this
       vzeroupper 
       vmaxss   xmm0, xmm1, dword ptr [reloc @RWD00]
       ret      
RWD00  	dd	41200000h ; 10.0
; Total bytes of code: 12

#65584 did this for ARM, where we could do it even when both arguments are non-constant.

ghost · 2022-02-19T22:28:01Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

tannergooding · 2022-02-19T23:48:06Z

In particular, Math.Max and Math.Min for floating-point have a requirement, from the IEEE 754 spec, that -0.0 be the minimum of it and +0.0. Likewise, they have a requirement that NaN is propagated (if both inputs are NaN, then the exact NaN returned doesn't matter).

x86/x64 provide maxss and minss. These instructions will return the second operand if both operands are 0 (same or differing signs). Likewise, if either input is NaN, the second operand is returned. Otherwise, the correct max/min is returned.

What this functionally means is that if both inputs are unknown, we can't really "optimize" and instead have to use maxss/minss and then some follow up computations. However, if either input is constant then we actually can optimize it down to a single instruction in all cases:

- In the case both are constant, we simply constant fold and it doesn't matter.
- If either input is known to be NaN, it is used as the second argument (ensuring NaN is propagated, since either input being NaN means the second argument is returned).
- For maxss, if either input is +0.0 it takes the second argument (ensuring +0.0 is returned if both inputs are zero, since both inputs being zero, of either sign, means the second argument is returned).
- For minss, if either input is -0.0 it takes the second argument (ensuring -0.0 is returned if both inputs are zero, since both inputs being zero, of either sign, means the second argument is returned).
- Otherwise, for any other known constant input it is the first argument:
  - If the unknown is NaN, this ensures it is returned, since either input being NaN means the second argument is returned.
  - If the unknown is +0.0 for maxss and the known was -0.0, it would ensure +0.0 is returned, since both inputs being zero, of either sign, means the second argument is returned.
  - If the unknown is -0.0 for minss and the known was +0.0, it would ensure -0.0 is returned, since both inputs being zero, of either sign, means the second argument is returned.
  - For any other inputs, known or unknown, the default handling is already correct.

This ensures that Max returns NaN and +0.0 and likewise that Min returns NaN and -0.0 for their respective special cases.
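
The operand-ordering rule described in this comment can be checked with a small software model. The C++ sketch below is not the JIT's code: `maxss_model`, `math_max_reference`, `max_with_constant`, and `same_value` are hypothetical names introduced here for illustration, and the model only encodes the maxss behaviour stated above (the second operand is returned when either operand is NaN or when both are zero of either sign).

```cpp
#include <cassert>
#include <cmath>

// Assumed software model of x64 `maxss first, second`:
// if either operand is NaN, or both are zero (any signs), the SECOND operand wins.
float maxss_model(float first, float second) {
    if (std::isnan(first) || std::isnan(second)) return second;
    if (first == 0.0f && second == 0.0f) return second;
    return first > second ? first : second;
}

// Reference behaviour of Math.Max as described in the thread:
// NaN propagates, and +0.0 is treated as greater than -0.0.
float math_max_reference(float a, float b) {
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
    if (a == 0.0f && b == 0.0f) return std::signbit(a) ? b : a;
    return a > b ? a : b;
}

// Hypothetical lowering of Max(unknown, constant) to one maxss:
// the known constant goes first, the unknown value goes second.
float max_with_constant(float unknown, float constant) {
    return maxss_model(constant, unknown);
}

// Compares two results, treating any NaN as equal to any NaN
// and distinguishing +0.0 from -0.0.
bool same_value(float x, float y) {
    if (std::isnan(x) || std::isnan(y)) return std::isnan(x) && std::isnan(y);
    return x == y && std::signbit(x) == std::signbit(y);
}
```

Under this model, `max_with_constant(x, 10.0f)` agrees with the reference for NaN, both zeros, and ordinary values, and a known -0.0 also works as the first operand. One case seems worth double-checking by hand: with a known +0.0 for Max, an unknown -0.0 needs the constant second while an unknown NaN needs it first, so neither order matches the reference for every input. The same appears to hold for a known -0.0 with Min.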

SkiFoD · 2022-02-21T10:11:12Z

Hey, everyone. I would like to take a look at this issue. @EgorBo Could you please tell me how you got the "Math.Max" call to be inlined?

EgorBo · 2022-02-21T10:23:40Z

Hey, sure! What do you mean inlined? Intrinsic?

SkiFoD · 2022-02-21T10:32:58Z

I'm trying to get the same emit, but I always get something like this:

So I can't look inside the Math.Max call to investigate it.

EgorBo · 2022-02-21T10:34:36Z

@SkiFoD see #65584: it #ifdef-ed the special import for Max for x86/x64, so you need to enable it. Most likely you will also have to modify lsraxarch and the xarch emitter.

SkiFoD · 2022-02-21T10:40:19Z

@EgorBo Thank you, so basically you are changing the call into an intrinsic.

SkiFoD · 2022-02-21T12:10:39Z

@EgorBo It seems I figured out how to make the changes, but I have a few questions.
Is it all right that the call is transformed into the intrinsic only under pressure (for example, when Math.Max/Min is called in a huge loop), or when the environment variables "COMPlus_TieredCompilation=0" and "COMPlus_JITMinOpts=0" are set?
P.S. For some reason, if I use those variables along with "COMPlus_JitDump/JitDisasm", there is no output in the console :( Have you ever come across such behaviour?

EgorBo · 2022-02-21T16:53:47Z

Yes, we don't expand intrinsics in tier0 (unoptimized code), only a few "must expand" ones. So I'd suggest disabling tiered compilation during development.

SkiFoD · 2022-03-15T08:29:59Z

@EgorBo please take a look at my PR, I would like to know what you think of this.

danmoseley · 2022-04-10T03:18:52Z

linking #65700

…#69434)
* Add xarch optimization for min/max (#65625)
* Changes according to the requirements (#65625)
* Draft for Math.Max/Math.Min optimization (#65625)
* Draft for optimizing Math.Max/Math.Min with a const (#65625)
* Fix tests (#65625)
* Refactoring of the conditions (#65625)
* Fix of the summary (#65625)
* Refactoring due to the PR comments (#65625)
* Add spilling side effect + Fix of formats (#65625)
* Update src/coreclr/jit/importer.cpp
  Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
* Update src/coreclr/jit/importer.cpp
  Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

EgorBo added the tenet-performance Performance related issue label Feb 19, 2022

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 19, 2022

EgorBo added good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors labels Feb 19, 2022

EgorBo added this to the Future milestone Feb 19, 2022

EgorBo removed the untriaged New issue has not been triaged by the area owner label Feb 19, 2022

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Feb 22, 2022

Draft for Min/Max intrinsics xarch (dotnet#65625)

954fa30

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Mar 1, 2022

Fix for VEX and Non-VEX instructions (dotnet#65625)

b450b25

danmoseley removed good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors labels Apr 10, 2022

SkiFoD added a commit to SkiFoD/runtime that referenced this issue May 19, 2022

Add xarch optimization for min/max (dotnet#65625)

363ef7e

SkiFoD added a commit to SkiFoD/runtime that referenced this issue May 19, 2022

Changes according to the requirements (dotnet#65625)

7331188

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 2, 2022

Draft for Math.Max/Math.Min optimization (dotnet#65625)

7f58b25

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 3, 2022

Draft for optimizing Math.Max/Math.Min with a const (dotnet#65625)

0f74bdc

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 3, 2022

Fix tests (dotnet#65625)

d1b4594

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 3, 2022

Refactoring of the conditions (dotnet#65625)

9347238

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 20, 2022

Fix of the summary (dotnet#65625)

75a7135

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 20, 2022

Refactoring due to the PR comments (dotnet#65625)

fd79612

SkiFoD added a commit to SkiFoD/runtime that referenced this issue Jun 22, 2022

Add spilling side effect + Fix of formats (dotnet#65625)

1aed08f

SkiFoD mentioned this issue Jun 22, 2022

Optimization. Use Min/Max intrinsics if one of arguments is constant. #69434

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 22, 2022

tannergooding closed this as completed in #69434 Jun 23, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 23, 2022

ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
