Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple GT_NEG optimization for #13837 #43921

Merged
merged 5 commits into from
Nov 6, 2020
Merged

Conversation

JulieLeeMSFT
Copy link
Member

@JulieLeeMSFT JulieLeeMSFT commented Oct 27, 2020

Fixes #13837

replaces #43148 (use rebase to fix merge related issues).

Total bytes of diff: -42 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
         -13 : System.Private.CoreLib.dasm (-0.00% of base)
          -8 : Microsoft.VisualBasic.Core.dasm (-0.00% of base)
          -8 : System.Private.Xml.dasm (-0.00% of base)
          -8 : System.Runtime.Numerics.dasm (-0.01% of base)
          -4 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -1 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
6 total files with Code Size differences (6 improved, 0 regressed), 262 unchanged.
Top method improvements (bytes):
          -8 (-1.44% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double
          -8 (-5.41% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(Complex,Complex):Complex
          -6 (-1.46% of base) : System.Private.Xml.dasm - XmlSqlBinaryReader:FillAllowEOF():bool:this
          -5 (-0.70% of base) : System.Private.CoreLib.dasm - Number:NumberToFloatingPointBitsSlow(byref,byref,int,int,int):long
          -4 (-1.02% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceConstructorSymbol:CalculateLocalSyntaxOffset(int,SyntaxTree):int:this
          -4 (-1.94% of base) : System.Private.CoreLib.dasm - ScopeTree:AddScopeInfo(byte,int):this
          -2 (-1.55% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunCounted(byref,int,Span`1,byref,byref):bool
          -2 (-1.05% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunShortest(byref,byref,byref,Span`1,byref,byref):bool
          -2 (-0.19% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this
          -1 (-0.20% of base) : Microsoft.CodeAnalysis.dasm - CustomDebugInfoWriter:SerializeReferenceToIteratorClass(String,ArrayBuilder`1)
Top method improvements (percentages):
          -8 (-5.41% of base) : System.Runtime.Numerics.dasm - Complex:op_Division(Complex,Complex):Complex
          -4 (-1.94% of base) : System.Private.CoreLib.dasm - ScopeTree:AddScopeInfo(byte,int):this
          -2 (-1.55% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunCounted(byref,int,Span`1,byref,byref):bool
          -6 (-1.46% of base) : System.Private.Xml.dasm - XmlSqlBinaryReader:FillAllowEOF():bool:this
          -8 (-1.44% of base) : Microsoft.VisualBasic.Core.dasm - Financial:NPer(double,double,double,double,int):double
          -2 (-1.05% of base) : System.Private.CoreLib.dasm - Grisu3:TryRunShortest(byref,byref,byref,Span`1,byref,byref):bool
          -4 (-1.02% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceConstructorSymbol:CalculateLocalSyntaxOffset(int,SyntaxTree):int:this
          -5 (-0.70% of base) : System.Private.CoreLib.dasm - Number:NumberToFloatingPointBitsSlow(byref,byref,int,int,int):long
          -1 (-0.20% of base) : Microsoft.CodeAnalysis.dasm - CustomDebugInfoWriter:SerializeReferenceToIteratorClass(String,ArrayBuilder`1)
          -2 (-0.19% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this
10 total methods with Code Size differences (10 improved, 0 regressed), 258863 unchanged.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 27, 2020
@JulieLeeMSFT JulieLeeMSFT changed the title 13837 2 Simple GT_NEG optimization for #13837 Oct 27, 2020
@danmoseley
Copy link
Member

@JulieLeeMSFT I added triple-back-ticks (```) above and below your text dump above so it's easier to read.

@JulieLeeMSFT
Copy link
Member Author

JulieLeeMSFT commented Oct 27, 2020

@CarolEidt please see the below diff. The net effects of before and after are the same. However, we could save the 2nd mov if we moved dword ptr [rsp+E4H] to edx initially. PTAL.

Before:

       mov      edx, ebp
       neg      edx
       add      edx, dword ptr [rsp+E4H]

After:

       mov      eax, dword ptr [rsp+E4H]
       mov      edx, eax
       sub      edx, ebp

PMI dump:

Scope info: begin block BB19, IL range [0B4..0BE)
Scope info: open scopes =
   1 (V01 arg1) [000..202)
   2 (V02 arg2) [000..202)
   13 (V13 loc10) [000..202)
   8 (V08 loc5) [000..202)
   3 (V03 loc0) [000..202)
   9 (V09 loc6) [000..202)
   10 (V10 loc7) [000..202)
Added IP mapping: 0x00B4 STACK_EMPTY (G_M6661_IG16,ins#0,ofs#0) label
Generating: N421 (???,???) [000932] ------------                 IL_OFFSET void   IL offset: 0xb4 REG NA
Generating: N423 (  1,  1) [000101] -----------z       t101 =    LCL_VAR   int    V10 loc7         u:2 rax (last use) REG rax <l:$350, c:$34f>
Generating: N425 (  1,  1) [000104] ------------       t104 =    LCL_VAR   int    V53 tmp23        u:2 rbp REG rbp <l:$240, c:$280>
                                                              /--*  t101   int    
                                                              +--*  t104   int    
Generating: N427 (  3,  3) [000106] ----G-------       t106 = *  SUB       int    REG rdx <l:$359, c:$358>
IN0058:        mov      eax, dword ptr [V10 rsp+E4H]
							V10 in reg rax is becoming live  [000101]
							Live regs: 0000F0E8 {rbx rbp rsi rdi r12 r13 r14 r15} => 0000F0E9 {rax rbx rbp rsi rdi r12 r13 r14 r15}
							V10 in reg rax is becoming dead  [000101]
							Live regs: 0000F0E9 {rax rbx rbp rsi rdi r12 r13 r14 r15} => 0000F0E8 {rbx rbp rsi rdi r12 r13 r14 r15}
							Live vars: {V01 V02 V03 V08 V09 V10 V13 V52 V53} => {V01 V02 V03 V08 V09 V13 V52 V53}
IN0059:        mov      edx, eax
IN005a:        sub      edx, ebp
                                                              /--*  t106   int    
Generating: N429 (  7,  6) [000108] DA--G-------              *  STORE_LCL_VAR int    V32 tmp2         d:3 rdx REG rdx
							V32 in reg rdx is becoming live  [000108]
							Live regs: 0000F0E8 {rbx rbp rsi rdi r12 r13 r14 r15} => 0000F0EC {rdx rbx rbp rsi rdi r12 r13 r14 r15}
							Live vars: {V01 V02 V03 V08 V09 V13 V52 V53} => {V01 V02 V03 V08 V09 V13 V32 V52 V53}

Scope info: end block BB19, IL range [0B4..0BE)

@CarolEidt
Copy link
Contributor

The net effects of before and after are the same. However, we could save the 2nd mov if we moved dword ptr [rsp+E4H] to edx initially.

This appears to be an issue with preferencing, though I can't tell for certain without analyzing the register allocation dump.

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should fix DevDiv_545500

src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT JulieLeeMSFT self-assigned this Nov 3, 2020
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Nov 3, 2020
@JulieLeeMSFT
Copy link
Member Author

JulieLeeMSFT commented Nov 4, 2020

This appears to be an issue with preferencing, though I can't tell for certain without analyzing the register allocation dump.

diff-13837.txt

@CarolEidt Attached the log file.

@JulieLeeMSFT
Copy link
Member Author

@dotnet/jit-contrib PTAL. All tests have passed.

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved
But, I would suggest that we remove all the unnecessary unchanged comment lines

src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
@CarolEidt
Copy link
Contributor

@JulieLeeMSFT - sorry for the delay in analyzing the dump that you uploaded. Here's my analysis

If you look at the dump where we create the RefPositions (I search for "DefList" and then the instruction number we're interested in, [000106] is the first one in this example):

DefList: {  }
N427 (  3,  3) [000106] ----G-------              *  SUB       int    REG NA <l:$359, c:$358>
<RefPosition #309 @427 RefTypeUse <Ivl:7 V10> LCL_VAR BB19 regmask=[allInt] minReg=1 last>
<RefPosition #310 @427 RefTypeUse <Ivl:36 V53> LCL_VAR BB19 regmask=[allInt] minReg=1 last>
Interval 112: int RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #311 @428 RefTypeDef <Ivl:112> SUB BB19 regmask=[allInt] minReg=1>
Interval <V10/L7> already has a related interval

This says that V10 already has a related interval, so we can't set Interval 112 as its relatedInterval.
This means that when we pick a register for I112 we won't try to give it the same register as V10.

DefList: { N427.t106. SUB }
N429 (  7,  6) [000108] DA--G-------              *  STORE_LCL_VAR int    V32 tmp2         d:3 NA REG NA
<RefPosition #312 @429 RefTypeUse <Ivl:112>  BB19 regmask=[allInt] minReg=1 last>
Assigning related <V32/L22> to <I112>

Here we make V32 the relatedInterval of I112 (the value produced by the SUB). That means it will
try to give I112 and V32 the same register, as a target preference. However, we've lost the more
critical RMW-related preference.

There's a general issue, #11959, for tuning of LSRA preferencing. It would be good to consider this example in that context. What test or library method was this?

@JulieLeeMSFT
Copy link
Member Author

There's a general issue, #11959, for tuning of LSRA preferencing. It would be good to consider this example in that context. What test or library method was this?

Thanks for the analysis. It was from jit-diff --frameworks test.

jit-diff diff --output c:\diffs --frameworks --pmi ...

And, it showed in

.\microsoft.codeanalysis.dasm
RealParser:ConvertDecimalToFloatingPointBits(DecimalFloatingPointString,FloatingPointType,byref):int

@JulieLeeMSFT JulieLeeMSFT merged commit cecbbfb into dotnet:master Nov 6, 2020
@JulieLeeMSFT JulieLeeMSFT deleted the 13837-2 branch November 6, 2020 15:21
tqiu8 pushed a commit to tqiu8/runtime that referenced this pull request Nov 9, 2020
author Stephen Toub <stoub@microsoft.com> 1604601164 -0500
committer Tammy Qiu <tammy.qiu@yahoo.com> 1604960878 -0500

Add stream conformance tests for TranscodingStream (dotnet#44248)

* Add stream conformance tests for TranscodingStream

* Special-case 0-length input buffers to TranscodingStream.Write{Async}

The base implementation of Encoder.Convert doesn't like empty inputs.  Regardless, if the input is empty, we can avoid a whole bunch of unnecessary work.

JIT: minor inliner refactoring (dotnet#44215)

Extract out the budget check logic so it can vary by inlining policy.
Use this to exempt the FullPolicy from budget checking.

Fix inline xml to dump the proper (full name) hash for inlinees.

Update range dumper to dump ranges in hex.

Remove unused QCall for WinRTSupported (dotnet#44278)

ConcurrentQueueSegment allows spinning threads to sleep. (dotnet#44265)

* Allow threads to sleep when ConcurrentQueue has many enqueuers/dequeuers.

* Update src/libraries/System.Private.CoreLib/src/System/Collections/Concurrent/ConcurrentQueueSegment.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>

* Apply suggestions from code review

Co-authored-by: Stephen Toub <stoub@microsoft.com>

Co-authored-by: AMD DAYTONA EPYC <amd@amd-DAYTONA-X0.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>

File.Exists() is not null when true (dotnet#44310)

* File.Exists() is not null when true

* Fix compile

* Fix compile 2

[master][watchOS] Add simwatch64 support (dotnet#44303)

Xcode 12.2 removed 32 bits support for watchOS simulators, this PR helps to fix xamarin/xamarin-macios#9949, we have tested the new binaries and they are working as expected

![unknown](https://user-images.githubusercontent.com/204671/98253709-64413200-1f49-11eb-9774-8c5aa416fc57.png)

Co-authored-by: dalexsoto <dalexsoto@users.noreply.github.com>

Implementing support to Debugger::Break. (dotnet#44305)

Set fgOptimizedFinally flag correctly (dotnet#44268)

- Initialize to 0 at compiler startup
- Set flag when finally cloning optimization kicks in

Fixes non-deterministic generation of nop opcodes into ARM32 code

Forbid `- byref cnst` -> `+ (byref -cnst)` transformation. (dotnet#44266)

* Add a repro test.

* Forbid the transformation for byrefs.

* Update src/coreclr/src/jit/morph.cpp

Co-authored-by: Andy Ayers <andya@microsoft.com>

* Update src/coreclr/src/jit/morph.cpp

* Fix the test return value.

WriteLine is just to make sure we don't delete the value.

* improve the test.

avoid a possible overflow and don't waste time on printing.

Co-authored-by: Andy Ayers <andya@microsoft.com>

Pick libmonosgen-2.0.so from cmake install directory instead of .libs (dotnet#44291)

This aligns Linux with what we already do for all the other platforms.

Update SharedPerformanceCounter assert (dotnet#44333)

Remove silly ToString in GetCLRInstanceString (dotnet#44335)

Use targetPlatformMoniker for net5.0 and newer tfms (dotnet#43965)

* Use targetPlatformMoniker for net5.0 and newer tfms

* disabling analyzer, update version to 0.0, and use new format.

* update the targetFramework.sdk

* removing supportedOS assembly level attribute

* fix linker errors and addressing feedback

* making _TargetFrameworkWithoutPlatform as private

[sgen] Add Ward annotations to sgen_get_total_allocated_bytes (dotnet#43833)

Attempt to fix https://jenkins.mono-project.com/job/test-mono-mainline-staticanalysis/

Co-authored-by: lambdageek <lambdageek@users.noreply.github.com>

[tests] Re-enable tests fixed by dotnet#44081 (dotnet#44212)

Fixes
mono/mono#15030 and
fixes mono/mono#15031 and
fixes mono/mono#15032

Add an implicit argument coercion check. (dotnet#43386)

* Add `impCheckImplicitArgumentCoercion`.

* Fix tests with type mismatch.

* Try to fix VM signature.

* Allow to pass byref as native int.

* another fix.

* Fix another IL test.

[mono] Change CMakelists.txt "python" -> Python3_EXECUTABLE (dotnet#44340)

Debian doesn't install a "python" binary for python3.

Tweak StreamConformanceTests for cancellation (dotnet#44342)

- Avoid unnecessary timers
- Separate tests for precancellation, ReadAsync(byte[], ...) cancellation, and ReadAsync(Memory, ...) cancellation

Use Dictionary for underlying cache of ResourceSet (dotnet#44104)

Simplify catch-rethrow logic in NetworkStream (dotnet#44246)

A follow-up on dotnet#40772 (comment), simplifies and harmonizes the way we wrap exceptions into IOException. Having one catch block working with System.Exception seems to be enough here, no need for specific handling of SocketException.

Simple GT_NEG optimization for dotnet#13837 (dotnet#43921)

* Simple arithmetic optimization with GT_NEG

* Skip GT_NEG optimization when an operand is constant. Revert bitwise rotation pattern

* Fixed Value Numbering assert

* Cleaned up code and comments for simple GT_NEG optimization

* Formatting

Co-authored-by: Julie Lee <jeonlee@microsoft.com>

[master] Update dependencies from mono/linker (dotnet#44322)

* Update dependencies from https://github.com/mono/linker build 20201105.1

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20527.2 -> To Version 6.0.0-alpha.1.20555.1

* Update dependencies from https://github.com/mono/linker build 20201105.2

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20527.2 -> To Version 6.0.0-alpha.1.20555.2

* Disable new optimization for libraries mode (it cannot work in this mode)

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Marek Safar <marek.safar@gmail.com>

Tighten argument validation in StreamConformanceTests (dotnet#44326)

Add threshold on number of files / partition in SPMI collection (dotnet#44180)

* Add check for files count

* Fix the OS check

* decrese file limit to 1500:

* misc fix

* Do not upload to azure if mch files are zero size

Fix ELT profiler tests (dotnet#44285)

[master] Update dependencies from dotnet/arcade dotnet/llvm-project dotnet/icu (dotnet#44336)

[master] Update dependencies from dotnet/arcade dotnet/llvm-project dotnet/icu

 - Merge branch 'master' into darc-master-2211df94-2a02-4c3c-abe1-e3534e896267

Fix Send_TimeoutResponseContent_Throws (dotnet#44356)

If the client times out too quickly, the server may never have a connection to accept and will hang forever.

Match CoreCLR behaviour on thread start failure (dotnet#44124)

Co-authored-by: Aleksey Kliger (λgeek) <akliger@gmail.com>

Add slash in Windows SoD tool build (dotnet#44359)

* Add slash in Windows SoD tool build

* Update SoD search path to match output dir

* Fixup dotnet version

* Remove merge commit headers

* Disable PRs

Co-authored-by: Drew Scoggins <andrew.g.scoggins@gmail>

Reflect test path changes in .gitattributes; remove nonexistent files (dotnet#44371)

Bootstrapping a test for R2RDump (dotnet#42150)

Improve performance of Enum's generic IsDefined / GetName / GetNames (dotnet#44355)

Eliminates the boxing in IsDefined/GetName/GetValues, and in GetNames avoids having to go through RuntimeType's GetEnumNames override.

clarify http version test (dotnet#44379)

Co-authored-by: Geoffrey Kizer <geoffrek@windows.microsoft.com>

Update dependencies from https://github.com/mono/linker build 20201106.1 (dotnet#44367)

Microsoft.NET.ILLink.Tasks
 From Version 6.0.0-alpha.1.20555.2 -> To Version 6.0.0-alpha.1.20556.1

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

Disable RunThreadLocalTest8_Values on Mono (dotnet#44357)

* Disable RunThreadLocalTest8_Values on Mono

It's failing on SLES

* fix typo

LongProcessNamesAreSupported: make test work on distros where sleep is a symlink/script (dotnet#44299)

* LongProcessNamesAreSupported: make test work on distros where sleep is a symlink/script

* PR feedback

Co-authored-by: Stephen Toub <stoub@microsoft.com>

* fix compilation

Co-authored-by: Stephen Toub <stoub@microsoft.com>

add missing constructor overloads (dotnet#44380)

Co-authored-by: Geoffrey Kizer <geoffrek@windows.microsoft.com>

change using in ConnectCallback_UseUnixDomainSocket_Success (dotnet#44366)

Clean up the samples (dotnet#44293)

Update dotnet/roslyn issue link

Delete stale comment about dotnet/roslyn#30797

Fix/remove TODO-NULLABLEs (dotnet#44300)

* Fix/remove TODO-NULLABLEs

* remove redundant !

* apply Jozkee's feedback

* address feedback

Update glossary (dotnet#44274)

Co-authored-by: Juan Hoyos <juan.hoyos@microsoft.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>

Add files need for wasm executable relinking/aot to the wasm runtime pack. (dotnet#43785)

Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>

Move some more UnmanagedCallersOnly tests to IL now that they're invalid C# (dotnet#43366)

Fix C++ build for mono/metadata/threads.c (dotnet#44413)

`throw` is a reserved keyword in C++.

Disable a failing test. (dotnet#44404)

Change async void System.Text.Json test to be async Task (dotnet#44418)

Improve crossgen2 comparison jobs (dotnet#44119)

- Fix compilation on unix platforms
  - Wrap use of wildcard in quotes
- Print better display name into log
- Fix X86 constant comparison handling
- Add ability to compile specific overload via single method switches

Remove some unnecessary GetTypeInfo usage (dotnet#44414)

Fix MarshalTypedArrayByte and re-enable it. Re-enable TestFunctionApply
@ghost ghost locked as resolved and limited conversation to collaborators Dec 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Optimize simple arithmetic with GT_NEG
6 participants