Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64: Avoid LEA for volatile IND #64354

Merged
merged 6 commits into from
Jan 27, 2022
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 26, 2022

Follow up to #62895
e.g.:

volatile int f1;
volatile int f2;

void Test()
{
    f1 = 1;
    f2 = 1;
}

Codgen diff:

; Method Runtime:Test():this
G_M53852_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M53852_IG02:              ;; offset=0008H
-       52800021          mov     w1, #1
-       D5033BBF          dmb     ish
-       B9000801          str     w1, [x0,#8]
-       D5033BBF          dmb     ish
-       B9000C01          str     w1, [x0,#12]
-						;; bbWeight=1    PerfScore 22.50
+       91002001          add     x1, x0, #8
+       52800022          mov     w2, #1
+       889FFC22          stlr    w2, [x1]
+       91003000          add     x0, x0, #12
+       889FFC02          stlr    w2, [x0]
+						;; bbWeight=1    PerfScore 3.50

G_M53852_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Same byte size, but better PerfScore (no memory barrier)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2022
@ghost ghost assigned EgorBo Jan 26, 2022
@ghost
Copy link

ghost commented Jan 26, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Follow up to #62895
e.g.:

volatile int f1;
volatile int f2;

void Test()
{
    f1 = 1;
    f2 = 1;
}

Codgen diff:

; Method Runtime:Test():this
G_M53852_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M53852_IG02:              ;; offset=0008H
-       52800021          mov     w1, #1
-       D5033BBF          dmb     ish
-       B9000801          str     w1, [x0,#8]
-       D5033BBF          dmb     ish
-       B9000C01          str     w1, [x0,#12]
-						;; bbWeight=1    PerfScore 22.50
+       91002001          add     x1, x0, #8
+       52800022          mov     w2, #1
+       889FFC22          stlr    w2, [x1]
+       91003000          add     x0, x0, #12
+       889FFC02          stlr    w2, [x0]
+						;; bbWeight=1    PerfScore 3.50

G_M53852_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Same byte size, but better PerfScore (no memory barrier)

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@VSadov
Copy link
Member

VSadov commented Jan 27, 2022

resulting assembly looks correct and better than before

@EgorBo
Copy link
Member Author

EgorBo commented Jan 27, 2022

@dotnet/jit-contrib PTAL

There are some size regressions (https://dev.azure.com/dnceng/public/_build/results?buildId=1575482&view=ms.vss-build-web.run-extensions-tab) due to additional instructions to prepare addresses instead of addressing modes, but PerfScore diffs are much better, e.g. libraries.pmi:

Summary of Perf Score diffs:
(Lower is better)

Total PerfScoreUnits of base: 106196855193729.92
Total PerfScoreUnits of diff: 106196773347615.31
Total PerfScoreUnits of delta: -81846114.64 (-0.00 % of base)
Total relative delta: -650.56

NOTE x64/x86 diffs are unrelated (not reproduceable locally) - spmi job picked a wrong baseline jit for them it seems

@kunalspathak
Copy link
Member

NOTE x64/x86 diffs are unrelated (not reproduceable locally) - spmi job picked a wrong baseline jit for them it seems

It picks the same baseline for x64/x86/arm/arm64...so all the diffs are related to that baseline. The question is why it picked wrong baseline, perhaps we didn't catch up or something in jitrollingbuild? @BruceForstall - any idea?

Current hash: 33daf4a951aeed31c8a6128d1822e3a4d24e72b7

Invoking: git merge-base 33daf4a951aeed31c8a6128d1822e3a4d24e72b7 origin/main

Baseline hash: ac0a35b063c7767618468636b782b19a632393b2

Invoking: git log --pretty=format:%H ac0a35b063c7767618468636b782b19a632393b2 -20 -- src/coreclr/jit/*

try 1: 7fcd1f6e45d937b9835c6cf2b0a2f76e0c180e87

try 2: cbc4cd873ed1f2362265e702dc7dfeaa76c5ec16

Warning: the baseline found is not built with the first git hash with JIT code changes; there may be extraneous diffs

I see that arm asmdiffs timesout which we are seeing more recently.

https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-64354-merge-09dda46f624849feb7/unix-arm/1/console.9af3ba57.log?sv=2019-07-07&se=2022-02-16T09%3A14%3A36Z&sr=c&sp=rl&sig=AT7FG1c8tKPdDsxHJBjql8EFj1a2cu6kox%2FzpwXp0Ws%3D

I would probably rebase and double check with official baseline to make sure there is nothing suspicious.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 27, 2022

Yeah I noticed it in my other PR today, some superpmi job silently fails and then displays some previous diffs or something like that.

Rebased.

@kunalspathak
Copy link
Member

Also that might explain the huge diffs followed by timeouts that many of us are seeing.

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
@echesakov
Copy link
Contributor

echesakov commented Jan 27, 2022

It seems that the snippet

add     x1, x0, #8
mov     w2, #1
stlr    w2, [x1]

can be optimized even further on targets supporting ARMv8.4

mov     w2, #1
stlur    w2, [x0, #8]

@EgorBo As a general thought, I wonder if in

void Test()
{
    f1 = 1;
    f2 = 1;
}

the first write needs to be store-release?

Shouldn't the following be sufficient?

Store(&f1, 1);
StoreRelease(&f2, 1);

@EgorBo
Copy link
Member Author

EgorBo commented Jan 27, 2022

can be optimized even further on targets supporting ARMv8.4

Good idea! Will file an issue, not sure even my M1 supports it 😄

@EgorBo As a general thought, I wonder if in

I am not sure why I used two variables in that snippet, the issue reproduces even with a single.
Overall you're right, we can relax some barriers in theory for combinations of volatile operations, but I'd prefer not doing that as part of this PR

@EgorBo
Copy link
Member Author

EgorBo commented Jan 27, 2022

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@EgorBo EgorBo merged commit 5409d71 into dotnet:main Jan 27, 2022
@BruceForstall
Copy link
Member

It picks the same baseline for x64/x86/arm/arm64...so all the diffs are related to that baseline. The question is why it picked wrong baseline, perhaps we didn't catch up or something in jitrollingbuild? @BruceForstall - any idea?

The jitrollingbuild pipeline is set to build every time the JIT changes, and not batch changes:

trigger:
  batch: false
  branches:
    include:
    - main
  paths:
    include:
    - src/coreclr/jit/*
    - src/coreclr/inc/jiteeversionguid.h

However, it appears that AzDO doesn't kick off the build immediately; I noticed a delay of about 10 minutes for this particular change. In a few previous cases for JIT changes, there were also delays before the AzDO job kicked off, and in those cases, another change was merged before the kickoff, and AzDO built that change, not the JIT change that kicked off the job. Because of this, when we upload the JIT to Azure Storage, we use a git hash of a non-JIT change, and when superpmi.py looks for a baseline JIT, it doesn't find the hash of the JIT that should have been built.

Summary: there is an assumption that the pipeline builds every JIT change at the git hash of the JIT change. That doesn't appear to be the case with AzDO. I'm not sure if there's a config to force that to happen. E.g., what if 2 JIT changes were merged at almost exactly the same time? AzDO should build both of them, but does it?

Possibly we could change the jitrollingbuild.py upload script to walk back the git log for the built change and use the first hash that actually has JIT changes in it, to work around this problem.

@BruceForstall
Copy link
Member

Added #64392

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants