Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Support more scaled addressing modes on arm64 #97921

Merged
merged 5 commits into from
Feb 5, 2024

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Feb 3, 2024

We currently support scaled addressing modes when the index also needs an extension through contained BFIZ nodes. However, we did not support scaled addressing modes if the index was 64 bits. This adds that support as a natural extension to the GT_LEA.

static int Foo(int[] p, nint index) => p[index];

Before:

G_M45892_IG02:  ;; offset=0x0008
            ldr     w2, [x0, #0x08]
            cmp     x1, x2
            bhs     G_M45892_IG04
            add     x0, x0, #16
            lsl     x1, x1, #2
            ldr     w0, [x0, x1]

After:

G_M45892_IG02:  ;; offset=0x0008
            ldr     w2, [x0, #0x08]
            cmp     x1, x2
            bhs     G_M45892_IG04
            add     x0, x0, #16
            ldr     w0, [x0, x1, LSL #2]

Some regressions expected from fewer CSEs since we do not CSE address modes.

Prerequisite for #97865

We currently support scaled addressing modes when the index also needs
an extension through contained `BFIZ` nodes. However, we did not support
scaled addressing modes if the index was 64 bits. This adds that
support as a natural extension to the `GT_LEA`.
@ghost ghost assigned jakobbotsch Feb 3, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 3, 2024
@ghost
Copy link

ghost commented Feb 3, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We currently support scaled addressing modes when the index also needs an extension through contained BFIZ nodes. However, we did not support scaled addressing modes if the index was 64 bits. This adds that support as a natural extension to the GT_LEA.

static int Foo(int[] p, nint index) => p[index];

Before:

G_M45892_IG02:  ;; offset=0x0008
            ldr     w2, [x0, #0x08]
            cmp     x1, x2
            bhs     G_M45892_IG04
            add     x0, x0, #16
            lsl     x1, x1, #2
            ldr     w0, [x0, x1]

After:

G_M45892_IG02:  ;; offset=0x0008
            ldr     w2, [x0, #0x08]
            cmp     x1, x2
            bhs     G_M45892_IG04
            add     x0, x0, #16
            ldr     w0, [x0, x1, LSL #2]

Prerequisite for #97865

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved
@ryujit-bot
Copy link

Diff results for #97921

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,505,882 contexts (1,007,092 MinOpts, 1,498,790 FullOpts).

MISSED contexts: base: 1,433 (0.06%), diff: 1,436 (0.06%)

Overall (-250,480 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,545,232 -3,284
benchmarks.run_pgo.linux.arm64.checked.mch 79,735,896 -19,856
benchmarks.run_tiered.linux.arm64.checked.mch 24,597,876 -6,084
coreclr_tests.run.linux.arm64.checked.mch 508,527,192 -29,700
libraries.crossgen2.linux.arm64.checked.mch 55,834,472 -11,212
libraries.pmi.linux.arm64.checked.mch 76,274,408 -8,672
libraries_tests.run.linux.arm64.Release.mch 394,024,440 -150,612
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 164,978,292 -15,980
realworld.run.linux.arm64.checked.mch 15,900,008 -3,380
smoke_tests.nativeaot.linux.arm64.checked.mch 2,829,664 -1,700
MinOpts (-22,888 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 25,934,784 -5,292
benchmarks.run_tiered.linux.arm64.checked.mch 19,738,436 -3,656
coreclr_tests.run.linux.arm64.checked.mch 348,148,152 -2,712
libraries_tests.run.linux.arm64.Release.mch 215,131,952 -10,636
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 13,497,464 -592
FullOpts (-227,592 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,240,304 -3,284
benchmarks.run_pgo.linux.arm64.checked.mch 53,801,112 -14,564
benchmarks.run_tiered.linux.arm64.checked.mch 4,859,440 -2,428
coreclr_tests.run.linux.arm64.checked.mch 160,379,040 -26,988
libraries.crossgen2.linux.arm64.checked.mch 55,832,836 -11,212
libraries.pmi.linux.arm64.checked.mch 76,154,424 -8,672
libraries_tests.run.linux.arm64.Release.mch 178,892,488 -139,976
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,480,828 -15,388
realworld.run.linux.arm64.checked.mch 15,319,084 -3,380
smoke_tests.nativeaot.linux.arm64.checked.mch 2,828,676 -1,700

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,270,095 contexts (932,669 MinOpts, 1,337,426 FullOpts).

MISSED contexts: base: 772 (0.03%), diff: 775 (0.03%)

Overall (-177,740 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,173,700 -3,300
benchmarks.run_pgo.osx.arm64.checked.mch 34,346,112 -16,604
benchmarks.run_tiered.osx.arm64.checked.mch 15,512,636 -5,860
coreclr_tests.run.osx.arm64.checked.mch 486,273,168 -26,920
libraries.crossgen2.osx.arm64.checked.mch 55,715,976 -11,204
libraries.pmi.osx.arm64.checked.mch 80,194,908 -4,428
libraries_tests.run.osx.arm64.Release.mch 323,385,308 -90,324
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 162,548,024 -15,812
realworld.run.osx.arm64.checked.mch 15,058,824 -3,288
MinOpts (-21,812 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 16,301,300 -4,868
benchmarks.run_tiered.osx.arm64.checked.mch 11,504,484 -3,636
coreclr_tests.run.osx.arm64.checked.mch 332,612,924 -2,076
libraries_tests.run.osx.arm64.Release.mch 203,715,848 -10,640
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 13,153,728 -592
FullOpts (-155,928 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,173,164 -3,300
benchmarks.run_pgo.osx.arm64.checked.mch 18,044,812 -11,736
benchmarks.run_tiered.osx.arm64.checked.mch 4,008,152 -2,224
coreclr_tests.run.osx.arm64.checked.mch 153,660,244 -24,844
libraries.crossgen2.osx.arm64.checked.mch 55,714,348 -11,204
libraries.pmi.osx.arm64.checked.mch 80,073,780 -4,428
libraries_tests.run.osx.arm64.Release.mch 119,669,460 -79,684
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 149,394,296 -15,220
realworld.run.osx.arm64.checked.mch 14,494,868 -3,288

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,339,805 contexts (938,449 MinOpts, 1,401,356 FullOpts).

MISSED contexts: base: 1,309 (0.06%), diff: 1,312 (0.06%)

Overall (-181,504 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,957,488 -3,280
benchmarks.run_pgo.windows.arm64.checked.mch 45,442,912 -20,196
benchmarks.run_tiered.windows.arm64.checked.mch 15,586,420 -6,072
coreclr_tests.run.windows.arm64.checked.mch 495,084,120 -29,016
libraries.crossgen2.windows.arm64.checked.mch 59,059,712 -11,856
libraries.pmi.windows.arm64.checked.mch 79,827,256 -12,692
libraries_tests.run.windows.arm64.Release.mch 329,452,088 -76,184
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,557,220 -16,260
realworld.run.windows.arm64.checked.mch 15,902,524 -3,372
smoke_tests.nativeaot.windows.arm64.checked.mch 3,798,548 -2,576
MinOpts (-21,684 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 16,011,396 -4,920
benchmarks.run_tiered.windows.arm64.checked.mch 11,177,188 -3,600
coreclr_tests.run.windows.arm64.checked.mch 338,689,104 -2,184
libraries_tests.run.windows.arm64.Release.mch 203,433,596 -10,388
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 13,153,748 -592
FullOpts (-159,820 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,956,952 -3,280
benchmarks.run_pgo.windows.arm64.checked.mch 29,431,516 -15,276
benchmarks.run_tiered.windows.arm64.checked.mch 4,409,232 -2,472
coreclr_tests.run.windows.arm64.checked.mch 156,395,016 -26,832
libraries.crossgen2.windows.arm64.checked.mch 59,058,076 -11,856
libraries.pmi.windows.arm64.checked.mch 79,707,272 -12,692
libraries_tests.run.windows.arm64.Release.mch 126,018,492 -65,796
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,403,472 -15,668
realworld.run.windows.arm64.checked.mch 15,338,544 -3,372
smoke_tests.nativeaot.windows.arm64.checked.mch 3,797,536 -2,576

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.09% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.02%
benchmarks.run_pgo.linux.arm64.checked.mch -0.03%
benchmarks.run_tiered.linux.arm64.checked.mch -0.03%
coreclr_tests.run.linux.arm64.checked.mch -0.02%
libraries.crossgen2.linux.arm64.checked.mch -0.02%
libraries.pmi.linux.arm64.checked.mch -0.01%
libraries_tests.run.linux.arm64.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.02%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.06%
MinOpts (-0.03% to +0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.02%
benchmarks.run_pgo.linux.arm64.checked.mch -0.02%
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%
libraries.pmi.linux.arm64.checked.mch -0.03%
libraries_tests.run.linux.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.01%
FullOpts (-0.12% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.02%
benchmarks.run_pgo.linux.arm64.checked.mch -0.03%
benchmarks.run_tiered.linux.arm64.checked.mch -0.04%
coreclr_tests.run.linux.arm64.checked.mch -0.03%
libraries.crossgen2.linux.arm64.checked.mch -0.02%
libraries.pmi.linux.arm64.checked.mch -0.01%
libraries_tests.run.linux.arm64.Release.mch -0.12%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.02%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.06%

Throughput diffs for linux/x64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.x64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.07% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.03%
benchmarks.run_pgo.osx.arm64.checked.mch -0.05%
benchmarks.run_tiered.osx.arm64.checked.mch -0.04%
coreclr_tests.run.osx.arm64.checked.mch -0.02%
libraries.crossgen2.osx.arm64.checked.mch -0.02%
libraries.pmi.osx.arm64.checked.mch -0.01%
libraries_tests.run.osx.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.02%
MinOpts (-0.03% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.03%
benchmarks.run_tiered.osx.arm64.checked.mch -0.03%
libraries.crossgen2.osx.arm64.checked.mch -0.01%
libraries.pmi.osx.arm64.checked.mch -0.03%
libraries_tests.run.osx.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.01%
realworld.run.osx.arm64.checked.mch +0.01%
FullOpts (-0.10% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.03%
benchmarks.run_pgo.osx.arm64.checked.mch -0.05%
benchmarks.run_tiered.osx.arm64.checked.mch -0.04%
coreclr_tests.run.osx.arm64.checked.mch -0.03%
libraries.crossgen2.osx.arm64.checked.mch -0.02%
libraries.pmi.osx.arm64.checked.mch -0.01%
libraries_tests.run.osx.arm64.Release.mch -0.10%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.02%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.07% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch -0.04%
benchmarks.run_tiered.windows.arm64.checked.mch -0.04%
coreclr_tests.run.windows.arm64.checked.mch -0.02%
libraries.crossgen2.windows.arm64.checked.mch -0.02%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.05%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.02%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.07%
MinOpts (-0.03% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.03%
benchmarks.run_tiered.windows.arm64.checked.mch -0.03%
libraries.pmi.windows.arm64.checked.mch -0.03%
libraries_tests.run.windows.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch +0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.01%
FullOpts (-0.07% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch -0.04%
benchmarks.run_tiered.windows.arm64.checked.mch -0.04%
coreclr_tests.run.windows.arm64.checked.mch -0.03%
libraries.crossgen2.windows.arm64.checked.mch -0.02%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.02%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.07%

Throughput diffs for windows/x64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x64.checked.mch +0.01%

Details here


Throughput diffs for linux/arm ran on windows/x86

MinOpts (-0.02% to -0.00%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch -0.01%
benchmarks.run_pgo.linux.arm.checked.mch -0.01%
benchmarks.run_tiered.linux.arm.checked.mch -0.01%
libraries.pmi.linux.arm.checked.mch -0.02%
libraries_tests.run.linux.arm.Release.mch -0.01%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.00% to +0.01%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
libraries.pmi.windows.x86.checked.mch +0.01%
libraries_tests.run.windows.x86.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%
MinOpts (+0.00% to +0.02%)
Collection PDIFF
benchmarks.run_pgo.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
libraries.pmi.windows.x86.checked.mch +0.02%
libraries_tests.run.windows.x86.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.01%
FullOpts (+0.00% to +0.01%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
libraries.pmi.windows.x86.checked.mch +0.01%
libraries_tests.run.windows.x86.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%

Details here


@ryujit-bot
Copy link

Diff results for #97921

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.07% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.04%
libraries_tests.run.linux.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%
MinOpts (+0.01% to +0.08%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch +0.01%
libraries.pmi.linux.arm64.checked.mch +0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%
libraries.crossgen2.linux.arm64.checked.mch +0.01%
benchmarks.run.linux.arm64.checked.mch +0.05%
FullOpts (-0.10% to +0.01%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.04%
libraries_tests.run.linux.arm64.Release.mch -0.10%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.01%
coreclr_tests.run.linux.arm64.checked.mch -0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on linux/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.x64.checked.mch +0.01%

Details here


@jakobbotsch jakobbotsch marked this pull request as ready for review February 5, 2024 08:00
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @EgorBo

Diffs. Regressions are because we don't CSE address modes.

@jakobbotsch jakobbotsch requested a review from EgorBo February 5, 2024 08:02
Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I don't remember why I gave up on these.
You may want to run fuzzlyn - it found quite a few addrmode related issues for me in the past - but leaving that up to you

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch
Copy link
Member Author

The jitstress/libraries-jitstress failures look like #97892 although it's hard to know if anything is hidden in there...

@jakobbotsch jakobbotsch merged commit 1ed67a5 into dotnet:main Feb 5, 2024
181 of 196 checks passed
@jakobbotsch jakobbotsch deleted the arm64-scaled-ams branch February 5, 2024 15:28
@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants