Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize Arm64 ldr/str to ldp/stp optimization #81278

Closed
BruceForstall opened this issue Jan 27, 2023 · 4 comments · Fixed by #84135 or #85657
Closed

Generalize Arm64 ldr/str to ldp/stp optimization #81278

BruceForstall opened this issue Jan 27, 2023 · 4 comments · Fixed by #84135 or #85657
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@BruceForstall
Copy link
Member

BruceForstall commented Jan 27, 2023

#77540 introduced a JIT peephole optimization to convert consecutive ldr/str instructions to ldp/stp. It was limited to avoiding this optimization when either of the two ldr/str represented lclvar, which requires more work to properly handle the GC effects.

See #77540 (review) for more discussion.

This issue tracks generalizing the optimization to handle the lclvar cases.

Also, the optimization was restricted to not work in prologs or epilogs, to avoid affecting unwind codes. There are two potential improvements here:

  1. Make sure the unwind codes work with the optimization.
  2. Allow the optimization in the non-OS (not unwindable) part of the prolog/epilog.
@BruceForstall BruceForstall added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jan 27, 2023
@BruceForstall BruceForstall added this to the 8.0.0 milestone Jan 27, 2023
@ghost
Copy link

ghost commented Jan 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

#77540 introduced a JIT peephole optimization to convert consecutive ldr/str instructions to ldp/stp. It was limited to avoiding this optimization when either of the two ldr/str represented lclvar, which requires more work to properly handle the GC effects.

See #77540 (review) for more discussion.

This issue tracks generalizing the optimization to handle the lclvar cases.

Author: BruceForstall
Assignees: -
Labels:

arch-arm64, area-CodeGen-coreclr

Milestone: 8.0.0

@BruceForstall
Copy link
Member Author

Regarding the lclvar case, emitIns_S_S_R_R() might be a place to look.

@kunalspathak kunalspathak self-assigned this Feb 15, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 30, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 31, 2023
@kunalspathak kunalspathak reopened this Apr 3, 2023
@kunalspathak
Copy link
Member

#85032 fixes the general code except prolog/epilog and unwind code. I will leave this issue open.

@kunalspathak
Copy link
Member

Just to experiment, I tried to comment below lines and see lot of methods improved for benchmarks collection.

// Don't remove instructions whilst in prologs or epilogs, as these contain "unwindable"
// parts, where we need to report unwind codes to the OS,
if (emitIGisInProlog(emitCurIG) || emitIGisInEpilog(emitCurIG))
{
return eRO_none;
}

2,387 contexts with diffs (2,387 improvements, 0 regressions, 0 same size)
  -12,172 bytes

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 2, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 3, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
2 participants