Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize AdvSimd.Extract() when passed variable that can be const propagated #36070

Closed
kunalspathak opened this issue May 7, 2020 · 9 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Milestone

Comments

@kunalspathak
Copy link
Member

kunalspathak commented May 7, 2020

public byte DoExtract(Vector64<byte> vector) {
  int index = 5;
  AdvSimd.Extract(vector, index);
}

I was expecting it to generate:

G_M20657_IG02:
        3DC007B0          ldr     q16, [fp,#16]
        0E0B3E00          umov    w0, v16.b[5]

But instead we generate the following.

G_M20657_IG02:
        3DC007A0          ldr     q0, [fp,#16]
        528000A0          mov     w0, #5
        97FFF7F5          bl      System.Runtime.Intrinsics.Arm.AdvSimd:Extract(System.Runtime.Intrinsics.Vector128`1[Byte],ubyte):ubyte
        53001C00          uxtb    w0, w0

It happens because we decide whether to fallback or not depending on the index operand. If it is const, we generate the optimize code however this decision happens during importing and we won't know if the operand is constant or not until we do constant propagation which is in later phase.

We should also investigate if there are more scenarios in which we miss optimizing opportunity because of this dependency and evaluate if we should do the decision after constant propagation is done probably in lower.

category:cq
theme:hardware-intrinsics
skill-level:intermediate
cost:medium

@kunalspathak kunalspathak added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 7, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 7, 2020
@kunalspathak
Copy link
Member Author

kunalspathak commented May 7, 2020

//cc : @BruceForstall , @echesakovMSFT , @tannergooding

@BruceForstall BruceForstall added this to the 5.0 milestone May 7, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label May 7, 2020
@BruceForstall
Copy link
Member

cc @CarolEidt

@echesakov
Copy link
Contributor

This is a case for any intrinsic that has an immediate operand - not only for Extract
I wonder if it's possible to move an intrinsic call transofrmation from importer to a later phase.

@EgorBo
Copy link
Member

EgorBo commented May 8, 2020

Doesn't Roslyn propagate constants for such simple cases (without inlining) ?

@tannergooding
Copy link
Member

This is basically a dupe of #11062 and possibly a couple other issues iirc. It is a problem on both x86/x64 and ARM64.

Ideally we would delay the decision for this to be a call or constant until lowering but that isn't necessarily "easy" to do today.
We already have minimal support for rewriting intrinsics to calls in Rationalization (which happens just before lowering) and are using that today for GT_INTRINSIC.

We could presumably extend that to GenTreeJitIntrinsic as well, but it would require a few changes including making it a LARGE_NODE.
I left some comments from my initial investigation here: #11062 (comment) including some of the issues I ran into with the RwriteIntrinsicAsUserCall method.

@tannergooding
Copy link
Member

There is also another case that might be related where we miss some optimizations on x86 due to GT_CAST nodes: #35857 (comment)

Basically the tree might look like:

               [000038] -----+------                 +--*  CAST      int <- ubyte <- int
               [000000] -----+------                 |  \--*  LCL_VAR   int    V01 arg0

and certain instructions might be able to contain or consume the underlying value directly, but we aren't smart enough to contain and handle the cast today.

@BruceForstall
Copy link
Member

Since it appears unlikely we will work on this for 5.0, I've moved it out to 6.0.

cc @AndyAyersMS who might be interested in the phase ordering aspect of this.

@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@echesakov echesakov added Priority:3 Work that is nice to have and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Jun 3, 2021
@echesakov echesakov modified the milestones: 6.0.0, Future Jun 3, 2021
@echesakov echesakov removed their assignment Mar 15, 2022
@echesakov
Copy link
Contributor

Un-assigning myself
cc @BruceForstall

@BruceForstall
Copy link
Member

Closing as a dup of #11062

@ghost ghost locked as resolved and limited conversation to collaborators Apr 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:3 Work that is nice to have
Projects
None yet
Development

No branches or pull requests

7 participants