Add basic support for folding SIMD intrinsics #81547
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Additional tests still need to be added. We already have a few due to the HWIntrinsic tests being templated, but more scenarios should be added.
Initial diff is great. No measurable TP change and positive diffs for tests/benchmarks. Linux Arm64
Linux x64
Windows Arm64/x64 are about half this (largely due to ABI differences from what I can tell), but still all positive.
So we start off with this tree:

That gets recognized and constant folded.

We don't see any transforms to the tree here (just tracking); we then do some work in

Then in assertion prop we finally do the replacement, but notably don't yet get rid of the now unused CSE.

When processing the comma in
It seems like there are a few problems here:
If someone from @dotnet/jit-contrib has some time, I'd like to better understand the bits here and what we can do to resolve them. I expect 1 and 2 are external to this PR, but 3 is something that needs resolving for this to go in. Is the "simple fix" just having
CC @dotnet/jit-contrib
src/coreclr/jit/gentree.h (outdated):

```cpp
#if defined(TARGET_XARCH)
    // scalar operations on xarch copy the upper bits from arg0
    *result = arg0;
```
Can you please explain this part with an example? (the difference between xarch and arm)
xarch has the behavior where scalar operations "copy the upper bits"; that is, `x + y` is equivalent to:

```csharp
Vector128<T> result = x;
return result.WithElement(0, x.GetElement(0) + y.GetElement(0));
```

arm, on the other hand, zeros the upper bits; that is, `x + y` is equivalent to:

```csharp
Vector128<T> result = Vector128<T>.Zero;
return result.WithElement(0, x.GetElement(0) + y.GetElement(0));
```
Added a path that explicitly zeros for Arm64 to help clarify the logic.
LGTM. Just out of curiosity: will e.g. adding folding for vector comparison only be a matter of adding a case to EvaluateBinaryScalar, and that's it?
```cpp
{
    switch (baseType)
    {
        case TYP_FLOAT:
```
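For context, the shape of this dispatch — a base-type switch that forwards to a templated per-element helper — can be sketched in isolation as follows. All names here (`EvaluateBinaryScalarSketch`, the enum values) are hypothetical stand-ins, not the actual JIT code:

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT's oper/type enums.
enum Oper { OP_ADD, OP_SUB };
enum BaseType { BT_FLOAT, BT_DOUBLE };

// Templated per-element evaluator: one implementation serves every base
// type, which is the duplication-reduction the templating approach buys.
template <typename T>
T EvaluateBinaryScalarSketch(Oper oper, T arg0, T arg1)
{
    switch (oper)
    {
        case OP_ADD: return arg0 + arg1;
        case OP_SUB: return arg0 - arg1;
    }
    assert(false);
    return T{};
}

// The base-type switch picks the concrete instantiation, mirroring the
// `switch (baseType) { case TYP_FLOAT: ... }` shape in the diff above.
double EvaluateBinarySketch(BaseType baseType, Oper oper, double arg0, double arg1)
{
    switch (baseType)
    {
        case BT_FLOAT:
            return EvaluateBinaryScalarSketch<float>(oper, (float)arg0, (float)arg1);
        case BT_DOUBLE:
            return EvaluateBinaryScalarSketch<double>(oper, arg0, arg1);
    }
    assert(false);
    return 0.0;
}
```

Under this shape, supporting a new operation mostly means adding one case to the templated helper, which is what the question above is getting at.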
Presumably these could be hidden under a macro like `EVAL_UNARY_SIMD(TYP_FLOAT, float)`, but some people hate macros so it's fine as is.
For the most part, yes. There are a few floating-point edge cases that will end up needing specialization so they handle NaN correctly.
This is a minimal proof of concept that adds SIMD folding support for:

The first three are done as a general proof of concept for `simd = op(simd)` and `simd = op(simd, simd)`. They attempt to use templating to reduce code duplication and otherwise make things simple to add/test.

The latter is an example of a case that can't really use templating due to it being `scalar = op(simd, int)`. However, it is one that has a decent amount of actual light-up for code that is using scalar fallbacks.

The intent is not that we ever add SIMD folding for "everything". My goal would be to add SIMD folding for the xplat API surface, that is, what `Vector64/128/256/512<T>` expose and provide software fallbacks for. This helps keep SIMD constant folding support generally "scoped" to things which are known to be generally supported/commonplace.

For cases like `Add`/`Subtract`, we should ideally also add the simple cases around `x + 0 == x` and `x - 0 == x`. The same would be true for `x * 1`, `x / 1`, and other similar cases when such SIMD folding is added. That is, basically covering the same scenarios that the scalar binary ops cover (e.g. `GT_ADD`, `GT_SUB`, etc).
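As a rough illustration of those identity folds, here is a sketch over a 4-lane integer vector; the names (`Vec4i`, `TryFoldIdentity`, `AllLanesAre`) are made up for this example and are not the PR's actual implementation. The `x + 0 == x` and `x * 1 == x` cases amount to checking whether one operand is the all-zeros or all-ones vector and, if so, returning the other operand unchanged:

```cpp
#include <array>
#include <cstdint>

using Vec4i = std::array<int32_t, 4>;

// True when every lane equals the given value.
static bool AllLanesAre(const Vec4i& v, int32_t value)
{
    for (int32_t lane : v)
    {
        if (lane != value)
        {
            return false;
        }
    }
    return true;
}

// Attempts the simple identity folds: x + 0 => x, x - 0 => x,
// x * 1 => x, x / 1 => x. Returns true and writes *result on success.
static bool TryFoldIdentity(char oper, const Vec4i& x, const Vec4i& y, Vec4i* result)
{
    switch (oper)
    {
        case '+':
        case '-':
            if (AllLanesAre(y, 0))
            {
                *result = x;
                return true;
            }
            break;

        case '*':
        case '/':
            if (AllLanesAre(y, 1))
            {
                *result = x;
                return true;
            }
            break;
    }
    return false;
}
```

Note that for floating-point these identities need care (e.g. `x + 0.0` is not an identity when `x` is `-0.0`), which is the kind of specialization the NaN remark earlier alludes to.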