-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] Creating fmaddsub
for these intrinsics
#89
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #89 +/- ##
==========================================
- Coverage 90.23% 89.70% -0.54%
==========================================
Files 3 3
Lines 502 505 +3
==========================================
Hits 453 453
- Misses 49 52 +3
Continue to review full report at Codecov.
|
src/LLVM_intrinsics.jl
Outdated
# ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet | ||
] | ||
@eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T}) | ||
ff = "llvm.x86.fma.vfmaddsub.p"*$t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the letter p
part of the suffix t
.
src/LLVM_intrinsics.jl
Outdated
function fmaddsub(a::LVec{N, T}, b::LVec{N, T}, c::LVec{N, T}) where {N, T<:FloatingTypes} | ||
Base.llvmcall("llvm.x86.fma.fmaddsub_pd", LVec{N, T}, (LVec{N, T}, LVec{N, T}, LVec{N, T}), a, b, c) | ||
|
||
for (t, N, T) in [("d" , 2, Float64), ("s" , 4, Float32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These intrinsics only exist on x86
architectures. This function should be wrapped in an if
statement that checks the CPU architecture, and possible also the Julia and LLVM version, to see whether the intrinsic is supported.
In case it is not, there should be a reasonably efficient generic fallback. The idea is that one can call fmaddsub
all the time and expect a reasonable implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I genuinely cannot find when these were introduced into LLVM (I've looked at the release notes for pretty much every release since 3.0), however, they start robustly integrating AVX512 into the language at 4.0, so I'm just going to peg it at that and say FMA/AVX instructions are integrated there. I don't really know where to start with Julia versions since different installations of the same Julia version might use different LLVM versions, so I suspect that might be tricky. I'll probably steal your implementation below for the fallback, I don't really see any significant improvements.
src/LLVM_intrinsics.jl
Outdated
("d.256", 4, Float64), ("s.256", 8, Float32), | ||
# ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet | ||
] | ||
@eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmaddsub
has a simpler cousin, faddsub
. Do you want to add support for it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function vfmaddsub
is sufficiently obscure that it needs documentation. Documentation in the sense of equivalent Julia code would be good; maybe something along the lines of
fmaddsub(a,b,c) = SIMD((isodd(n) ? a[n*b[n]-c[n] : a[n]*b[n]+c[n] for n in 1:N)...)
Feel free to improve or modify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re faddsub
: I'd be happy to add it, though I have no need for it. That being said, I don't see any documentation or commits for it anywhere, and it's definitely not in the llvm.fma
namespace (at least, when I try to do llvm.fma.faddsub
or llvm.fma.vaddsub
, it doesn't recognize the call, which makes sense as it's not an fma
call), so I'm not sure about whether it's in the scope of this PR.
EDIT: I just saw your comment regarding llvm.sse3.addsub
, so I now see that, but my question still remains on whether it's in the scope of this PR
re documentation: sounds great.
$(Expr(:meta, :inline)); | ||
ccall($ff, llvmcall, LVec{$($N), $($T)}, (LVec{$($N), $($T)}, LVec{$($N), $($T)}, LVec{$($N), $($T)}), a, b, c) | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end we will also need test cases.
These are very different from all other intrinsics we support (which are based on https://llvm.org/docs/LangRef.html). These are generic in terms of the size of the Regarding target-specific intrinsic, there are a huge number of them (https://software.intel.com/sites/landingpage/IntrinsicsGuide/) and they are only available on certain architectures. It isn't clear to me how exactly this should be exposed by SIMD.jl. For example, what happens if someone calls this new function on non-x86? Or when the size of the vector is not exactly such that there is an intrinsic for it? I believe this requires quite a lot of careful planning and thought before just implementing it in exactly the same way as the previous intrinsics. For example, it should probably live in its own file and have some consistent naming based on how it would be called in e.g. C. |
Yes, I agree. I'll repeat what I said in the comment on the issue, but this seems to be stepping into territory covered by VectorizationBase.jl (as that package already detects compatibilities with your system). I'm aware that there's a huge number of other intrinsics, but it's unclear which ones are in LLVM and which ones aren't (and where they might be).
I'm not exactly sure the answer to the first question, but I would hope that multiple dispatch would solve the second issue. I know @eschnett was saying only to define these functions for x86 and (theoretically), similar functions could be defined for NEON intrinsics. Then, there could be default implementations. But this brings back the question of scope. I'm happy to work on this and build out more functionality, but (as I said previously) I'm not sure what the scope is for this. If you decide that this doesn't belong here, that sounds fine by me, I just would recommend putting a disclaimer in the description saying that you're only supporting the intrinsics in the Language Reference (and not necessarily all possible intrinsics for all possible architectures). Either way, I think it's smart for me to hold off until someone tells me what the larger plan is. |
It would be good to define a function |
Ideally, adding support for
fmaddsub
intrinsics. This is still a work in progress. In connection with #88