[wip] Creating fmaddsub for these intrinsics #89
base: master
Changes from 4 commits
@@ -418,6 +418,7 @@ end
const MULADD_INTRINSICS = [
    :fmuladd,
    :fma,

]

for f in MULADD_INTRINSICS
@@ -431,6 +432,23 @@ for f in MULADD_INTRINSICS
end


for (t, N, T) in [("d" , 2, Float64), ("s" , 4, Float32),
                  ("d.256", 4, Float64), ("s.256", 8, Float32),
                  # ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet
                 ]
    @eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T})
The function
fmaddsub(a,b,c) = SIMD((isodd(n) ? a[n]*b[n]-c[n] : a[n]*b[n]+c[n] for n in 1:N)...)
Feel free to improve or modify.

re EDIT: I just saw your comment regarding re documentation: sounds great.
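A runnable version of the suggested fallback might look like the sketch below. It uses plain tuples as a stand-in for the package's vector type, and the name `fmaddsub_fallback` is hypothetical; odd lanes (1-based) subtract `c` and even lanes add it, as in the one-liner above.

```julia
# Generic fallback on plain tuples (an assumed stand-in for the package's vector type).
# Odd lanes (1-based) compute a*b - c, even lanes compute a*b + c.
function fmaddsub_fallback(a::NTuple{N,T}, b::NTuple{N,T}, c::NTuple{N,T}) where {N,T<:AbstractFloat}
    return ntuple(n -> isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n], Val(N))
end

a = (1.0, 2.0, 3.0, 4.0)
b = (2.0, 2.0, 2.0, 2.0)
c = (1.0, 1.0, 1.0, 1.0)
fmaddsub_fallback(a, b, c)  # (1.0, 5.0, 5.0, 9.0)
```

Unlike the hardware instruction, this sketch rounds the product before the add or subtract, so results can differ in the last bit for inputs that are not exactly representable.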
        ff = "llvm.x86.fma.vfmaddsub.p"*$t
I would make the letter
        return :(
            $(Expr(:meta, :inline));
            ccall($ff, llvmcall, LVec{$($N), $($T)}, (LVec{$($N), $($T)}, LVec{$($N), $($T)}, LVec{$($N), $($T)}), a, b, c)
        )
    end
In the end we will also need test cases.
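One possible shape for such test cases is sketched below. It checks an implementation against an explicit loop for the four width and element-type combinations enabled in the diff. The names `fmaddsub_reference`, `check_fmaddsub`, and `standin` are hypothetical, and tuples again stand in for the package's vector type.

```julia
using Test

# Reference result written as an explicit loop, independent of the implementation under test.
function fmaddsub_reference(a::NTuple{N,T}, b::NTuple{N,T}, c::NTuple{N,T}) where {N,T}
    out = Vector{T}(undef, N)
    for n in 1:N
        out[n] = isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n]
    end
    return Tuple(out)
end

# `impl` is any tuple-based implementation to check (a hypothetical hook).
function check_fmaddsub(impl)
    @testset "fmaddsub" begin
        for (N, T) in [(2, Float64), (4, Float32), (4, Float64), (8, Float32)]
            # Small integer-valued inputs keep every result exact, so == is safe here.
            a = ntuple(n -> T(n), N)
            b = ntuple(n -> T(2), N)
            c = ntuple(n -> T(1), N)
            @test impl(a, b, c) == fmaddsub_reference(a, b, c)
        end
    end
end

# Stand-in implementation used only to exercise the harness.
standin(a, b, c) = ntuple(n -> isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n], length(a))
check_fmaddsub(standin)
```

For general floating-point inputs a fused implementation rounds once rather than twice, so a real test suite would likely compare with `≈` instead of `==`.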
end

# function fmaddsub(a::LVec{4, Float64}, b::LVec{4, Float64}, c::LVec{4, Float64}) where N
# ccall("llvm.x86.fma.vfmaddsub.pd.256", llvmcall, LVec{4, Float64}, (LVec{4, Float64}, LVec{4, Float64}, LVec{4, Float64}), a, b, c)
# end

################
# Load / store #
################

These intrinsics only exist on x86 architectures. This function should be wrapped in an if statement that checks the CPU architecture, and possibly also the Julia and LLVM version, to see whether the intrinsic is supported.

In case it is not, there should be a reasonably efficient generic fallback. The idea is that one can call fmaddsub all the time and expect a reasonable implementation.

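As a rough sketch of the architecture check described here: `Sys.ARCH` is a `Symbol` naming the build architecture, so the choice can be made once at load time. The function `fmaddsub_backend` is hypothetical and only reports which path would be taken. Note that being on x86 does not by itself guarantee that the CPU supports FMA; this sketch does not test CPU features.

```julia
# Pick an implementation path at load time based on the build architecture.
# `fmaddsub_backend` is a hypothetical helper that only reports the choice.
@static if Sys.ARCH in (:x86_64, :i686)
    fmaddsub_backend() = :intrinsic   # the x86 vfmaddsub intrinsic could be used here
else
    fmaddsub_backend() = :fallback    # the generic lane-by-lane implementation
end

fmaddsub_backend()
```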
I genuinely cannot find when these were introduced into LLVM (I've looked at the release notes for pretty much every release since 3.0). However, AVX512 starts being robustly integrated at 4.0, so I'm just going to peg it at that and assume the FMA/AVX instructions are integrated there. I don't really know where to start with Julia versions, since different installations of the same Julia version might use different LLVM versions, so I suspect that might be tricky. I'll probably steal your implementation below for the fallback; I don't really see any significant improvements.
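One way around the Julia-version question is to query the LLVM version directly: `Base.libllvm_version` is a `VersionNumber` for the LLVM that the running Julia build links against. The sketch below uses the 4.0 threshold guessed above, which is an assumption rather than a verified fact, and the name `llvm_new_enough` is hypothetical.

```julia
# LLVM version of the running Julia build, independent of the Julia version itself.
llvm_version = Base.libllvm_version

# 4.0 is the threshold guessed in the comment above, not a verified introduction point.
llvm_new_enough = llvm_version >= v"4.0"
```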